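ETL definition
ETL is short for Extract-Transform-Load. It describes the process of moving data from a source to a destination by extracting it, transforming it, and loading it. The term is used most often in the context of data warehouses, but it is not limited to them.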
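To make the three stages concrete, here is a minimal in-memory sketch in plain Java. It does not use any ETL tool; the class `EtlSketch`, the `Product` type and the sample rows are all made up for illustration, and a real pipeline would read from and write to actual data stores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** In-memory illustration of the three ETL stages. All names here are hypothetical. */
final class EtlSketch {

    /** One product row; the same shape is used on the source and the target side. */
    static final class Product {
        final int id;
        final String category;
        final String name;

        Product(int id, String category, String name) {
            this.id = id;
            this.category = category;
            this.name = name;
        }

        @Override
        public String toString() {
            return id + "," + category + "," + name;
        }
    }

    /** Extract: read raw rows from the source. A fixed list stands in for a database. */
    static List<Product> extract() {
        return Arrays.asList(
                new Product(1, "software", " editor "),
                new Product(2, "hardware", "keyboard"),
                new Product(3, "software", "compiler"));
    }

    /** Transform: keep only software products and normalise the name. */
    static List<Product> transform(List<Product> rows) {
        List<Product> out = new ArrayList<>();
        for (Product p : rows) {
            if ("software".equals(p.category)) {
                out.add(new Product(p.id, p.category, p.name.trim().toUpperCase(Locale.ROOT)));
            }
        }
        return out;
    }

    /** Load: write the transformed rows to the target, here just another list. */
    static void load(List<Product> rows, List<Product> target) {
        target.addAll(rows);
    }

    public static void main(String[] args) {
        List<Product> target = new ArrayList<>();
        load(transform(extract()), target);
        target.forEach(System.out::println);
    }
}
```

Keeping the stages as separate methods means each one can be swapped out (for example, a different source) without touching the other two.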
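Using ETL
At my workplace there are a great many test environments: more than 50 complete sets, and every release is deployed to all of them together. Each round of testing requires me to build matching test data, and the business logic is complex. So I picked, more or less at random, an ETL tool that can be up and running in five minutes: Scriptella.
Getting started with Scriptella
Example project: Scriptella example
- Add the Maven dependencies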
<properties>
    <scriptella.version>1.1</scriptella.version>
</properties>
<dependencies>
    <dependency>
        <groupId>com.javaforge.scriptella</groupId>
        <artifactId>scriptella-core</artifactId>
        <version>${scriptella.version}</version>
    </dependency>
    <dependency>
        <groupId>com.javaforge.scriptella</groupId>
        <artifactId>scriptella-drivers</artifactId>
        <version>${scriptella.version}</version>
    </dependency>
    <dependency>
        <groupId>com.javaforge.scriptella</groupId>
        <artifactId>scriptella-tools</artifactId>
        <version>${scriptella.version}</version>
    </dependency>
    <dependency>
        <!-- The Oracle driver is not resolved from the public Maven repository here; install it into your local repository yourself -->
        <groupId>com.oracle</groupId>
        <artifactId>ojdbc6</artifactId>
        <version>11.2.0.1.0</version>
    </dependency>
</dependencies>
- Write the etl.xml script (the example on the official site omits the driver attribute)
<etl>
    <connection id="db1" url="jdbc:oracle:1" user="1" password="1" driver="oracle.jdbc.driver.OracleDriver"/>
    <connection id="db2" url="jdbc:oracle:2" user="2" password="2" driver="oracle.jdbc.driver.OracleDriver"/>
    <query connection-id="db1">
        <!-- Select products from the software category in db1 -->
        SELECT * FROM Product WHERE category='software';
        <!-- For each row, execute the nested script -->
        <script connection-id="db2">
            <!-- Insert all selected products into db2.
                 Use ? to reference properties, columns or ?{expressions} -->
            INSERT INTO Product(id, category, product_name) values (?id, ?{category}, ?name);
        </script>
    </query>
</etl>
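The nesting in etl.xml means the inner script runs once for every row the outer query returns, with that row's columns available as `?id`, `?{category}` and `?name`. The sketch below models that idea in plain Java. It is not Scriptella code and makes no claim about how Scriptella is implemented: the class `NestedScriptSketch`, its methods and the sample rows are hypothetical, and the placeholder pattern only handles bare identifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Conceptual model of a nested script being run once per query row. Not Scriptella code. */
final class NestedScriptSketch {

    // Matches ?name or ?{name}. Real ?{...} expressions may be richer than a bare identifier.
    private static final Pattern PARAM = Pattern.compile("\\?(?:\\{(\\w+)\\}|(\\w+))");

    /** A statement in JDBC form plus the values to bind, in order. */
    static final class Bound {
        final String sql;
        final List<Object> values;

        Bound(String sql, List<Object> values) {
            this.sql = sql;
            this.values = values;
        }
    }

    /** Replaces each named placeholder with '?' and looks its value up in the current row. */
    static Bound bind(String sql, Map<String, Object> row) {
        Matcher m = PARAM.matcher(sql);
        StringBuilder jdbcSql = new StringBuilder();
        List<Object> values = new ArrayList<>();
        int last = 0;
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            jdbcSql.append(sql, last, m.start()).append('?');
            values.add(row.get(name)); // null if the row has no such column
            last = m.end();
        }
        jdbcSql.append(sql, last, sql.length());
        return new Bound(jdbcSql.toString(), values);
    }

    /** Runs the nested statement once per row of the outer query result. */
    static List<Bound> forEachRow(List<Map<String, Object>> rows, String nestedSql) {
        List<Bound> executed = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            executed.add(bind(nestedSql, row));
        }
        return executed;
    }

    /** Builds one sample row with the columns the INSERT refers to. */
    static Map<String, Object> row(int id, String category, String name) {
        Map<String, Object> r = new HashMap<>();
        r.put("id", id);
        r.put("category", category);
        r.put("name", name);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
                row(1, "software", "editor"),
                row(3, "software", "compiler"));
        String sql = "INSERT INTO Product(id, category, product_name) VALUES (?id, ?{category}, ?name)";
        for (Bound b : forEachRow(rows, sql)) {
            System.out.println(b.sql + " <- " + b.values);
        }
    }
}
```

One thing the model makes visible: `?name` can only be resolved if the row coming out of db1 actually has a column called `name`; in this sketch a missing column simply binds as null.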
- Write the Java class
EtlExecutor.newExecutor(new File("etl.xml")).execute(); //Execute etl.xml file
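Because etl.xml is an XML file, an unbalanced or misspelled tag is enough to make execution fail. As an optional safeguard, the sketch below checks that the file is well-formed using only the JDK's built-in XML parser before it is handed to the executor. The class `EtlXmlCheck` and its method are hypothetical helpers, not part of Scriptella, and the check says nothing about whether the SQL or the connections are valid.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical pre-flight check: is the ETL script well-formed XML? */
final class EtlXmlCheck {

    /** Returns true if the given text parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser could not be used", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes etl.xml sits in the working directory, as in the one-liner above.
        String xml = new String(Files.readAllBytes(Paths.get("etl.xml")), StandardCharsets.UTF_8);
        System.out.println(isWellFormed(xml) ? "etl.xml is well-formed" : "etl.xml is not well-formed");
    }
}
```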