文档章节

sqoop的安装与使用,sqoop安装使用

Zero零_度
 Zero零_度
发布于 2016/06/06 21:09
字数 4211
阅读 515
收藏 1
点赞 0
评论 0

sqoop的安装与使用,sqoop安装使用

 

1.什么是Sqoop

Sqoop即 SQL to Hadoop ,是一款方便的在传统型数据库与Hadoop之间进行数据迁移的工具,充分利用MapReduce并行特点以批处理的方式加快数据传输,发展至今主要演化了二大版本,Sqoop1和Sqoop2。 

Sqoop工具是hadoop下连接关系型数据库和Hadoop的桥梁,支持关系型数据库和hive、hdfs,hbase之间数据的相互导入,可以使用全表导入和增量导入。

那么为什么选择Sqoop呢? 

高效可控的利用资源,任务并行度,超时时间。 数据类型映射与转化,可自动进行,用户也可自定义 支持多种主流数据库,MySQL,Oracle,SQL Server,DB2等等 

2.Sqoop1和Sqoop2对比的异同之处

两个不同的版本,完全不兼容 版本号划分区别,Apache版本:1.4.x(Sqoop1); 1.99.x(Sqoop2)     CDH版本 : Sqoop-1.4.3-cdh4(Sqoop1) ; Sqoop2-1.99.2-cdh4.5.0 (Sqoop2)Sqoop2比Sqoop1的改进 引入Sqoop server,集中化管理connector等 多种访问方式:CLI,Web UI,REST API 引入基于角色的安全机制 

3.Sqoop1与Sqoop2的架构图

Sqoop架构图1

\

Sqoop架构图2

\

4.Sqoop1与Sqoop2的优缺点

 

比较

Sqoop1

Sqoop2

架构

仅仅使用一个Sqoop客户端

引入了Sqoop server集中化管理connector,以及rest api,web,UI,并引入权限安全机制

部署

部署简单,安装需要root权限,connector必须符合JDBC模型

架构稍复杂,配置部署更繁琐

使用    

命令行方式容易出错,格式紧耦合,无法支持所有数据类型,安全机制不够完善,例如密码暴漏

多种交互方式,命令行,web UI,rest API,conncetor集中化管理,所有的链接安装在Sqoop server上,完善权限管理机制,connector规范化,仅仅负责数据的读写

 

 

 

5.Sqoop1的安装部署

5.0 安装环境

hadoop:hadoop-2.3.0-cdh5.1.2

sqoop:sqoop-1.4.4-cdh5.1.2

5.1 下载安装包及解压

tar -zxvf  sqoop-1.4.4-cdh5.1.2.tar.gz
 

ln -s sqoop-1.4.4-cdh5.1.2  sqoop

5.2 配置环境变量和配置文件

<span style="font-size:18px;">cd sqoop/conf/

cat  sqoop-env-template.sh  >> sqoop-env.sh

vi sqoop-env.sh </span>


 

在sqoop-env.sh中添加如下代码

<span style="font-size:18px;"># Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop

#set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper
</span>


该配置文件中只有HADOOP_COMMON_HOME的配置是必须的 另外关于hbase和hive的配置 如果用到需要配置 不用的话就不用配置

 

 

5.3 添加需要的jar包到lib下面

这里的jar包指的是连接关系型数据库的jar 比如mysql oracle  这些jar包是需要自己添加到lib目录下面去的

 

<span style="font-size:18px;"> cp  ~/hive/lib/mysql-connector-java-5.1.30.jar   ~/sqoop/lib/</span>

 

5.4 添加环境变量

vi ~/.profile

添加如下内容

<span style="font-size:18px;">export SQOOP_HOME=/home/hadoop/sqoop

export SBT_HOME=/home/hadoop/sbt


export PATH=$PATH:$SBT_HOME/bin:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
</span>

source ~/.profile使配置文件生效

 

 

5.5 测试mysql数据库的连接使用

①连接mysql数据库,列出所有的数据库

 

<span style="font-size:18px;">hadoop@caozw:~/sqoop/conf$ sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P
Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/10/21 18:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2
Enter password: 
14/10/21 18:15:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
XINGXUNTONG
XINGXUNTONG_HIVE
amon
hive
hmon
mahout
mysql
oozie
performance_schema
realworld
rman
scm
smon
</span>


-P表示输入密码 可以直接使用--password来制定密码

 

 

②mysql数据库的表导入到HDFS

 

hadoop@caozw:~/sqoop/conf$ sqoop import -m 1  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test1
Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/10/21 18:19:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2
Enter password: 
14/10/21 18:19:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/10/21 18:19:21 INFO tool.CodeGenTool: Beginning code generation
14/10/21 18:19:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:19:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:19:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/15cb67e2b315154cdf02e3a17cf32bbe/weblogs.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/10/21 18:19:23 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/15cb67e2b315154cdf02e3a17cf32bbe/weblogs.jar
14/10/21 18:19:23 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/10/21 18:19:23 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/10/21 18:19:23 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/10/21 18:19:23 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/10/21 18:19:23 INFO mapreduce.ImportJobBase: Beginning import of weblogs
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.1-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/10/21 18:19:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/21 18:19:24 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/10/21 18:19:25 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/10/21 18:19:25 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/10/21 18:19:40 INFO db.DBInputFormat: Using read commited transaction isolation
14/10/21 18:19:41 INFO mapreduce.JobSubmitter: number of splits:1
14/10/21 18:19:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1413879907572_0002
14/10/21 18:19:46 INFO impl.YarnClientImpl: Submitted application application_1413879907572_0002
14/10/21 18:19:46 INFO mapreduce.Job: The url to track the job: N/A
14/10/21 18:19:46 INFO mapreduce.Job: Running job: job_1413879907572_0002
14/10/21 18:20:12 INFO mapreduce.Job: Job job_1413879907572_0002 running in uber mode : false
14/10/21 18:20:12 INFO mapreduce.Job:  map 0% reduce 0%
14/10/21 18:20:41 INFO mapreduce.Job:  map 100% reduce 0%
14/10/21 18:20:45 INFO mapreduce.Job: Job job_1413879907572_0002 completed successfully
14/10/21 18:20:46 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=107189
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=251130
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=22668
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=22668
		Total vcore-seconds taken by all map tasks=22668
		Total megabyte-seconds taken by all map tasks=23212032
	Map-Reduce Framework
		Map input records=3000
		Map output records=3000
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=41
		CPU time spent (ms)=1540
		Physical memory (bytes) snapshot=133345280
		Virtual memory (bytes) snapshot=1201442816
		Total committed heap usage (bytes)=76021760
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=251130
14/10/21 18:20:46 INFO mapreduce.ImportJobBase: Transferred 245.2441 KB in 80.7974 seconds (3.0353 KB/sec)
14/10/21 18:20:46 INFO mapreduce.ImportJobBase: Retrieved 3000 records.


-m 表示启动几个map任务来读取数据   如果数据库中的表没有主键这个参数是必须设置的而且只能设定为1   否则会提示

 

 

14/10/21 18:18:27 ERROR tool.ImportTool: Error during import: No primary key could be found for table weblogs. Please specify one with --split-by or perform a sequential import with '-m 1'.

而这个参数设置为几会直接决定导入的文件在hdfs上面是分成几块的 比如 设置为1 则会产生一个数据文件

 

 

14/10/21 18:23:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2014-10-21 18:20 /user/sqoop/test1/_SUCCESS
-rw-r--r--   1 hadoop supergroup     251130 2014-10-21 18:20 /user/sqoop/test1/part-m-00000


这里添加主键:

 

 

mysql> desc weblogs;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| md5          | varchar(32) | YES  |     | NULL    |       |
| url          | varchar(64) | YES  |     | NULL    |       |
| request_date | date        | YES  |     | NULL    |       |
| request_time | time        | YES  |     | NULL    |       |
| ip           | varchar(15) | YES  |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)


mysql> alter table weblogs add primary key(md5,ip);
Query OK, 3000 rows affected (1.60 sec)
Records: 3000  Duplicates: 0  Warnings: 0

mysql> desc weblogs;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| md5          | varchar(32) | NO   | PRI |         |       |
| url          | varchar(64) | YES  |     | NULL    |       |
| request_date | date        | YES  |     | NULL    |       |
| request_time | time        | YES  |     | NULL    |       |
| ip           | varchar(15) | NO   | PRI |         |       |
+--------------+-------------+------+-----+---------+-------+
5 rows in set (0.02 sec)

然后指定-m

 

 

hadoop@caozw:~/sqoop/conf$ sqoop import -m 2  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test2
Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/10/21 18:22:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2
Enter password: 
14/10/21 18:24:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/10/21 18:24:04 INFO tool.CodeGenTool: Beginning code generation
14/10/21 18:24:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:24:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:24:04 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/7061f445f29510afa2b89729126a57b9/weblogs.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/10/21 18:24:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/7061f445f29510afa2b89729126a57b9/weblogs.jar
14/10/21 18:24:07 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/10/21 18:24:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/10/21 18:24:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/10/21 18:24:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/10/21 18:24:07 ERROR tool.ImportTool: Error during import: No primary key could be found for table weblogs. Please specify one with --split-by or perform a sequential import with '-m 1'.
hadoop@caozw:~/sqoop/conf$ sqoop import -m 2  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test2
Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/10/21 18:30:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2
Enter password: 
14/10/21 18:30:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/10/21 18:30:07 INFO tool.CodeGenTool: Beginning code generation
14/10/21 18:30:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:30:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1
14/10/21 18:30:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/6dbf2401c1a51b81c5b885e6f7d43137/weblogs.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/10/21 18:30:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/6dbf2401c1a51b81c5b885e6f7d43137/weblogs.jar
14/10/21 18:30:09 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/10/21 18:30:09 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/10/21 18:30:09 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/10/21 18:30:09 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/10/21 18:30:09 WARN manager.CatalogQueryManager: The table weblogs contains a multi-column primary key. Sqoop will default to the column md5 only for this job.
14/10/21 18:30:09 WARN manager.CatalogQueryManager: The table weblogs contains a multi-column primary key. Sqoop will default to the column md5 only for this job.
14/10/21 18:30:09 INFO mapreduce.ImportJobBase: Beginning import of weblogs
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.1-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/10/21 18:30:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/21 18:30:09 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/10/21 18:30:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/10/21 18:30:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/10/21 18:30:17 INFO db.DBInputFormat: Using read commited transaction isolation
14/10/21 18:30:17 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`md5`), MAX(`md5`) FROM `weblogs`
14/10/21 18:30:17 WARN db.TextSplitter: Generating splits for a textual index column.
14/10/21 18:30:17 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
14/10/21 18:30:17 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
14/10/21 18:30:18 INFO mapreduce.JobSubmitter: number of splits:4
14/10/21 18:30:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1413879907572_0003
14/10/21 18:30:19 INFO impl.YarnClientImpl: Submitted application application_1413879907572_0003
14/10/21 18:30:19 INFO mapreduce.Job: The url to track the job: N/A
14/10/21 18:30:19 INFO mapreduce.Job: Running job: job_1413879907572_0003
14/10/21 18:30:32 INFO mapreduce.Job: Job job_1413879907572_0003 running in uber mode : false
14/10/21 18:30:32 INFO mapreduce.Job:  map 0% reduce 0%
14/10/21 18:31:12 INFO mapreduce.Job:  map 50% reduce 0%
14/10/21 18:31:13 INFO mapreduce.Job:  map 75% reduce 0%
14/10/21 18:31:15 INFO mapreduce.Job:  map 100% reduce 0%
14/10/21 18:31:21 INFO mapreduce.Job: Job job_1413879907572_0003 completed successfully
14/10/21 18:31:22 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=429312
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=532
		HDFS: Number of bytes written=251209
		HDFS: Number of read operations=16
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=8
	Job Counters 
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=160326
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=160326
		Total vcore-seconds taken by all map tasks=160326
		Total megabyte-seconds taken by all map tasks=164173824
	Map-Reduce Framework
		Map input records=3001
		Map output records=3001
		Input split bytes=532
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=806
		CPU time spent (ms)=5450
		Physical memory (bytes) snapshot=494583808
		Virtual memory (bytes) snapshot=4805771264
		Total committed heap usage (bytes)=325058560
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=251209
14/10/21 18:31:22 INFO mapreduce.ImportJobBase: Transferred 245.3213 KB in 72.5455 seconds (3.3816 KB/sec)

这里产生的文件跟主键的字段个数以及-m的参数是相关的 大致是-m的值乘以主键字段数,有待考证

 

 

hadoop@caozw:~/study/cdh5$ hadoop fs -ls /user/sqoop/test2/
14/10/21 18:32:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 5 items
-rw-r--r--   1 hadoop supergroup          0 2014-10-21 18:31 /user/sqoop/test2/_SUCCESS
-rw-r--r--   1 hadoop supergroup          0 2014-10-21 18:31 /user/sqoop/test2/part-m-00000
-rw-r--r--   1 hadoop supergroup     251130 2014-10-21 18:31 /user/sqoop/test2/part-m-00001
-rw-r--r--   1 hadoop supergroup          0 2014-10-21 18:31 /user/sqoop/test2/part-m-00002
-rw-r--r--   1 hadoop supergroup         79 2014-10-21 18:31 /user/sqoop/test2/part-m-00003


这里的主键设计的不合理导致数据分布不均匀~~  有待改进

 

 

③数据导出Oracle和HBase

使用export可将hdfs中数据导入到远程数据库中

           export --connect jdbc:oracle:thin:@192.168.**.**:**:**--username **--password=** -m1table VEHICLE--export-dir /user/root/VEHICLE

向Hbase导入数据

           sqoop import --connect jdbc:oracle:thin:@192.168.**.**:**:**--username**--password=**--m 1 --table VEHICLE --hbase-create-table --hbase-table VEHICLE--hbase-row-key ID --column-family VEHICLEINFO --split-by ID

5.6 测试Mysql数据库的使用

前提:导入mysql jdbc的jar包

①测试数据库连接
sqoop list-databases –connect jdbc:mysql://192.168.10.63 –username root–password 123456
②Sqoop的使用
以下所有的命令每行之后都存在一个空格,不要忘记
(以下6中命令都没有进行过成功测试)

<1>mysql–>hdfs
sqoop export –connect
jdbc:mysql://192.168.10.63/ipj
–username root
–password 123456
–table ipj_flow_user
–export-dir hdfs://192.168.10.63:8020/user/flow/part-m-00000
前提:
(1)hdfs中目录/user/flow/part-m-00000必须存在
(2)如果集群设置了压缩方式lzo,那么本机必须得安装且配置成功lzo
(3)hadoop集群中每个节点都要有对mysql的操作权限

<2>hdfs–>mysql
sqoop import –connect
jdbc:mysql://192.168.10.63/ipj
–table ipj_flow_user

<3>mysql–>hbase
sqoop  import  –connect
jdbc:mysql://192.168.10.63/ipj
–table ipj_flow_user
–hbase-table ipj_statics_test
–hbase-create-table
–hbase-row-key id
–column-family imei

<4>hbase–>mysql
关于将Hbase的数据导入到mysql里,Sqoop并不是直接支持的,一般采用如下3种方法:
第一种:将Hbase数据扁平化成HDFS文件,然后再由Sqoop导入.
第二种:将Hbase数据导入Hive表中,然后再导入mysql。
第三种:直接使用Hbase的Java API读取表数据,直接向mysql导入
不需要使用Sqoop。

<5>mysql–>hive
sqoop import –connect
jdbc:mysql://192.168.10.63/ipj
–table hive_table_test
–hive-import 
–hive-table hive_test_table 或–create-hive-table hive_test_table

<6>hive–>mysql
sqoop export –connect
jdbc:mysql://192.168.10.63/ipj
–username hive 
–password 123456 
–table target_table 
–export-dir /user/hive/warehouse/uv/dt=mytable
前提:mysql中表必须存在


③Sqoop其他操作
<1>列出mysql中的所有数据库
sqoop list-databases –connect jdbc:mysql://192.168.10.63:3306/ –usernameroot –password 123456 
<2>列出mysql中某个库下所有表
sqoop list-tables –connect jdbc:mysql://192.168.10.63:3306/ipj –usernameroot –password 123456

6 Sqoop1的性能

 测试数据:

表名:tb_keywords
行数:11628209
数据文件大小:1.4G
测试结果:

 

HDFS--->DB

HDFS<---DB

Sqoop

428s

166s

HDFS<->FILE<->DB

209s

105s


从结果上来看,以FILE作为中转方式性能是要高于SQOOP的,原因如下:

本质上SQOOP使用的是JDBC,效率不会比MYSQL自带的导入\导出工具效率高以导入数据到DB为例,SQOOP的设计思想是分阶段提交,也就是说假设一个表有1K行,那么它会先读出100行(默认值),然后插入,提交,再读取100行……如此往复

即便如此,SQOOP也是有优势的,比如说使用的便利性,任务执行的容错性等。在一些测试环境中如果需要的话可以考虑把它拿来作为一个工具使用。

本文转载自:http://www.bkjia.com/yjs/897164.html

共有 人打赏支持
Zero零_度
粉丝 67
博文 1245
码字总数 252866
作品 0
程序员
大数据实验室(大数据基础培训)——Sqoop的安装、配置及基础使用

为某企业做的培训,完整文档见:http://gudaoxuri.github.io/bd-lab/ 12. Sqoop 官方文档:http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html 12.1. 安装 tar -zxf sqoop-1.4.6.binha......

孤岛旭日
2016/05/06
38
0
Sqoop1和Sqoop2简介

主要来源: http://www.linuxidc.com/Linux/2014-10/108337.htm 1.什么是Sqoop Sqoop即 SQL to Hadoop ,是一款方便的在传统型数据库与Hadoop之间进行数据迁移的工具,充分利用MapReduce并行特...

强子哥哥
2015/12/23
792
0
Sqoop2 vs Sqoop

以下文字内容部分译自Sqoop2 vs Sqoop: Sqoop1是一个客户端程序,用户只需在客户端添加驱动或者连接器到Sqoop中即可使用, Sqoop2是一个基于服务的模型,新一代Sqoop,所有配置以及驱动都必...

cwalet
2015/11/23
912
0
Sqoop数据分析引擎安装与使用

Sqoop数据分析引擎安装与使用 ==>什么是Sqoop ? Sqoop 是一个开源的数据处理引擎,主要是通过 JDBC 为媒介, 在Hadoop(Hive)与 传统的关系型数据库(Oracle, MySQL,Postgres等)间进行数据的...

菜鸟的征程
01/12
0
0
Installing Sqoop2 On Oracle Linux 7.4

Sqoop是一个用来将Hadoop和关系型数据库中的数据相互转移的工具,可以将一个关系型数据库(例如 : MySQL ,Oracle ,Postgres等)中的数据导进到Hadoop的HDFS中,也可以将HDFS的数据导进到关系...

candon123
2017/12/20
0
0
Sqoop 精度问题修复

线上使用sqoop抽数时出现精度丢失问题,经查Sqoop默认导数时将decimal类型转为double类型导致 使用版本为sqoop-1.4.4,hive 版本为1.2.1 经查看源码发现sqoop/src/java/org/apache/sqoop/hiv...

bluz
2016/11/15
0
0
基于Hadoop生态圈的数据仓库实践 —— ETL(一)

一、使用Sqoop抽取数据 1. Sqoop简介 Sqoop是一个在Hadoop与结构化数据存储(如关系数据库)之间高效传输大批量数据的工具。它在2012年3月被成功孵化,现在已是Apache的顶级项目。Sqoop有Sqo...

wzy0623
2016/07/01
0
0
Sqoop1配置过程

注:sqoop官网提供sqoop1和sqoop2两个版本,sqoop2不支持hive、hbase等。 配置过程 1、解压sqoop tar包,本例解压到/usr/local/sqoop-1.4.6目录。 2、复制/usr/local/sqoop-1.4.6/conf下的s...

源子
2016/10/13
3
0
sqoop 完成与关系型数据库的互导

一.安装SQOOP后可使用如下命令列出mysql数据库中的所有数据库,与检验是否安装成功。 # sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456 ...

gulf
06/26
0
0
使用Sqoop实现HDFS与Mysql互转

简介 Sqoop是一个用来将Hadoop和关系型数据库中的数据相互转移的工具,可以将一个关系型数据库(例如 : MySQL ,Oracle ,Postgres等)中的数据导入到Hadoop的HDFS中,也可以将HDFS的数据导入...

疯狂的艺术家
2013/05/10
0
1

没有更多内容

加载失败,请刷新页面

加载更多

下一页

崛起于Springboot2.X之开发拦截器(21)

序言:几乎所有项目都需要拦截器,所以小伙伴们必须要掌握这门技术哦,不然只会mybaits增删改查那是实习生干的活呀。 1、创建拦截器类,implements HandlerInterceptor public class MyInce...

木九天
15分钟前
1
0
(转)SQL语句的执行顺序

(7) SELECT (8) DISTINCT <select_list> (1) FROM <left_table> (3) <join_type> JOIN <right_table> (2) ON <join_condition> (4) WHERE <where_condition> (5) GROUP BY <group_by_list> (......

Avner
24分钟前
0
0
1.14 救援模式

确保开机启动时连接镜像文件,如果是真机服务器,就需要:U盘或光盘镜像启动进入BIOS 不同主板进入bios按键不同,一般是F12或Esc 光标:移动到Boot(开机启动项) 减号移动:光标选中行,按-...

小丑鱼00
31分钟前
0
0
ES11-全文检索

高级别全文检索通常用于在全文本字段(如电子邮件正文)上运行全文检索。 他们了解如何分析被查询的字段,并在执行之前将每个字段的分析器(或search_analyzer)应用于查询字符串。 1.term查...

贾峰uk
35分钟前
0
0
java 复制对象有哪些方式

java 复制对象有哪些方式 Apache的 Common beanutils库 org.apache.commons.beanutils.BeanUtils.copyProperties(dest,origin); Springframework 的BeanUtil 依赖: <dependency> ......

黄威
50分钟前
2
0
jstack的简单使用

公司测试反应, 一个java应用的机器, 即使不做交易, cpu始终是30%多, 于是想到了jstack, 实践步骤记录一下: 1, 找出java应用的进程号 ps -ef|grep 应用名|grep -v grep 2, 找出pid下的cpu占用...

零二一七
57分钟前
1
0
导入CSV文件就行数据整理分析

#-*-coding:utf-8-*-import csv,os,re,mathlocalPath=input("请输入所有群文件的根目录:") #所有QQ群文件的物理根目录路径def info(): info_dic=[] dirList=os.listdi...

Kefy
今天
5
0
CoreText进阶(六)-内容大小计算和自动布局

CoreText进阶(六)-内容大小计算和自动布局 其它文章: CoreText 入门(一)-文本绘制 CoreText入门(二)-绘制图片 CoreText进阶(三)-事件处理 CoreText进阶(四)-文字行数限制和显示更...

aron1992
今天
1
0
一个Unity高人的博客,涉猎范围很广,深度也很深。

https://blog.csdn.net/ecidevilin/article/list/

爽歪歪ES
今天
0
0
Spring Cloud Config-Git后端

EnvironmentRepository的默认实现使用Git后端,这对于管理升级和物理环境以及审核更改非常方便。要更改存储库的位置,可以在Config Server中设置“spring.cloud.config.server.git.uri”配置...

itcloud
今天
1
0

没有更多内容

加载失败,请刷新页面

加载更多

下一页

返回顶部
顶部