1.建表,row format delimited fields terminated by '~',行使用"~"分隔,时间类型也是定义成string类型。
create table upchannel_status(id int, upindex int, acquisitiontime string, channelname string) partitioned by (cmtsid int) row format delimited fields terminated by '~'; |
create EXTERNAL table IF NOT EXISTS stbprimary(stbid string, bbusinessid string, bsubbusinessid string, btime string, bcontent string, ebusinessid string, esubbusinessid string, etime string, econtent string) row format delimited fields terminated by ','; |
EXTERNAL 关键字可以让用户创建一个外部表,在建表的同时指定一个指向实际数据的路径(LOCATION),Hive 创建内部表时,会将数据移动到数据仓库指向的路径;若创建外部表,仅记录数据所在的路径,不对数据的位置做任何改变。在删除表的时候,内部表的元数据和数据会被一起删除,而外部表只删除元数据,不删除数据。
2.加载数据到表
加载本地数据
load data local inpath '/home/hadoop/data/upchannel_status.txt' overwrite into table upchannel_status; |
加载hdfs数据
load data inpath '/data/upchannel_status.txt' overwrite into table upchannel_status; |
overwrite会覆盖表中的数据,如果只是追加数据,则去掉overwrite关键字。
3.导出数据到本地,使用分隔符"~"
insert overwrite local directory '/home/hadoop/data/upchannel_status_order' row format delimited fields terminated by '~' select* from upchannel_status_order; |
4.创建hbase中不存在的表
create external table hbase_table(key int, fam1_vala string, fam1_valb string, fam2_vala string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,fam1:vala,fam1:valb,fam2:valb") tblproperties("hbase.table.name" = "xyz"); |
在hbase中创建表xyz,hbase.table.name定义在hbase的table名称,hbase.columns.mapping定义在hbase的列族。
:key对应key,是hbase中的rowkey,在hive中相当于主键。fam1:vala对应fam1_valb,fam1:valb对应fam1_valb。都是hbase中同一个列簇fam1。
5.关联hbase中存在的表,在hive中做一个映射
create external table hbase_table(key int, fam1_vala string, fam1_valb string, fam2_vala string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,fam1:vala,fam1:valb,fam2:valb") tblproperties("hbase.table.name" = "xyz"); |
加上关键字external
6.往表中插入数据,hive中不能insert单独的行,只能批量插入,hbase_table必须已经存在。overwrite是覆盖表中的数据,into是追加表中的数据
insert overwrite table hbase_table select * from pokes; insert into table hbase_table select * from pokes; insert overwrite table upchannel_status_partition partition(cmtsid) select key,upindex,actime,regcms,totalregcms,cmtsid from upchannel_status; #往分区表插入数据 |
7.创建表并且往表中插入数据。表upchannel_status_one不存在。注意:这样创建的表在字段之间分隔没有符号,如果最后要用sqoop导出到mysql,需要指定字段之间分隔的符号,所以最好先建好表,指定好分隔符,然后再往里面overwrite数据。
create table upchannel_status_one as select * from ( select *,row_number() over (partition by cmtsid,upindex,substr(actime,1,13) order by actime desc) as rownum from upchannel_status) t where t.rownum <=1; |