文档章节

6-Hadoop之旅-Hive(二)

crayzer_yixiu
 crayzer_yixiu
发布于 2016/09/16 10:23
字数 929
阅读 32
收藏 1

Hive的数据类型

–: primitive_type
–| array_type
–| map_type
–| struct_type
–:primitive_type
–|TINYINT
–| SMALLINT
–| INT
–| BIGINT
–| BOOLEAN
–| FLOAT
–| DOUBLE
–| STRING

Hive完整的DDL

–CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
–[(col_namedata_type[COMMENT col_comment], ...)]
–[COMMENT table_comment]
–[PARTITIONED BY (col_namedata_type[COMMENT col_comment], ...)]
–[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name[ASC|DESC], ...)] INTO num_bucketsBUCKETS]
–[SKEWED BY (col_name, col_name, ...) ON ([(col_value, col_value, ...), ...|col_value, col_value, ...]) (Note: only available starting with 0.10.0)]
–[ [ROW FORMAT row_format] [STORED AS file_format]
–| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] (Note: only available starting with 0.6.0) ]
–[LOCATION hdfs_path]
–[TBLPROPERTIES (property_name=property_value, ...)] (Note: only available starting with 0.6.0)
–[AS select_statement] (Note: this feature is only available starting with 0.5.0, and is not supported when creating external tables.)

###Hive常用的DDL
–CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
–LIKE existing_table_or_view_name
–[LOCATION hdfs_path]

###Hive建表
–create table person(
–id int,
–name string,
–age int,
–likes array<string>,
–address map<string,string>
–)
–row format delimited
–FIELDS TERMINATED BY ','
–COLLECTION ITEMS TERMINATED BY '-'
–MAP KEYS TERMINATED BY ':'
–lines terminated by '\n';
–Select address[‘city’] from person where name=‘zs’

###创建分区表
-partition分区放在ROW FORMAT 之前
-分区地段不能出现之前定义的字段之内
–Load分区数据

###Hive常用的DDL删除表
–DROP TABLE table_name

Hive导入数据

Load data [local] inpath 'student.txt' overwrite into table student.
- 添加local 说明在数据文件在搭建集群的文件系统上
- 没有local 说明在集群的hdfs上

Hive的内表和外表
External关键字
内表删除表或者分区元数据和数据都删了
外表删除表元数据删除,数据保留

HIve的分区partition

  • 必须在表定义时创建partition
    • 单分区建表语句:create table day_table(id int, content string) partitioned by (dt string)
      • 单分区表,按天分区,在表结构中存在id,content,dt三列。
      • 以dt为文件夹区分
    • 双分区建表语句:create table day_hour_table(id int, content string) partitioned by (dt string, hour string)
      • 双分区表,按天和小时分区,在表结构中新增加了dt和hour两列。
      • 先以dt为文件夹,再以hour子文件夹区分
  • Hive添加分区表语法
    • (表已创建,在此基础上添加分区):ALTER TABLE table_name ADD partition_spec[ LOCATION 'location1' ]partition_spec[ LOCATION 'location2' ] ...
    • ALTER TABLE day_table ADD PARTITION (dt='2008-08-08', hour='08') location '/path/pv1.txt'
  • Hive删除分区语法
    • ALTER TABLE table_name  DROP  partition_spec, partition_spec,...
    • 用户可以用ALTER TABLE DROP PARTITION 来删除分区。分区的元数据和数据将被一并删除。例:
      • ALTER TABLE day_hour_table  DROP  PARTITION (dt='2008-08-08', hour='09');
  • Hive数据加载进分区表中语法
    • LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
    • 例如:LOAD DATA INPATH '/user/pv.txt' INTO TABLE day_hour_table PARTITION(dt='2008-08-08', hour='08'); LOAD DATA local INPATH '/user/hua/*' INTO TABLE day_hourpartition(dt='2010-07-07');当数据被加载至表中时,不会对数据进行任何转换。Load操作只是将数据复制至Hive表对应的位置。数据加载时在表下自动创建一个目录基于分区的查询的语句:SELECT day_table.* FROM day_tableWHERE day_table.dt>= '2008-08-08';
  • Hive查看分区语句
    • hive> show partitions day_hour_table; OK dt=2008-08-08/hour=08 dt=2008-08-08/hour=09 dt=2008-08-09/hour=09

HQL

create database test;
show databases;
use databases;

###--psn
#1,xiaoming1,man,book-shejian-shoot,beijing:wudaokou-huoxing:weizhi-shanghai:songjiang
#2,xiaoming2,man,shejian-shoot,beijing:wudaokou-huoxing:weizhi
#3,xiaoming3,man,book,beijing:wudaokou-huoxing:weizhi
#4,xiaoming4,man,book-shoot,beijing:wudaokou-huoxing:weizhi
###

#创建表psn
create table psn(id int, name string, sex string, likes ARRAY<string>, address map<string,string>)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
COLLECTION ITEMS TERMINATED BY '-' 
MAP KEYS TERMINATED BY ':';
#加载数据
load data local inpath '/tmp/psn' overwrite into table psn;
#创建外表
create EXTERNAL table psn2 
(id int, name string, sex string, likes ARRAY<string>, address map<string,string>)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
COLLECTION ITEMS TERMINATED BY '-' 
MAP KEYS TERMINATED BY ':'
location '/test/psn2';

create table psn3 as select * from psn;

create table psn4 like psn;

create EXTERNAL table psn5 
(id int, name string, sex string, likes ARRAY<string>, address map<string,string>)
PARTITIONED BY (age int, weight int) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
COLLECTION ITEMS TERMINATED BY '-' 
MAP KEYS TERMINATED BY ':' ;

create table psn7 
(id int, name string, sex string, likes ARRAY<string>, address map<string,string>)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
COLLECTION ITEMS TERMINATED BY '-' 
MAP KEYS TERMINATED BY ':';

from psn6 p 
insert into table psn7
  select p.id, p.name, p.sex, p.likes, p.address;


from psn6 p 
insert OVERWRITE table psn7
  select p.id, p.name, p.sex, p.likes, p.address;

 

© 著作权归作者所有

crayzer_yixiu
粉丝 26
博文 57
码字总数 87921
作品 0
杭州
高级程序员
私信 提问
使用sqoop从mysql导入数据到hive

stop-yarn.shstart-dfs.shstart-yarn.sh

wscrf
2018/05/28
0
0
搭建Hadoop2.7.3+Hive2.1.1及MySQL(配置Hadoop)(一)

一、准备工作: 操作系统:Linux(CentOS 7.0) 下载 Java(jdk-8u111-linux-x64.rpm) Hive2.1.1(apache-hive-2.1.1-bin.tar.gz jdk-8u111-linux-x64.rpm) Hadoop2.7.3(hadoop-2.7.3.tar.gz) 下......

roy_88
2017/02/09
0
0
Hive Remote模式搭建

一、实验环境 1.软件版本:apache-hive-2.3.0-bin.tar.gz、mysql-community-server-5.7.19 2.mysql JDBC驱动包:mysql-connector-java-5.1.44.tar.gz 3.mysql已经安装在hadoop5上 4..主机规划......

一语成谶灬
2017/09/21
0
0
Hive系列(二)Hive环境安装

由于之前的Hadoop集群是用Docker搭建的,所以后面整个大数据环境,本人都会采用Docker构建,Hive环境安装需在之前的Hadoop基础上构建,参考文章如下: Hadoop系列(一)Docker部署Hadoop集群 ...

u012834750
2018/05/29
0
0
5-Hadoop之旅-Hive(一)

Hive能做什么? Hive是基于Hadoop的一个数据仓库工具,可以将结构化的数据文件映射为一张数据库表,并提供完整的sql查询功能,可以将sql语句转换为MapReduce任务进行运行。其优点是学习成本低...

SET
2016/09/12
93
0

没有更多内容

加载失败,请刷新页面

加载更多

Mybatis Plus删除

/** @author beth @data 2019-10-17 00:30 */ @RunWith(SpringRunner.class) @SpringBootTest public class DeleteTest { @Autowired private UserInfoMapper userInfoMapper; /** 根据id删除......

一个yuanbeth
今天
4
0
总结

一、设计模式 简单工厂:一个简单而且比较杂的工厂,可以创建任何对象给你 复杂工厂:先创建一种基础类型的工厂接口,然后各自集成实现这个接口,但是每个工厂都是这个基础类的扩展分类,spr...

BobwithB
今天
4
0
java内存模型

前言 Java作为一种面向对象的,跨平台语言,其对象、内存等一直是比较难的知识点。而且很多概念的名称看起来又那么相似,很多人会傻傻分不清楚。比如本文我们要讨论的JVM内存结构、Java内存模...

ls_cherish
今天
4
0
友元函数强制转换

友元函数强制转换 p522

天王盖地虎626
昨天
5
0
js中实现页面跳转(返回前一页、后一页)

本文转载于:专业的前端网站➸js中实现页面跳转(返回前一页、后一页) 一:JS 重载页面,本地刷新,返回上一页 复制代码代码如下: <a href="javascript:history.go(-1)">返回上一页</a> <a h...

前端老手
昨天
5
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部