Key New Features in Each Hive Release

大数据之路
Published 2014/06/04 02:56

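Code in the open source world, driven by the community and shaped by geek culture, has always changed quickly. This is especially visible in the Hadoop ecosystem, although it also has a lot to do with Hadoop's wide adoption in industry and the many requirements pushing it forward (big data and cloud computing have been hyped to death in recent years, haha). Bug fixes, improvements, and new features fly around for every component in the ecosystem. Having just seen a Hadoop version history chart compiled by a fellow developer, I felt it was worth compiling the evolution of Hive's new features as well, for future reference.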

1. Hive 0.8.0

Adds new features such as Bitmap Indexes, the TIMESTAMP datatype, the Plugin Developer Kit, and JDBC Driver Improvements.

This release is quite old by now, so I won't go into detail.

For details, see: http://blog.cloudera.com/blog/2011/11/coming-attractions-apache-hive-0-8-0/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12316178

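2. Hive 0.9.0

1. Supports CREATE OR REPLACE VIEW
2. Improved error messages
3. Supports NOT IN and NOT LIKE
4. Ctrl+C now issues a kill command that kills the currently running query job without exiting the Hive CLI
5. Prints the number of map tasks and reduce tasks
6. Better performance for "select xx,xx from xxx LIMIT xxx"
7. Supports the BETWEEN operator
8. PRINTF() function
9. More lenient data type handling in COALESCE/UNION ALL
10. Adds the TIMESTAMP data type
11. Adds "INSERT OVERWRITE TABLE X PARTITION (a=b, c=d) IF NOT EXISTS ..."; if the partition already exists, it is left untouched.
12. Faster query compilation and job startup after a Hive job is submitted.
For details, see: What's new in Apache Hive 0.9.0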

https://cwiki.apache.org/confluence/download/attachments/27362054/WhatsNewInHive090HadoopSummit2012BoF.pdf?version=1&modificationDate=1339872131000

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742

3. Hive 0.10.0

Cube and Rollup: Hive now has support for creating cubes with rollups. Thanks to Namit!

List Bucketing: This is an optimization that lets you better handle skew in your tables. Thanks to Gang!

Better Windows Support: Several Hive 0.10.0 fixes support running Hive natively on Windows. There is no more cygwin dependency. Thanks to Kanna!

‘Explain’ Adds More Info: Now you can do an explain dependency and the explain plan will contain all the tables and partitions touched by the query. Thanks to Sambavi!

Improved Authorization: The metastore can now optionally do authorization checks on the server side instead of on the client, providing you with a better security profile. Thanks to Sushanth!

Faster Simple Queries: Some simple queries that don’t require aggregations, and therefore MapReduce jobs, can now run faster. Thanks to Navis!

Better YARN Support: This release contains additional work aimed at making Hive work well with Hadoop YARN. While not all test cases are passing yet, there has been a lot of good progress made with this release. Thanks to Zhenxiao!

Union Optimization: Hive queries with unions will now result in a lower number of MapReduce jobs under certain conditions. Thanks to Namit!

Undo Your Drop Table: While not really truly ‘undo’, you can now reinstate your table after dropping it. Thanks to Andrew!

Show Create Table: This lets you see how you created your table. Thanks to Feng!

Support for Avro Data: Hive now has built-in support for reading/writing Avro data. Thanks to Jakob!

Skewed Joins: Hive’s support for joins involving skewed data is now improved. Thanks to Namit!

Robust Connection Handling at the Metastore Layer: Connection handling between a metastore client and server, and also between a metastore server and the database layer, has been improved. Thanks to Bhushan and Jean!

More Statistics: It's now possible to collect and store scalar-valued statistics for your tables and partitions. This will enable better query planning in upcoming releases. Thanks to Shreepadma!

Better-Looking HWI: HWI now uses a bootstrap javascript library. It looks really slick.

For details, see: http://zh.hortonworks.com/blog/apache-hive-0-10-0-is-now-available/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12320745&styleName=Text&projectId=12310843

https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
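To make the Cube and Rollup item more concrete, here is a minimal Python sketch of which grouping sets each one expands to and how the sums come out. It only models the semantics; it is not how Hive implements them. The function names and the sample column names (region, product, sales) are assumptions made up for illustration.

```python
from itertools import combinations


def rollup_grouping_sets(columns):
    """ROLLUP over (a, b) groups by each prefix: (a, b), (a,), ()."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube_grouping_sets(columns):
    """CUBE over (a, b) groups by every subset: (a, b), (a,), (b,), ()."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


def aggregate(rows, columns, grouping_sets, measure):
    """Sum `measure` per grouping set; columns not in the set become None (NULL)."""
    totals = {}
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in columns)
            totals[(gset, key)] = totals.get((gset, key), 0) + row[measure]
    return totals


# Hypothetical sample data.
rows = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 5},
    {"region": "west", "product": "a", "sales": 7},
]
columns = ["region", "product"]
rollup_totals = aggregate(rows, columns, rollup_grouping_sets(columns), "sales")
cube_totals = aggregate(rows, columns, cube_grouping_sets(columns), "sales")
```

The rollup result contains per-(region, product) sums, per-region subtotals, and a grand total; the cube result additionally contains per-product subtotals.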

4. Hive 0.11.0

  • ORCFile.  It’s Optimized.
    The ORC File (Optimized RC File) presents key new features that speed up access to data in Apache Hive: it adds meta information at the file and block level so that queries can be more intelligent and use that metadata to optimize access.  Further, with the ORC file, only the bytes from the required columns are read from HDFS, which minimizes I/O and speeds up the query chain.  These are major advances for improved performance in Hive.

  • Improved Data Types
    As Apache Hive marches towards full SQL-compatibility, the decimal data type was updated to make it more usable.

  • Analytic Functions
    Hive 0.11 introduces windowing functions for RANK, LEAD/LAG, ROW_NUMBER, FIRST_VALUE, LAST_VALUE and more. It also introduces aggregate OVER functions with PARTITION BY and ORDER BY.

  • Joins improved in Hive 0.11
    Both the broadcast join and the SMB join were improved considerably in Hive 0.11.  Both joins work without user hints, so that the Hive optimizer now picks the correct join rather than depending on the user to do so. More broadcast joins are now packed into a single MapReduce job, making star join queries much more efficient.

  • Implement HiveServer2

  • When writing a Hive table out to a file, users can specify a separator of their own choice

For details, see: http://zh.hortonworks.com/blog/apache-hive-0-11-stinger-phase-1-delivered/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12323587&styleName=Text&projectId=12310843
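As a rough illustration of what the analytic functions listed above compute, the following Python sketch evaluates ROW_NUMBER, RANK, and LAG over rows partitioned by one column and ordered by another. It models the semantics only, not Hive's execution; the function name `window` and the sample columns (dept, salary) are assumptions for this example.

```python
from itertools import groupby
from operator import itemgetter


def window(rows, partition_by, order_by):
    """Attach row_number, rank and lag (previous order_by value) to each row."""
    ordered = sorted(rows, key=lambda r: (r[partition_by], r[order_by]))
    out = []
    for _, group in groupby(ordered, key=itemgetter(partition_by)):
        prev = None  # previous row within the current partition
        rank = 0
        for row_number, row in enumerate(group, start=1):
            # RANK repeats for ties and then skips ahead; ROW_NUMBER never repeats.
            if prev is None or row[order_by] != prev[order_by]:
                rank = row_number
            out.append({
                **row,
                "row_number": row_number,
                "rank": rank,
                # LAG yields NULL (None) on the first row of a partition.
                "lag": prev[order_by] if prev is not None else None,
            })
            prev = row
    return out


# Hypothetical sample data.
rows = [
    {"dept": "a", "salary": 200},
    {"dept": "a", "salary": 100},
    {"dept": "b", "salary": 50},
    {"dept": "a", "salary": 100},
]
result = window(rows, partition_by="dept", order_by="salary")
```

Within partition "a" the two rows with salary 100 share rank 1, and the row with salary 200 gets rank 3, which is the gap that distinguishes RANK from ROW_NUMBER.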

5. Hive 0.12.0

For details, see: http://zh.hortonworks.com/blog/announcing-apache-hive-0-12/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12324312&styleName=Text&projectId=12310843

6. Hive 0.13.0

For details, see: http://zh.hortonworks.com/blog/announcing-apache-hive-0-13-completion-stinger-initiative/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12324986&styleName=Text&projectId=12310843

7. Hive 0.14.0

[HIVE-5317] - Implement insert, update, and delete in Hive with full ACID support

[HIVE-5775] - Introduce Cost Based Optimizer to Hive

[HIVE-5823] - Support for DECIMAL primitive type in AvroSerDe

[HIVE-6455] - Scalable dynamic partitioning and bucketing optimization

[HIVE-6469] - skipTrash option in hive command line

[HIVE-6806] - CREATE TABLE should support STORED AS AVRO

[HIVE-7036] - get_json_object bug when extract list of list with index

[HIVE-7054] - Support ELT UDF in vectorized mode

[HIVE-7068] - Integrate AccumuloStorageHandler

[HIVE-7090] - Support session-level temporary tables in Hive

[HIVE-7158] - Use Tez auto-parallelism in Hive

[HIVE-7203] - Optimize limit 0

[HIVE-7255] - Allow partial partition spec in analyze command

[HIVE-7299] - Enable metadata only optimization on Tez

[HIVE-7341] - Support for Table replication across HCatalog instances

[HIVE-7390] - Make single quote character optional and configurable in BeeLine CSV/TSV output

[HIVE-7416] - provide context information to authorization checkPrivileges api call

[HIVE-7430] - Implement SMB join in tez

[HIVE-7446] - Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

[HIVE-7506] - MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

[HIVE-7509] - Fast stripe level merging for ORC

[HIVE-7547] - Add ipAddress and userName to ExecHook

[HIVE-7587] - Fetch aggregated stats from MetaStore

[HIVE-7654] - A method to extrapolate columnStats for partitions of a table

[HIVE-7826] - Dynamic partition pruning on Tez

[HIVE-8531] - Fold is not null filter if there are other comparison filter present on same column

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12326450&styleName=Text&projectId=12310843

8. Hive 1.0

This release has no new features.

9. Hive 1.1

[HIVE-3405] - UDF initcap to obtain a string with the first letter of each word in uppercase other letters in lowercase

[HIVE-7122] - Storage format for create like table

[HIVE-8435] - Add identity project remover optimization

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&styleName=Text&version=12329363
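The behavior described for the initcap UDF (HIVE-3405) can be sketched in a few lines of Python. This mimics the description in the JIRA title only; it assumes words are delimited by whitespace, and the Python function below is an illustration rather than Hive's code.

```python
import re


def initcap(text):
    """Uppercase the first letter of each word and lowercase the rest.

    Assumption: a "word" is any run of non-whitespace characters.
    """
    return re.sub(
        r"\S+",
        lambda m: m.group(0)[:1].upper() + m.group(0)[1:].lower(),
        text,
    )


example = initcap("hELLO hive wORLD")
```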

10. Hive 1.2

[HIVE-7998] - Enhance JDBC Driver to not require class specification

[HIVE-9039] - Support Union Distinct

[HIVE-9188] - BloomFilter support in ORC

[HIVE-9277] - Hybrid Hybrid Grace Hash Join

[HIVE-9302] - Beeline add commands to register local jdbc driver names and jars

[HIVE-9780] - Add another level of explain for RDBMS audience

[HIVE-10038] - Add Calcite's ProjectMergeRule.

[HIVE-10099] - Enable constant folding for Decimal

[HIVE-10591] - Support limited integer type promotion in ORC

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329345&styleName=Text&projectId=12310843
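For readers unfamiliar with the data structure behind HIVE-9188, here is a minimal Bloom filter sketch in Python. It only illustrates the general idea: a "no" answer is definite, so a reader can skip data that cannot contain the value being searched for, while a "yes" answer may be a false positive. The class, its sizes, and the SHA-256 based hashing are assumptions for illustration; ORC's actual hashing and on-disk layout are different.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (not space efficient).
        self.bits = bytearray(num_bits)

    def _positions(self, value):
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


bloom = BloomFilter()
bloom.add("user_42")
```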

11. Hive 2.0

  • [HIVE-686] - add UDF substring_index

  • [HIVE-3404] - Create quarter UDF

  • [HIVE-7926] - long-lived daemons for query fragment execution, I/O and caching

  • [HIVE-10591] - Support limited integer type promotion in ORC

  • [HIVE-10592] - ORC file dump in JSON format

  • [HIVE-10673] - Dynamically partitioned hash join for Tez

  • [HIVE-10761] - Create codahale-based metrics system for Hive

  • [HIVE-10785] - Support aggregate push down through joins

  • [HIVE-11103] - Add banker's rounding BROUND UDF

  • [HIVE-11461] - Transform flat AND/OR into IN struct clause

  • [HIVE-11488] - Add sessionId and queryId info to HS2 log

  • [HIVE-11593] - Add aes_encrypt and aes_decrypt UDFs

  • [HIVE-11600] - Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())

  • [HIVE-11684] - Implement limit pushdown through outer join in CBO

  • [HIVE-11699] - Support special characters in quoted table names

  • [HIVE-11706] - Implement "show create database"

  • [HIVE-11775] - Implement limit push down through union all in CBO

  • [HIVE-11785] - Support escaping carriage return and new line for LazySimpleSerDe

  • [HIVE-11976] - Extend CBO rules to being able to apply rules only once on a given operator

  • [HIVE-12080] - Support auto type widening (int->bigint & float->double) for Parquet table

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12332641&styleName=Text&projectId=12310843
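Three of the UDFs added in this release (substring_index, quarter, and BROUND) are easy to describe with small Python equivalents. These are sketches of the commonly documented semantics, not Hive's implementation. Returning an empty string for an empty delimiter is an assumption of this sketch.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_EVEN


def substring_index(text, delim, count):
    """Return the part of text before the count-th delim (after it, if count < 0)."""
    if count == 0 or delim == "":
        return ""  # empty-delimiter behavior is an assumption of this sketch
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def quarter(day):
    """Return the quarter of the year (1 to 4) for a date."""
    return (day.month - 1) // 3 + 1


def bround(value, digits=0):
    """Banker's rounding: ties go to the nearest even digit."""
    step = Decimal(1).scaleb(-digits)  # digits=1 gives Decimal('0.1')
    return Decimal(str(value)).quantize(step, rounding=ROUND_HALF_EVEN)


left = substring_index("www.apache.org", ".", 2)
right = substring_index("www.apache.org", ".", -1)
q = quarter(date(2015, 4, 8))
```

Banker's rounding differs from the usual half-up rounding only on exact ties: 2.5 rounds to 2 and 3.5 rounds to 4, which avoids a systematic upward bias when many values are summed.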

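References:

[1] New features in Hive 0.8.0 and 0.9.0  http://superlxw1234.iteye.com/blog/1564461

[2] Overview of new features in Hive 0.10 and 0.11  http://blog.csdn.net/lalaguozhe/article/details/11730817

[3] http://hive.apache.org/downloads.html

[4] Hive's roadmap for the next two years  http://www.infoq.com/cn/news/2014/09/hive

(1) ACID transaction support: users will be able to insert, update, and delete existing data. Hive will evolve from a traditional write-once, read-many system into one that supports analysis of changing data.
(2) Sub-second queries: users will be able to use Hive for scenarios with stricter response-time requirements, such as interactive dashboards and exploratory analysis.
(3) Full support for SQL:2011 Analytics: users will be able to deploy complex reports on Hive using standard SQL, faster, more simply, and more reliably. A powerful cost-based optimizer will keep tool-generated queries and complex queries running fast. At that point Hive will offer on Hadoop the full expressive power that enterprise SQL users enjoy. On top of its existing support for window functions, user-defined functions, subqueries, Rollup, Cube, standard aggregates, inner joins, outer joins, semi joins, and cross joins, it will add support for non-equi joins, set functions (union, intersect, except), interval types, and more.
Stinger.next is planned to take 18 months and will be delivered in three phases. Transaction support is scheduled for release by the end of 2014, sub-second queries for the first half of 2015, and full support for SQL:2011 Analytics for the end of 2015.
In addition, Hive will be integrated with the machine learning framework Spark, so that users can run machine learning models through Hive.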

© Copyright belongs to the author
