Key New Features in Each Hive Release

2014/06/04 02:56

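Code in the open-source world changes quickly, driven by its communities and by geek culture. This is especially visible in the Hadoop ecosystem, although that also has a lot to do with Hadoop's wide adoption in industry and the many requirements pushing it forward (big data and cloud computing have been hyped to death in recent years, haha). Bugs, improvements, and new features are flying around for every component in the ecosystem. After seeing a Hadoop version-history chart that someone had put together, I felt it was worth compiling a similar history of Hive's new features for future reference.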

1、Hive 0.8.0

New features added include Bitmap Indexes, the TIMESTAMP datatype, the Plugin Developer Kit, and JDBC Driver Improvements.

This release is quite old, so I won't go into detail.

For details, see: http://blog.cloudera.com/blog/2011/11/coming-attractions-apache-hive-0-8-0/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12316178

2、Hive 0.9.0

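1. Support for CREATE OR REPLACE VIEW
2. Additional error messages
3. Support for NOT IN and NOT LIKE
4. Ctrl+C now issues a kill command that kills the currently running query job without exiting the Hive CLI
5. The number of map and reduce tasks is printed
6. Better performance for "select xx,xx from xxx LIMIT xxx" queries
7. Support for the BETWEEN operator
8. PRINTF() function
9. More lenient data type handling in COALESCE/UNION ALL operations
10. Added the TIMESTAMP data type
11. Added "INSERT OVERWRITE TABLE X PARTITION (a=b, c=d) IF NOT EXISTS ..."; if the partition already exists, it is left untouched.
12. Faster compilation and startup of Hive jobs after submission.
For details, see: What's New in Apache Hive 0.9.0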

https://cwiki.apache.org/confluence/download/attachments/27362054/WhatsNewInHive090HadoopSummit2012BoF.pdf?version=1&modificationDate=1339872131000

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742

3、Hive 0.10.0

Cube and Rollup: Hive now has support for creating cubes with rollups. Thanks to Namit!

List Bucketing: This is an optimization that lets you better handle skew in your tables. Thanks to Gang!

Better Windows Support: Several Hive 0.10.0 fixes support running Hive natively on Windows. There is no more cygwin dependency. Thanks to Kanna!

‘Explain’ Adds More Info: Now you can do an explain dependency and the explain plan will contain all the tables and partitions touched upon by the query. Thanks to Sambavi!

Improved Authorization: The metastore can now optionally do authorization checks on the server side instead of on the client, providing you with a better security profile. Thanks to Sushanth!

Faster Simple Queries: Some simple queries that don’t require aggregations, and therefore MapReduce jobs, can now run faster. Thanks to Navis!

Better YARN Support: This release contains additional work aimed at making Hive work well with Hadoop YARN. While not all test cases are passing yet, there has been a lot of good progress made with this release. Thanks to Zhenxiao!

Union Optimization: Hive queries with unions will now result in a lower number of MapReduce jobs under certain conditions. Thanks to Namit!

Undo Your Drop Table: While not truly an ‘undo’, you can now reinstate your table after dropping it. Thanks to Andrew!

Show Create Table: This lets you see how you created your table. Thanks to Feng!

Support for Avro Data: Hive now has built-in support for reading/writing Avro data. Thanks to Jakob!

Skewed Joins: Hive’s support for joins involving skewed data is now improved. Thanks to Namit!

Robust Connection Handling at the Metastore Layer: Connection handling between a metastore client and server, and also between a metastore server and the database layer, has been improved. Thanks to Bhushan and Jean!

More Statistics: It's now possible to collect and store scalar-valued statistics for your tables and partitions. This will enable better query planning in upcoming releases. Thanks to Shreepadma!

Better-Looking HWI: HWI now uses a bootstrap javascript library. It looks really slick.

For details, see: http://zh.hortonworks.com/blog/apache-hive-0-10-0-is-now-available/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12320745&styleName=Text&projectId=12310843

https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
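
To make the Cube and Rollup item above a bit more concrete, here is a minimal Python sketch of how the two expand into grouping sets and what the aggregated result looks like. It only illustrates the general SQL semantics; it is not Hive's implementation, and the function names and the sample data are made up for this example.

```python
from itertools import combinations

def rollup_grouping_sets(columns):
    """ROLLUP(a, b) expands to (a, b), (a), ()."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]

def cube_grouping_sets(columns):
    """CUBE(a, b) expands to every subset of the columns, largest first."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets

def aggregate(rows, columns, grouping_sets, measure):
    """Sum `measure` once per grouping set; columns that are rolled up show as None."""
    totals = {}
    for grouping_set in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in grouping_set else None for c in columns)
            slot = (grouping_set, key)
            totals[slot] = totals.get(slot, 0) + row[measure]
    return totals

# Hypothetical sample data, only for illustration.
sales = [
    {"region": "east", "year": 2013, "amount": 10},
    {"region": "east", "year": 2014, "amount": 5},
    {"region": "west", "year": 2013, "amount": 7},
]
dims = ["region", "year"]
rollup_totals = aggregate(sales, dims, rollup_grouping_sets(dims), "amount")
```

The rollup result contains the per-(region, year) rows, one subtotal per region, and a single grand total, which is the same shape a GROUP BY ... WITH ROLLUP query produces.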

4、Hive 0.11.0

  • ORCFile.  It’s Optimized.
    The ORC File (Optimized RC File) presents key new features that speed up data access in Apache Hive: it adds meta information at the file and block data level so that queries can be more intelligent and use metadata to optimize access.  Further, with the ORC file, only the bytes from the required columns are read from HDFS, which minimizes I/O and speeds the query chain.  These are major advances for improved performance in Hive.

  • Improved Data Types
    As Apache Hive marches towards full SQL-compatibility, the decimal data type was updated to be more usable.

  • Analytic Functions
    Hive 0.11 introduces windowing functions for RANK, LEAD/LAG, ROW_NUMBER, FIRST_VALUE, LAST_VALUE and more. It also introduces aggregate OVER functions with PARTITION BY and ORDER BY.

  • Joins improved in Hive 0.11
    Both the broadcast join and the SMB join were improved considerably in Hive 0.11.  Both joins work without user hints, so that the Hive optimizer now picks the correct join rather than depending on the user to do so. More broadcast joins are now packed into a single MapReduce job, making star join queries much more efficient.

  • Implement HiveServer2

  • When writing a Hive table out to a file, users can specify a separator of their own choice
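
For readers who have not used the analytic functions listed above, the following Python sketch mimics what ROW_NUMBER, RANK and LAG compute over a PARTITION BY / ORDER BY window. It is only an illustration of the semantics on in-memory rows, not how Hive executes them; the `window` function and the sample columns are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def window(rows, partition_by, order_by):
    """Annotate each row with row_number, rank and lag over
    (PARTITION BY partition_by ORDER BY order_by)."""
    out = []
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    for _, partition in groupby(ordered, key=itemgetter(partition_by)):
        previous = None
        rank = 0
        for n, row in enumerate(partition, start=1):
            # RANK repeats for ties and then jumps; ROW_NUMBER always increments.
            if n == 1 or row[order_by] != previous:
                rank = n
            # LAG is the previous row's value, None (SQL NULL) on the first row.
            out.append({**row, "row_number": n, "rank": rank, "lag": previous})
            previous = row[order_by]
    return out

# Hypothetical sample data, only for illustration.
employees = [
    {"dept": "a", "salary": 10},
    {"dept": "a", "salary": 20},
    {"dept": "a", "salary": 10},
    {"dept": "b", "salary": 5},
]
annotated = window(employees, "dept", "salary")
```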

For details, see: http://zh.hortonworks.com/blog/apache-hive-0-11-stinger-phase-1-delivered/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12323587&styleName=Text&projectId=12310843
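
The two ORC ideas described above (per-block metadata that lets a query skip data, and reading only the required columns) can be sketched with a toy in-memory model. This is a rough illustration under my own assumptions: the metadata is modelled as min/max values per stripe, and `build_stripes` and `scan` are hypothetical helpers, not part of any ORC or Hive API.

```python
def build_stripes(rows, stripe_size):
    """Split rows into stripes, store each stripe column by column,
    and record min/max statistics per column."""
    stripes = []
    for start in range(0, len(rows), stripe_size):
        chunk = rows[start:start + stripe_size]
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        stripes.append({"columns": columns, "stats": stats})
    return stripes

def scan(stripes, project, filter_col, lower, upper):
    """Return the projected columns for rows where lower <= filter_col <= upper,
    plus the number of stripes that actually had to be read."""
    result = []
    stripes_read = 0
    for stripe in stripes:
        lo, hi = stripe["stats"][filter_col]
        if hi < lower or lo > upper:
            continue  # the statistics prove no row can match, so skip the stripe
        stripes_read += 1
        values = stripe["columns"][filter_col]
        matching = [i for i, v in enumerate(values) if lower <= v <= upper]
        # Only the projected columns are touched, never the whole row.
        result.extend(tuple(stripe["columns"][c][i] for c in project) for i in matching)
    return result, stripes_read

# Hypothetical sample data, only for illustration.
table = [{"id": i, "name": f"n{i}", "amount": i * 10} for i in range(6)]
stripes = build_stripes(table, stripe_size=2)
names, stripes_read = scan(stripes, ["name"], "id", 2, 3)
```

With three stripes of two rows each, a filter on id between 2 and 3 only needs to open one stripe, and only the id and name columns of that stripe.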

5、Hive 0.12.0


For details, see: http://zh.hortonworks.com/blog/announcing-apache-hive-0-12/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12324312&styleName=Text&projectId=12310843

6、Hive 0.13.0


For details, see: http://zh.hortonworks.com/blog/announcing-apache-hive-0-13-completion-stinger-initiative/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12324986&styleName=Text&projectId=12310843

7、Hive 0.14.0

[HIVE-5317] - Implement insert, update, and delete in Hive with full ACID support

[HIVE-5775] - Introduce Cost Based Optimizer to Hive

[HIVE-5823] - Support for DECIMAL primitive type in AvroSerDe

[HIVE-6455] - Scalable dynamic partitioning and bucketing optimization

[HIVE-6469] - skipTrash option in hive command line

[HIVE-6806] - CREATE TABLE should support STORED AS AVRO

[HIVE-7036] - get_json_object bug when extract list of list with index

[HIVE-7054] - Support ELT UDF in vectorized mode

[HIVE-7068] - Integrate AccumuloStorageHandler

[HIVE-7090] - Support session-level temporary tables in Hive

[HIVE-7158] - Use Tez auto-parallelism in Hive

[HIVE-7203] - Optimize limit 0

[HIVE-7255] - Allow partial partition spec in analyze command

[HIVE-7299] - Enable metadata only optimization on Tez

[HIVE-7341] - Support for Table replication across HCatalog instances

[HIVE-7390] - Make single quote character optional and configurable in BeeLine CSV/TSV output

[HIVE-7416] - provide context information to authorization checkPrivileges api call

[HIVE-7430] - Implement SMB join in tez

[HIVE-7446] - Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

[HIVE-7506] - MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

[HIVE-7509] - Fast stripe level merging for ORC

[HIVE-7547] - Add ipAddress and userName to ExecHook

[HIVE-7587] - Fetch aggregated stats from MetaStore

[HIVE-7654] - A method to extrapolate columnStats for partitions of a table

[HIVE-7826] - Dynamic partition pruning on Tez

[HIVE-8531] - Fold is not null filter if there are other comparison filter present on same column

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12326450&styleName=Text&projectId=12310843

8、Hive 1.0

This release has no new features.

9、Hive 1.1

[HIVE-3405] - UDF initcap to obtain a string with the first letter of each word in uppercase other letters in lowercase

[HIVE-7122] - Storage format for create like table

[HIVE-8435] - Add identity project remover optimization

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&styleName=Text&version=12329363
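
The behaviour described in the HIVE-3405 title (first letter of each word in uppercase, the other letters in lowercase) can be sketched in a few lines of Python. This assumes words are separated by single spaces; how the real UDF treats other whitespace or punctuation is not covered here.

```python
def initcap(text):
    """Uppercase the first letter of each space-separated word, lowercase the rest."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))

example = initcap("hELLO hive wORLD")
```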

10、Hive 1.2

[HIVE-7998] - Enhance JDBC Driver to not require class specification

[HIVE-9039] - Support Union Distinct

[HIVE-9188] - BloomFilter support in ORC

[HIVE-9277] - Hybrid Hybrid Grace Hash Join

[HIVE-9302] - Beeline add commands to register local jdbc driver names and jars

[HIVE-9780] - Add another level of explain for RDBMS audience

[HIVE-10038] - Add Calcite's ProjectMergeRule.

[HIVE-10099] - Enable constant folding for Decimal

[HIVE-10591] - Support limited integer type promotion in ORC

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329345&styleName=Text&projectId=12310843
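
As background for the BloomFilter item above (HIVE-9188): a Bloom filter is a compact structure that can answer "definitely not present" or "possibly present" for a value, which lets a reader skip data that cannot match an equality predicate. Below is a minimal generic sketch in Python; the sizes, the hash construction and the class itself are my own illustrative choices and say nothing about how ORC lays out or hashes its filters.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means the value was never added; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(value))

bloom = BloomFilter()
for user_id in ["u1", "u2", "u3"]:
    bloom.add(user_id)
```

A reader holding such a filter for a block of data can skip the block whenever `might_contain` returns False for the value being searched.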

11、Hive 2.0

  • [HIVE-686] - add UDF substring_index

  • [HIVE-3404] - Create quarter UDF

  • [HIVE-7926] - long-lived daemons for query fragment execution, I/O and caching

  • [HIVE-10591] - Support limited integer type promotion in ORC

  • [HIVE-10592] - ORC file dump in JSON format

  • [HIVE-10673] - Dynamically partitioned hash join for Tez

  • [HIVE-10761] - Create codahale-based metrics system for Hive

  • [HIVE-10785] - Support aggregate push down through joins

  • [HIVE-11103] - Add banker's rounding BROUND UDF

  • [HIVE-11461] - Transform flat AND/OR into IN struct clause

  • [HIVE-11488] - Add sessionId and queryId info to HS2 log

  • [HIVE-11593] - Add aes_encrypt and aes_decrypt UDFs

  • [HIVE-11600] - Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())

  • [HIVE-11684] - Implement limit pushdown through outer join in CBO

  • [HIVE-11699] - Support special characters in quoted table names

  • [HIVE-11706] - Implement "show create database"

  • [HIVE-11775] - Implement limit push down through union all in CBO

  • [HIVE-11785] - Support escaping carriage return and new line for LazySimpleSerDe

  • [HIVE-11976] - Extend CBO rules to being able to apply rules only once on a given operator

  • [HIVE-12080] - Support auto type widening (int->bigint & float->double) for Parquet table

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12332641&styleName=Text&projectId=12310843
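
Two of the UDFs in the list above, substring_index (HIVE-686) and BROUND (HIVE-11103), are easy to misread from their names, so here is a small Python sketch of the semantics I understand them to have: substring_index follows the MySQL-style definition (everything before the Nth delimiter, counting from the right when N is negative), and BROUND is round-half-to-even. The Python functions are illustrative stand-ins, and returning an empty string for an empty delimiter is an assumption of this sketch rather than documented Hive behaviour.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def substring_index(text, delim, count):
    """Return the part of text before `count` occurrences of delim
    (counted from the left if count > 0, from the right if count < 0)."""
    if count == 0 or delim == "":
        return ""  # empty-delimiter handling is an assumption of this sketch
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])

def bround(value, digits=0):
    """Banker's rounding: ties go to the nearest even digit."""
    quantum = Decimal(1).scaleb(-digits)  # digits=1 gives a 0.1 quantum
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)

left = substring_index("www.apache.org", ".", 2)
right = substring_index("www.apache.org", ".", -2)
```

Compared with ordinary half-up rounding, banker's rounding sends 2.5 to 2 and 3.5 to 4, so rounding errors do not accumulate in one direction when many values are summed.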

References:

[1] New features in Hive 0.8.0 and 0.9.0  http://superlxw1234.iteye.com/blog/1564461

[2] Overview of new features in Hive 0.10 and 0.11  http://blog.csdn.net/lalaguozhe/article/details/11730817

[3] http://hive.apache.org/downloads.html

[4] Hive's roadmap for the next two years  http://www.infoq.com/cn/news/2014/09/hive

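(1) ACID transaction support: users will be able to insert, update, and delete existing data. Hive will evolve from a traditional write-once, read-often system into one that supports analysis of changing data.
(2) Sub-second queries: users will be able to use Hive for scenarios with stricter response-time requirements, such as interactive dashboards and exploratory analysis.
(3) Full support for SQL:2011 Analytics: users will be able to deploy complex reports on Hive with standard SQL, more quickly, more simply, and more reliably. A powerful cost-based optimizer will keep tool-generated queries and complex queries running fast. By then, Hive will offer on Hadoop the full expressive power that enterprise SQL users enjoy. On top of existing support for window functions, user-defined functions, subqueries, Rollup, Cube, standard aggregation, and inner, outer, semi, and cross joins, it will add support for non-equi joins, set functions (union, intersect, except), interval types, and more.
Stinger.next is planned to take 18 months and will be delivered in three phases. Transaction support is to be released by the end of 2014, sub-second queries in the first half of 2015, and full SQL:2011 Analytics support by the end of 2015.
In addition, Hive will integrate with the machine learning framework Spark, so that users can run machine learning models through Hive.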
