Key New Features in Each Hive Release

2014/06/04 02:56

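In the open-source world, code is driven by the community and shaped by geek culture, so it has always changed quickly. This is especially true in the Hadoop ecosystem, which is closely tied to Hadoop's wide adoption across the industry and the many requirements pushing it forward (big data and cloud computing have been hyped to death over the past few years, haha~). Bugs, improvements, and new features are flying around in every component of the ecosystem. After seeing a Hadoop version-history diagram that someone put together, I felt it was worth compiling a history of Hive's new features as well, for future reference.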

1. Hive 0.8.0

Adds new features including Bitmap Indexes, the TIMESTAMP datatype, the Plugin Developer Kit, and JDBC Driver Improvements.
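To illustrate the general idea behind a bitmap index, here is a minimal Python sketch; it is not Hive's implementation, and the column names and values are hypothetical. Each distinct column value maps to a bitmap of row positions, and a predicate is answered with bitwise AND/OR instead of scanning the rows.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one bitmap (a Python int) per distinct column value.

    Bit i is set in the bitmap for value v when row i holds v.
    """

    def __init__(self, column):
        self.num_rows = len(column)
        self.bitmaps = defaultdict(int)
        for row, value in enumerate(column):
            self.bitmaps[value] |= 1 << row

    def eq(self, value):
        # Rows where column == value; an unseen value maps to an empty bitmap.
        return self.bitmaps.get(value, 0)

    def isin(self, values):
        # IN (...) is the OR of the bitmaps of all requested values.
        result = 0
        for v in values:
            result |= self.eq(v)
        return result

    def rows(self, bitmap):
        # Decode a bitmap back into a sorted list of row positions.
        return [i for i in range(self.num_rows) if bitmap >> i & 1]


country = BitmapIndex(["CN", "US", "CN", "DE", "US"])
status = BitmapIndex(["ok", "ok", "fail", "ok", "fail"])

# Roughly: WHERE country IN ('CN', 'US') AND status = 'ok'
hits = country.isin(["CN", "US"]) & status.eq("ok")
print(country.rows(hits))  # [0, 1]
```

Bitmaps pay off for low-cardinality columns, where the number of distinct values (and therefore bitmaps) stays small.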



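2. Hive 0.9.0

1. Better error messages.
2. Pressing Ctrl+C now sends a kill command that kills the currently running query job without exiting the Hive CLI.
3. The number of mappers and reducers is printed.
4. Faster "select xx,xx from xxx LIMIT xxx" queries.
5. Support for the BETWEEN operator.
6. The PRINTF() function.
7. Relaxed data type rules for COALESCE and UNION ALL.
8. The TIMESTAMP data type.
9. "INSERT OVERWRITE TABLE X PARTITION (a=b, c=d) IF NOT EXISTS ...", which leaves the partition untouched if it already exists.
10. Faster compilation and startup of Hive jobs after a query is submitted.
For details, see: What's New in Apache Hive 0.9.0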

3. Hive 0.10.0

Cube and Rollup: Hive now supports cube and rollup aggregations (GROUP BY ... WITH CUBE / WITH ROLLUP). Thanks to Namit!

List Bucketing: This is an optimization that lets you better handle skew in your tables. Thanks to Gang!

Better Windows Support: Several fixes in Hive 0.10.0 let Hive run natively on Windows, with no Cygwin dependency. Thanks to Kanna!

'Explain' Adds More Info: You can now run EXPLAIN DEPENDENCY, whose output lists all the tables and partitions the query touches. Thanks to Sambavi!

Improved Authorization: The metastore can now optionally do authorization checks on the server side instead of on the client, providing you with a better security profile. Thanks to Sushanth!

Faster Simple Queries: Some simple queries that don't require aggregations, and therefore don't need MapReduce jobs, can now run faster. Thanks to Navis!

Better YARN Support: This release contains additional work aimed at making Hive work well with Hadoop YARN. While not all test cases are passing yet, there has been a lot of good progress made with this release. Thanks to Zhenxiao!

Union Optimization: Hive queries with unions will now result in a lower number of MapReduce jobs under certain conditions. Thanks to Namit!

Undo Your Drop Table: While not truly an 'undo', this lets you reinstate your table after dropping it. Thanks to Andrew!

Show Create Table: This lets you see the statement used to create your table. Thanks to Feng!

Support for Avro Data: Hive now has built-in support for reading/writing Avro data. Thanks to Jakob!

Skewed Joins: Hive’s support for joins involving skewed data is now improved. Thanks to Namit!

Robust Connection Handling at the Metastore Layer: Connection handling between a metastore client and server, and between a metastore server and the database layer, has been improved. Thanks to Bhushan and Jean!

More Statistics: It's now possible to collect and store scalar-valued statistics for your tables and partitions. This will enable better query planning in upcoming releases. Thanks to Shreepadma!

Better-Looking HWI: The Hive Web Interface (HWI) now uses the Bootstrap JavaScript library. It looks really slick.
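As a rough sketch of what cube and rollup compute (plain Python, not Hive's execution plan; the table and column names are hypothetical), ROLLUP over (a, b) aggregates over the prefixes (a, b), (a), (), while CUBE aggregates over every subset of the grouping columns. Columns rolled up in a given grouping set are reported as NULL, modelled here as `None`.

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(cols):
    # ROLLUP(a, b, c) -> (a, b, c), (a, b), (a,), ()
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    # CUBE(a, b) -> every subset of the grouping columns, 2**n sets in total
    return [combo for r in range(len(cols), -1, -1)
            for combo in combinations(cols, r)]


def group_by_sets(rows, cols, measure, grouping_sets):
    """SUM(measure) per grouping set; columns outside the set become None."""
    totals = defaultdict(int)
    for row in rows:
        for gs in grouping_sets:
            key = tuple(row[c] if c in gs else None for c in cols)
            totals[key] += row[measure]
    return dict(totals)


sales = [
    {"year": 2013, "region": "east", "amount": 10},
    {"year": 2013, "region": "west", "amount": 5},
    {"year": 2014, "region": "east", "amount": 7},
]

# Roughly: SELECT year, region, SUM(amount) FROM sales GROUP BY year, region WITH ROLLUP
result = group_by_sets(sales, ["year", "region"], "amount",
                       rollup_sets(["year", "region"]))
print(result[(2013, None)])  # 15, subtotal for 2013
print(result[(None, None)])  # 22, grand total
```

One limitation of this sketch: it cannot tell a rolled-up `None` apart from a genuine NULL in the data.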


4. Hive 0.11.0

  • ORCFile: It's Optimized
    The ORC (Optimized RC) file format adds key new features that speed up data access in Apache Hive. It stores metadata at the file and block level, so queries can use that metadata to optimize access. In addition, only the bytes of the required columns are read from HDFS, which minimizes I/O and speeds up the whole query. These are major advances for Hive performance.

  • Improved Data Types
    As Apache Hive moves toward full SQL compatibility, the DECIMAL data type was updated to make it more usable.

  • Analytic Functions
    Hive 0.11 introduces the windowing functions RANK, LEAD/LAG, ROW_NUMBER, FIRST_VALUE, LAST_VALUE, and more. It also introduces aggregate functions used with OVER clauses containing PARTITION BY and ORDER BY.

  • Joins improved in Hive 0.11
    Both the broadcast join and the sort-merge-bucket (SMB) join were improved considerably in Hive 0.11. Both joins now work without user hints: the Hive optimizer picks the right join instead of relying on the user to do so. More broadcast joins are now packed into a single MapReduce job, which makes star-join queries much more efficient.

  • Implement HiveServer2

  • Users can choose their own separator when writing a Hive table out to a file
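Hive evaluates windowing functions in its own operators; the Python sketch below only mirrors the semantics of `ROW_NUMBER()`, `RANK()` and `LAG()` over `(PARTITION BY ... ORDER BY ...)` for a single ascending sort key, to show how ties and partition boundaries behave. The function name `window` and the sample columns are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def window(rows, partition_by, order_by):
    """Add ROW_NUMBER(), RANK() and LAG(order_by) columns computed
    OVER (PARTITION BY partition_by ORDER BY order_by), ascending only."""
    out = []
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    # groupby works here because rows are already sorted by the partition key.
    for _, part in groupby(ordered, key=itemgetter(partition_by)):
        prev_value = None
        rank = 0
        for i, row in enumerate(part, start=1):
            value = row[order_by]
            # RANK: tied rows share a rank; the next distinct value jumps to the row number.
            if i == 1 or value != prev_value:
                rank = i
            out.append({**row,
                        "row_number": i,
                        "rank": rank,
                        # LAG: previous row's value in this partition, None (NULL) for the first row.
                        "lag": prev_value})
            prev_value = value
    return out


emp = [
    {"dept": "a", "name": "x", "salary": 100},
    {"dept": "a", "name": "y", "salary": 200},
    {"dept": "a", "name": "z", "salary": 200},
    {"dept": "b", "name": "w", "salary": 50},
]
for r in window(emp, "dept", "salary"):
    print(r["dept"], r["name"], r["row_number"], r["rank"], r["lag"])
```

In department `a`, the two 200 salaries share rank 2 but get row numbers 2 and 3, and `LAG` restarts at `None` at the start of department `b`.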


5. Hive 0.12.0



6. Hive 0.13.0



7. Hive 0.14.0

[HIVE-5317] - Implement insert, update, and delete in Hive with full ACID support

[HIVE-5775] - Introduce Cost Based Optimizer to Hive

[HIVE-5823] - Support for DECIMAL primitive type in AvroSerDe

[HIVE-6455] - Scalable dynamic partitioning and bucketing optimization

[HIVE-6469] - skipTrash option in hive command line

[HIVE-6806] - CREATE TABLE should support STORED AS AVRO

[HIVE-7036] - get_json_object bug when extract list of list with index

[HIVE-7054] - Support ELT UDF in vectorized mode

[HIVE-7068] - Integrate AccumuloStorageHandler

[HIVE-7090] - Support session-level temporary tables in Hive

[HIVE-7158] - Use Tez auto-parallelism in Hive

[HIVE-7203] - Optimize limit 0

[HIVE-7255] - Allow partial partition spec in analyze command

[HIVE-7299] - Enable metadata only optimization on Tez

[HIVE-7341] - Support for Table replication across HCatalog instances

[HIVE-7390] - Make single quote character optional and configurable in BeeLine CSV/TSV output

[HIVE-7416] - provide context information to authorization checkPrivileges api call

[HIVE-7430] - Implement SMB join in tez

[HIVE-7446] - Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

[HIVE-7506] - MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

[HIVE-7509] - Fast stripe level merging for ORC

[HIVE-7547] - Add ipAddress and userName to ExecHook

[HIVE-7587] - Fetch aggregated stats from MetaStore

[HIVE-7654] - A method to extrapolate columnStats for partitions of a table

[HIVE-7826] - Dynamic partition pruning on Tez

[HIVE-8531] - Fold is not null filter if there are other comparison filter present on same column
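The rewrite described in HIVE-8531 is safe because a comparison against NULL evaluates to UNKNOWN, and a filter keeps only rows whose predicate is TRUE. So in `x IS NOT NULL AND x > 5`, the `IS NOT NULL` term can never change the result. Below is a small Python check of that reasoning, modelling SQL NULL/UNKNOWN as `None`; it is an assumption-level model of SQL three-valued logic, not Hive code.

```python
def gt(a, b):
    # SQL comparison under three-valued logic: a NULL (None) operand yields UNKNOWN (None).
    if a is None or b is None:
        return None
    return a > b


def and3(p, q):
    # SQL AND: FALSE dominates, then UNKNOWN, then TRUE.
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def is_not_null(a):
    # IS NOT NULL is never UNKNOWN: it is always TRUE or FALSE.
    return a is not None


def where(values, predicate):
    # A WHERE clause keeps only rows whose predicate is exactly TRUE.
    return [v for v in values if predicate(v) is True]


col = [None, 3, 7, 10]
original = where(col, lambda x: and3(is_not_null(x), gt(x, 5)))
folded = where(col, lambda x: gt(x, 5))
print(original, folded)  # [7, 10] [7, 10]
```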

8. Hive 1.0


9. Hive 1.1

[HIVE-3405] - UDF initcap to obtain a string with the first letter of each word in uppercase other letters in lowercase

[HIVE-7122] - Storage format for create like table

[HIVE-8435] - Add identity project remover optimization
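The behaviour described for `initcap` in HIVE-3405 can be sketched in Python as follows. This sketch assumes words are separated by whitespace, which may not match Hive's exact word-boundary rules. It avoids Python's built-in `str.title()` because that method also capitalizes after an apostrophe (`"don't".title()` gives `"Don'T"`).

```python
import re


def initcap(s):
    """Uppercase the first letter of each whitespace-separated word and
    lowercase the rest (a sketch of the behaviour described in HIVE-3405)."""
    if s is None:
        return None  # NULL in, NULL out
    return re.sub(r"\S+",
                  lambda m: m.group(0)[:1].upper() + m.group(0)[1:].lower(),
                  s)


print(initcap("hELLO hive  WORLD"))  # Hello Hive  World
```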

10. Hive 1.2

[HIVE-7998] - Enhance JDBC Driver to not require class specification

[HIVE-9039] - Support Union Distinct

[HIVE-9188] - BloomFilter support in ORC

[HIVE-9277] - Hybrid Hybrid Grace Hash Join

[HIVE-9302] - Beeline add commands to register local jdbc driver names and jars

[HIVE-9780] - Add another level of explain for RDBMS audience

[HIVE-10038] - Add Calcite's ProjectMergeRule.

[HIVE-10099] - Enable constant folding for Decimal

[HIVE-10591] - Support limited integer type promotion in ORC
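Bloom filters in ORC (HIVE-9188) let a reader skip data that definitely does not contain a looked-up value. The Python sketch below illustrates only that principle: the row-group names and the `customer_id` values are hypothetical, and ORC's real implementation uses different hashing and on-disk layout. A Bloom filter can return false positives but never false negatives, so skipping is always safe.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions in an m-bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.m = num_bits
        self.k = num_hashes
        self.bits = 0  # Python int used as a bit array

    def _positions(self, item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        # Double hashing: derive k positions from two 64-bit slices of one digest.
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False means "definitely absent"; True means "maybe present".
        return all(self.bits >> p & 1 for p in self._positions(item))


# One filter per (hypothetical) row group of a customer_id column.
row_groups = {
    "rg0": ["c1", "c2", "c3"],
    "rg1": ["c4", "c5"],
}
filters = {}
for name, ids in row_groups.items():
    bf = BloomFilter()
    for cid in ids:
        bf.add(cid)
    filters[name] = bf

# Roughly: WHERE customer_id = 'c4'. Only row groups whose filter says "maybe" are read.
to_read = [name for name, bf in filters.items() if bf.might_contain("c4")]
print(to_read)
```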

11. Hive 2.0

  • [HIVE-686] - add UDF substring_index

  • [HIVE-3404] - Create quarter UDF

  • [HIVE-7926] - long-lived daemons for query fragment execution, I/O and caching

  • [HIVE-10591] - Support limited integer type promotion in ORC

  • [HIVE-10592] - ORC file dump in JSON format

  • [HIVE-10673] - Dynamically partitioned hash join for Tez

  • [HIVE-10761] - Create codahale-based metrics system for Hive

  • [HIVE-10785] - Support aggregate push down through joins

  • [HIVE-11103] - Add banker's rounding BROUND UDF

  • [HIVE-11461] - Transform flat AND/OR into IN struct clause

  • [HIVE-11488] - Add sessionId and queryId info to HS2 log

  • [HIVE-11593] - Add aes_encrypt and aes_decrypt UDFs

  • [HIVE-11600] - Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())

  • [HIVE-11684] - Implement limit pushdown through outer join in CBO

  • [HIVE-11699] - Support special characters in quoted table names

  • [HIVE-11706] - Implement "show create database"

  • [HIVE-11775] - Implement limit push down through union all in CBO

  • [HIVE-11785] - Support escaping carriage return and new line for LazySimpleSerDe

  • [HIVE-11976] - Extend CBO rules to being able to apply rules only once on a given operator

  • [HIVE-12080] - Support auto type widening (int->bigint & float->double) for Parquet table
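Three of the new Hive 2.0 UDFs have semantics that are easy to show directly. The Python sketch below reproduces the described behaviour of `bround` (HIVE-11103), `substring_index` (HIVE-686) and `quarter` (HIVE-3404). It is not Hive's code, and the handling of an empty delimiter or a zero count in `substring_index` is an assumption of this sketch.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_EVEN


def bround(x, d=0):
    """Banker's rounding (HALF_EVEN) to d decimal places: ties go to the even digit."""
    # Going through str() treats 8.25 as the decimal 8.25, not its binary approximation.
    q = Decimal(1).scaleb(-d)
    return float(Decimal(str(x)).quantize(q, rounding=ROUND_HALF_EVEN))


def substring_index(s, delim, count):
    """Text before the count-th delimiter from the left (count > 0), or after
    the count-th delimiter from the right (count < 0)."""
    if not delim or count == 0:
        return ""  # assumption of this sketch for degenerate input
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def quarter(d):
    """Quarter of the year (1-4) for a date."""
    return (d.month - 1) // 3 + 1


print(bround(2.5), bround(3.5))                     # 2.0 4.0
print(substring_index("www.apache.org", ".", 2))   # www.apache
print(quarter(date(2015, 11, 3)))                  # 4
```

With ordinary half-up rounding, 2.5 would become 3. Banker's rounding sends it to the even neighbour 2, which avoids an upward bias when many tied values are summed.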


[1] New features in Hive 0.8.0 and 0.9.0

[2] Overview of new features in Hive 0.10 and 0.11


[4] Hive's roadmap for the next two years

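(3) Full support for SQL:2011 Analytics: users will be able to deploy complex reports on Hive using standard SQL, faster, more easily, and more reliably. A powerful cost-based optimizer will ensure that tool-generated queries and complex queries run fast. At that point, Hive will offer on Hadoop the full expressive power that enterprise SQL users enjoy. On top of its existing support for window functions, user-defined functions, subqueries, Rollup, Cube, standard aggregations, and inner, outer, semi, and cross joins, it will add support for non-equi joins, set functions (union, intersection, difference), interval types, and more.
Stinger.next is planned to take 18 months and will be delivered in three phases: transaction support by the end of 2014, sub-second queries in the first half of 2015, and full support for SQL:2011 Analytics by the end of 2015.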
