
Big Data Tutorial (11.7): Setting Up the Data Warehouse Tool Hive 1.2.2 on Hadoop 2.9.1


    The previous article covered setting up Hive 2.3.4, but that version no longer runs MapReduce jobs reliably. In this post I will walk through the full setup of Hive 1.2.2. Note up front: this installment builds directly on the Hadoop environment from the previous one.

    1. Download apache-hive-1.2.2-bin.tar.gz

    2. Upload the Hive package to the NameNode server
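
If you are uploading from a local machine, here is a minimal sketch using scp (the hadoop user and the centos-aaron-h1 host match the shell sessions shown later in this post; adjust both to your environment):

scp apache-hive-1.2.2-bin.tar.gz hadoop@centos-aaron-h1:~/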

    3. Extract the Hive package

tar -zxvf  apache-hive-1.2.2-bin.tar.gz  -C /home/hadoop/apps/

    4. Update the Hive settings in /etc/profile

#export HIVE_HOME=/home/hadoop/apps/apache-hive-2.3.4-bin
export HIVE_HOME=/home/hadoop/apps/apache-hive-1.2.2-bin
export PATH=${HIVE_HOME}/bin:$PATH

# After saving, run: source /etc/profile for the changes to take effect
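
As a quick, optional sanity check that the new environment is active (hive --version only prints build information):

echo $HIVE_HOME
hive --version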

    5. Edit the Hive configuration file

cd /home/hadoop/apps/apache-hive-1.2.2-bin/conf/
cp hive-env.sh.template hive-env.sh

# Open hive-env.sh and append the following three lines, then save
vi hive-env.sh
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.9.1
export HIVE_CONF_DIR=/home/hadoop/apps/apache-hive-1.2.2-bin/conf
export HIVE_AUX_JARS_PATH=/home/hadoop/apps/apache-hive-1.2.2-bin/lib

    6. Update the log4j logging configuration

cp hive-log4j.properties.template hive-log4j.properties

In hive-log4j.properties, change the EventCounter appender to org.apache.hadoop.log.metrics.EventCounter:
#log4j.appender.EventCounter=org.apache.hadoop.hive.shims.HiveEventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
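
Equivalently, the edit can be scripted; this sed one-liner assumes the stock template value shown above:

sed -i 's/org.apache.hadoop.hive.shims.HiveEventCounter/org.apache.hadoop.log.metrics.EventCounter/' hive-log4j.properties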

    7. Configure MySQL as the Hive metastore database

vi hive-site.xml
# Write the following into hive-site.xml
<configuration>
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://192.168.29.131:3306/hivedb?createDatabaseIfNotExist=true</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>root</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>123456</value>
        </property>
</configuration>
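
Because Hive connects to MySQL over the network, the account above must be allowed to reach hivedb remotely. A sketch of the grant for MySQL 5.x, run on the MySQL host (user, password, and database mirror the config above; adjust to your environment):

mysql -uroot -p123456 <<'SQL'
GRANT ALL PRIVILEGES ON hivedb.* TO 'root'@'%' IDENTIFIED BY '123456';
FLUSH PRIVILEGES;
SQL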

    8. Copy the JDBC driver JAR into Hive's lib directory

cp ~/mysql-connector-java-5.1.28.jar  $HIVE_HOME/lib/

    9. Delete the HDFS files left over from the earlier Hive 2.3.4 install (do not skip this step)

hdfs dfs -rm -r /tmp/hive
hdfs dfs -rm -r /user/hive

    10. Initialize the Hive metastore schema

[hadoop@centos-aaron-h1 bin]$ schematool -initSchema -dbType mysql
Metastore connection URL:        jdbc:mysql://192.168.29.131:3306/hivedb?createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
Starting metastore schema initialization to 1.2.0
Initialization script hive-schema-1.2.0.mysql.sql
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***
[hadoop@centos-aaron-h1 bin]$ schematool -initSchema -dbType mysql
Metastore connection URL:        jdbc:mysql://192.168.29.131:3306/hivedb?createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
Starting metastore schema initialization to 1.2.0
Initialization script hive-schema-1.2.0.mysql.sql
Initialization script completed
schemaTool completed

    Note: the first schematool run failed with "Duplicate key name 'PCS_STATS_IDX'", most likely because the hivedb database still contained tables from the earlier Hive 2.3.4 metastore; once the database was cleared, the second run completed successfully.
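
If you hit the same duplicate-key failure, one way to reset the metastore database before re-running schematool (assuming hivedb holds nothing you need to keep):

# Drop the stale database; createDatabaseIfNotExist=true in the JDBC URL
# recreates it on the next run
mysql -h 192.168.29.131 -uroot -p123456 <<'SQL'
DROP DATABASE IF EXISTS hivedb;
SQL
schematool -initSchema -dbType mysql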

    11. Start Hive, create the database and table, and load the data

# Run this put only after the database and table below have been created (a sketch of the data file follows)
hdfs dfs -put bbb_hive.txt /user/hive/warehouse/wcc_log.db/t_web_log01
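
For reference, bbb_hive.txt is a plain comma-delimited file matching the (id int, name string) schema created below; judging from the query results later in this section, its contents would be:

cat > bbb_hive.txt <<'EOF'
1,张三
2,李四
3,王二
4,麻子
5,隔壁老王
EOF
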
[hadoop@centos-aaron-h1 bin]$ hive

Logging initialized using configuration in file:/home/hadoop/apps/apache-hive-1.2.2-bin/conf/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.679 seconds, Fetched: 1 row(s)
hive>  create database wcc_log;
OK
Time taken: 0.104 seconds
hive> use wcc_log;
OK
Time taken: 0.03 seconds
hive> create table t_web_log01(id int,name string)
    > row format delimited
    > fields terminated by ',';
OK
Time taken: 0.159 seconds
hive> select * from t_web_log01;
OK
1       张三
2       李四
3       王二
4       麻子
5       隔壁老王
Time taken: 0.274 seconds, Fetched: 5 row(s)
hive> select count(*) from t_web_log01;
Query ID = hadoop_20190121080409_dfb157d9-0a79-4784-9ea4-111d0ad4cc92
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548024929599_0003, Tracking URL = http://centos-aaron-h1:8088/proxy/application_1548024929599_0003/
Kill Command = /home/hadoop/apps/hadoop-2.9.1/bin/hadoop job  -kill job_1548024929599_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2019-01-21 08:04:25,271 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1548024929599_0003 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
hive> select count(id) from t_web_log01;
Query ID = hadoop_20190121080455_b3eb8d25-2d10-46c6-b4f3-bfcdab904b92
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548024929599_0004, Tracking URL = http://centos-aaron-h1:8088/proxy/application_1548024929599_0004/
Kill Command = /home/hadoop/apps/hadoop-2.9.1/bin/hadoop job  -kill job_1548024929599_0004
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2019-01-21 08:05:09,771 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1548024929599_0004 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

The query fails with: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

    12. Fixing the error above

Check the YARN application log at the tracking URL: http://centos-aaron-h1:8088/proxy/application_1548024929599_0004/

Root cause: once Hive hands the job off to YARN, some environment variables can be lost on the remote side; here the MapReduce classpath goes missing, so the job fails.

Fix: add the following property to mapred-site.xml, distribute the file to every node in the Hadoop cluster, and restart the cluster (a sketch of the distribution step follows the snippet).

<property>
        <name>mapreduce.application.classpath</name>
        <value>/home/hadoop/apps/hadoop-2.9.1/share/hadoop/mapreduce/*, /home/hadoop/apps/hadoop-2.9.1/share/hadoop/mapreduce/lib/*</value>
</property>
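
A sketch of the distribute-and-restart step; the worker hostnames are placeholders, so substitute the nodes of your own cluster:

# Push the updated mapred-site.xml to every node (hostnames are hypothetical)
for host in centos-aaron-h2 centos-aaron-h3; do
  scp /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/mapred-site.xml \
      hadoop@$host:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/
done

# Restart YARN so the new classpath takes effect
/home/hadoop/apps/hadoop-2.9.1/sbin/stop-yarn.sh
/home/hadoop/apps/hadoop-2.9.1/sbin/start-yarn.sh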

    13. Re-run the Hive query [select count(id) from t_web_log01;]

[hadoop@centos-aaron-h1 bin]$ hive

Logging initialized using configuration in file:/home/hadoop/apps/apache-hive-1.2.2-bin/conf/hive-log4j.properties
hive> use wcc_log
    > ;
OK
Time taken: 0.487 seconds
hive> show tables;
OK
t_web_log01
Time taken: 0.219 seconds, Fetched: 1 row(s)
hive> select count(id) from t_web_log01;
Query ID = hadoop_20190121082042_c5392e1c-8db8-4329-bcdf-b0c332fcfe4f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548029911300_0001, Tracking URL = http://centos-aaron-h1:8088/proxy/application_1548029911300_0001/
Kill Command = /home/hadoop/apps/hadoop-2.9.1/bin/hadoop job  -kill job_1548029911300_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-01-21 08:21:05,410 Stage-1 map = 0%,  reduce = 0%
2019-01-21 08:21:14,072 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.38 sec
2019-01-21 08:21:21,290 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.32 sec
MapReduce Total cumulative CPU time: 3 seconds 320 msec
Ended Job = job_1548029911300_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.32 sec   HDFS Read: 6642 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 320 msec
OK
5
Time taken: 40.218 seconds, Fetched: 1 row(s)
hive> [hadoop@centos-aaron-h1 bin]$

    As the output above shows, the job now runs successfully and returns the expected count of 5 records.

    A closing word: that is everything for this post. If you found it useful, please give it a like; if you are interested in my other big data articles, follow the blog, and feel free to reach out any time.
