
Upgrading Hadoop CDH 4.5 to CDH 5, with NameNode and YARN HA in Practice

China_OS
Published 2014/05/27 10:47

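CDH 5 supports many new features, so I decided to upgrade our current CDH 4.5 cluster to CDH 5. The software layout is still based on the earlier CDH 4.5 cluster: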

192.168.1.10    U-1  (Active) hadoop-yarn-resourcemanager  hadoop-hdfs-namenode hadoop-mapreduce-historyserver hadoop-yarn-proxyserver  hadoop-hdfs-zkfc
192.168.1.20    U-2  hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce  journalnode  zookeeper  zookeeper-server
192.168.1.30    U-3  hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce  journalnode  zookeeper  zookeeper-server
192.168.1.40    U-4  hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce  journalnode  zookeeper  zookeeper-server
192.168.1.50    U-5  hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
192.168.1.70    U-7  (Standby) hadoop-yarn-resourcemanager  hadoop-hdfs-namenode  hadoop-hdfs-zkfc
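Note: because we are upgrading from CDH 4.5 to CDH 5, the table above does not list every package that has to be installed. Some were already installed under CDH 4.5, so it lists only the packages you need to reinstall during the upgrade.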


The procedure is as follows:

1    Back Up Configuration Data and Stop Services

        1    Put the NameNode into safe mode and save the fsimage

su - hdfs
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
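
Before stopping any services it may be worth confirming that safe mode is really on. The following is a minimal sketch, assuming that `hdfs dfsadmin -safemode get` prints a line containing "Safe mode is ON"; the helper name `safemode_is_on` is hypothetical.

```shell
# Hypothetical helper: succeeds if the text on stdin reports safe mode as ON.
# Assumes the output of `hdfs dfsadmin -safemode get` contains "Safe mode is ON".
safemode_is_on() {
    grep -q 'Safe mode is ON'
}

# Usage (run as the hdfs user on a NameNode host):
#   hdfs dfsadmin -safemode get | safemode_is_on && echo "safe mode confirmed"
```

Reading the report from stdin keeps the check independent of how the command is invoked (for example via `su - hdfs` or `sudo -u hdfs`).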

        2    Stop all Hadoop services in the cluster

for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done

2    Back up the HDFS Metadata

        1    Find dfs.namenode.name.dir

grep -C1 name.dir /etc/hadoop/conf/hdfs-site.xml
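
If only the value is needed (for example to feed it to a backup script), a small parser can pull it out of the file. This is a sketch under the assumption that each `<name>` and `<value>` element sits on a single line, with the name preceding its value, as in the configuration files shown in this post; `get_hadoop_property` is a hypothetical helper name.

```shell
# Hypothetical helper: print the <value> of the property named $1 in the
# Hadoop XML file $2. Assumes <name> and <value> each fit on one line and
# that <name> comes before <value> inside a <property>.
get_hadoop_property() {
    awk -v key="$1" '
        /<name>/ {
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
        }
        /<value>/ && name == key {
            val = $0
            sub(/.*<value>[ \t]*/, "", val)
            sub(/[ \t]*<\/value>.*/, "", val)
            print val
            exit
        }
    ' "$2"
}

# Usage:
#   get_hadoop_property dfs.namenode.name.dir /etc/hadoop/conf/hdfs-site.xml
```

On a host with the Hadoop client installed, `hdfs getconf -confKey dfs.namenode.name.dir` should give the same answer without any parsing.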

        2    Back up the directory specified by dfs.namenode.name.dir

tar czvf dfs.namenode.name.dir.tgz /data
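
A backup is only useful if it can be read back, so it can help to list the archive right after creating it. The sketch below wraps both steps; `backup_metadata` is a hypothetical helper, and the `/root/` destination in the usage line is an assumption.

```shell
# Hypothetical helper: archive directory $1 into tarball $2, then verify
# that the archive can be listed end to end before trusting it.
backup_metadata() {
    src="$1"
    out="$2"
    # Archive relative to the parent directory so no leading "/" is stored.
    tar czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # A corrupt or truncated archive makes this listing fail.
    tar tzf "$out" > /dev/null
}

# Usage, with the /data directory from hdfs-site.xml (destination is an assumption):
#   backup_metadata /data /root/dfs.namenode.name.dir.tgz
```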

3    Uninstall the CDH 4 Version of Hadoop

        1    Uninstall the Hadoop components

apt-get remove bigtop-utils bigtop-jsvc bigtop-tomcat sqoop2-client hue-common

        2    Remove the CDH 4 repository files

mv /etc/apt/sources.list.d/cloudera-cdh4.list /root/

4    Download the Latest Version of CDH 5

        1    Download the CDH 5 repository package

wget 'http://archive.cloudera.com/cdh5/one-click-install/precise/amd64/cdh5-repository_1.0_all.deb'

        2    Install the CDH 5 repository

dpkg -i cdh5-repository_1.0_all.deb 
curl -s http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key |  apt-key add -

5    Install CDH 5 with YARN

        1    Install ZooKeeper

        2    Install the relevant components on each host

                1    Resource Manager host

apt-get install hadoop-yarn-resourcemanager

                2    NameNode host(s)

apt-get install hadoop-hdfs-namenode

                3    All cluster hosts except the Resource Manager

apt-get install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce

                4    One host in the cluster(Active NameNode)

apt-get install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver

                5    All client hosts

apt-get install hadoop-client
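
The per-role install commands above can be folded into one lookup keyed by hostname, so the same script can be run on every node. This is only a sketch that mirrors the host table at the top of this post; `packages_for_host` is a hypothetical helper, and it assumes the short hostname of each machine matches the table (U-1 ... U-7).

```shell
# Hypothetical helper: print the Hadoop packages for host $1, following the
# host table in this post (JournalNode and ZooKeeper are handled separately).
packages_for_host() {
    case "$1" in
        U-1) echo "hadoop-yarn-resourcemanager hadoop-hdfs-namenode hadoop-hdfs-zkfc hadoop-mapreduce-historyserver hadoop-yarn-proxyserver" ;;
        U-7) echo "hadoop-yarn-resourcemanager hadoop-hdfs-namenode hadoop-hdfs-zkfc" ;;
        U-2|U-3|U-4|U-5) echo "hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce" ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Usage on each host (word splitting of the package list is intentional):
#   apt-get install $(packages_for_host "$(hostname -s)")
```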

6    Install CDH 5 with MRv1

        CDH 5 promotes YARN as the primary framework, so we no longer use MRv1 and do not install it.

7    In an HA Deployment, Upgrade and Start the Journal Nodes

        1    Install the JournalNodes

apt-get install hadoop-hdfs-journalnode

        2    Start the JournalNodes

service hadoop-hdfs-journalnode start

8    Upgrade the HDFS Metadata

        The upgrade procedure differs between HA and non-HA deployments. Our previous CDH 4.5 cluster ran in HA mode, so we follow the HA procedure.

        1    Run the following on the active NameNode

service hadoop-hdfs-namenode upgrade

        2    Restart the standby NameNode

su - hdfs
hdfs namenode -bootstrapStandby
service hadoop-hdfs-namenode start

        3    Start the DataNodes

service hadoop-hdfs-datanode start

        4    Check the version
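
One way to check the version after the upgrade is to look at the build string. The sketch below assumes that the first line of `hadoop version` has the form `Hadoop <version>` and that CDH 5 builds carry a `-cdh5` marker in that version; `is_cdh5` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the first line on stdin looks like a
# CDH 5 build string (contains "-cdh5").
is_cdh5() {
    head -n 1 | grep -q -- '-cdh5'
}

# Usage:
#   hadoop version | is_cdh5 && echo "running a CDH 5 build"
```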


9    Start YARN

        1    Create the required directories

su - hdfs
hadoop fs -mkdir -p /user/history
hadoop fs -chmod -R 1777 /user/history
hadoop fs -chown yarn /user/history
hadoop fs -mkdir -p /var/log/hadoop-yarn
hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
hadoop fs -ls -R /

        2    Start the relevant services on each node of the Hadoop cluster

service hadoop-yarn-resourcemanager start
service hadoop-yarn-nodemanager start
service hadoop-mapreduce-historyserver start

10   Configure NameNode HA

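        1     NameNode HA is deployed the same way as under CDH 4.5. The only change is in yarn-site.xml: the value of yarn.nodemanager.aux-services must be changed from mapreduce.shuffle to mapreduce_shuffle.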
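
The rename can be applied with a one-line substitution. This is a sketch, assuming the old value appears as `<value>mapreduce.shuffle</value>` on a single line; `fix_aux_service` is a hypothetical helper, and a `.bak` copy of the file is kept.

```shell
# Hypothetical helper: rewrite the aux-service value in the yarn-site.xml
# given as $1, keeping the original as "$1.bak".
# Only <value> elements are touched; property names are left alone.
fix_aux_service() {
    sed -i.bak 's|<value>mapreduce\.shuffle</value>|<value>mapreduce_shuffle</value>|' "$1"
}

# Usage:
#   fix_aux_service /etc/hadoop/conf/yarn-site.xml
```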

        2    Verify
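
A simple check is that exactly one NameNode reports itself as active. The sketch below assumes `hdfs haadmin -getServiceState <id>` prints `active` or `standby` on a line of its own, and uses the NameNode IDs U-1 and U-7 from hdfs-site.xml; `exactly_one_active` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if exactly one line on stdin is "active".
exactly_one_active() {
    [ "$(grep -cx 'active')" -eq 1 ]
}

# Usage (as the hdfs user):
#   for nn in U-1 U-7; do hdfs haadmin -getServiceState "$nn"; done | exactly_one_active \
#       && echo "NameNode HA looks healthy"
```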


11    Configure YARN HA

        1    Stop all YARN daemons

service hadoop-yarn-nodemanager stop
service hadoop-yarn-resourcemanager stop
service hadoop-mapreduce-historyserver stop

        2    Update the configuration used by the ResourceManagers, NodeManagers and clients

                Below is the configuration on U-1. core-site.xml, hdfs-site.xml and mapred-site.xml need no changes; the only file to modify is yarn-site.xml.

                core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster/</value>
  </property>

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>U-2:2181,U-3:2181,U-4:2181</value>
  </property>

</configuration>

                hdfs-site.xml

<configuration>
  <property>
     <name>dfs.permissions.superusergroup</name>
     <value>hadoop</value>
  </property>

  <property>
     <name>dfs.namenode.name.dir</name>
     <value>/data</value>
  </property>

  <property>
     <name>dfs.datanode.data.dir</name>
     <value>/data01,/data02</value>
  </property>

  <property>
     <name>dfs.nameservices</name>
     <value>mycluster</value>
  </property>

<!--  HA Config  -->
  <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>U-1,U-7</value>
  </property>

  <property>
      <name>dfs.namenode.rpc-address.mycluster.U-1</name>
      <value>U-1:8020</value>
  </property>

  <property>
      <name>dfs.namenode.rpc-address.mycluster.U-7</name>
      <value>U-7:8020</value>
  </property>

  <property>
      <name>dfs.namenode.http-address.mycluster.U-1</name>
      <value>U-1:50070</value>
  </property>

  <property>
      <name>dfs.namenode.http-address.mycluster.U-7</name>
      <value>U-7:50070</value>
  </property>

  <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://U-2:8485;U-3:8485;U-4:8485/mycluster</value>
  </property>

  <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/jdata</value>
  </property>

  <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
  </property>

  <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
  </property>

  <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
  </property>

</configuration>

                mapred-site.xml

<configuration>
 
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>

<property>
 <name>mapreduce.jobhistory.address</name>
 <value>U-1:10020</value>
</property>
<property>
 <name>mapreduce.jobhistory.webapp.address</name>
 <value>U-1:19888</value>
</property>

</configuration>

                yarn-site.xml

<configuration>
<!-- Resource Manager Configs -->
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-rm-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>U-1,U-7</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>U-1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>


  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>U-2:2181,U-3:2181,U-4:2181</value>
  </property>

  <property>
    <name>yarn.resourcemanager.zk.state-store.address</name>
    <value>U-1:2181</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>

  <!-- RM1 configs -->
  <property>
    <name>yarn.resourcemanager.address.U-1</name>
    <value>U-1:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.U-1</name>
    <value>U-1:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.U-1</name>
    <value>U-1:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.U-1</name>
    <value>U-1:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.U-1</name>
    <value>U-1:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.U-1</name>
    <value>U-1:23141</value>
  </property>

  <!-- RM2 configs -->
  <property>
    <name>yarn.resourcemanager.address.U-7</name>
    <value>U-7:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.U-7</name>
    <value>U-7:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.U-7</name>
    <value>U-7:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.U-7</name>
    <value>U-7:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.U-7</name>
    <value>U-7:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.U-7</name>
    <value>U-7:23141</value>
  </property>

<!-- Node Manager Configs -->
  <property>
    <description>Address where the localizer IPC is.</description>
    <name>yarn.nodemanager.localizer.address</name>
    <value>0.0.0.0:23344</value>
  </property>
  <property>
    <description>NM Webapp address.</description>
    <name>yarn.nodemanager.webapp.address</name>
    <value>0.0.0.0:23999</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/yarn/log</value>
  </property>
  <property>
    <name>mapreduce.shuffle.port</name>
    <value>23080</value>
  </property>
</configuration>

            Note: after copying yarn-site.xml to U-7, change the value of yarn.resourcemanager.ha.id in U-7's copy to U-7; otherwise the ResourceManager will not start.
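
The per-host change to yarn.resourcemanager.ha.id can be scripted so it is not forgotten after copying the file. This is a sketch, assuming the `<value>` line directly follows the `<name>` line as in the yarn-site.xml shown in this post; `set_rm_ha_id` is a hypothetical helper, and a `.bak` copy is kept.

```shell
# Hypothetical helper: set yarn.resourcemanager.ha.id to $2 in the file $1.
# Assumes the <value> line immediately follows the matching <name> line.
# The pattern ends in "ha.id</name>", so ha.rm-ids is not affected.
set_rm_ha_id() {
    sed -i.bak '/<name>yarn\.resourcemanager\.ha\.id<\/name>/{
n
s|<value>.*</value>|<value>'"$2"'</value>|
}' "$1"
}

# Usage on U-7, after copying the file over from U-1:
#   set_rm_ha_id /etc/hadoop/conf/yarn-site.xml U-7
```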

        3    Start all YARN daemons

service hadoop-yarn-resourcemanager start
service hadoop-yarn-nodemanager start

        4    Verify


                What on earth is going on here? It cannot find the corresponding ZKFC address?



Today I tested YARN's HA mechanism again and found the following explanation on the official mailing list:

Right now, RM HA does not use ZKFC. So, we can not use this command "yarn rmadmin -failover rm1 rm2" now.

If you use the default HA configuration, you set up a Automatic RM HA. In order to failover manually, you have two options:

1    set up manual RM HA by set the configuration "yarn.resourcemanager.ha.automatic-failover.enabled" as false. Then you can use command "yarn rmadmin -transitionToActive rm1", "yarn rmadmin -transitionToStandby rm2" to control which rm goes to active by yourself.
2    If you really want to experiment the manual failover when automatic failover enabled, you can use command "yarn rmadmin -transitionToActive --forcemanual rm2"

Thanks
        So it turns out I was simply doing it wrong.
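
Applied to this cluster, where the ResourceManager IDs are U-1 and U-7 rather than rm1 and rm2, the second option from the mailing-list reply can be sketched as a dry run. `print_forced_failover_cmds` is a hypothetical helper; it only prints the commands so they can be reviewed before being run by hand, and the `-getServiceState` check is an addition not mentioned in the reply.

```shell
# Hypothetical dry-run helper: print (do not run) the commands for forcing
# the ResourceManager with id $1 to become active while automatic failover
# is enabled, followed by a state check.
print_forced_failover_cmds() {
    target="$1"
    echo "yarn rmadmin -transitionToActive --forcemanual $target"
    echo "yarn rmadmin -getServiceState $target"
}

# Usage:
#   print_forced_failover_cmds U-7
```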


        References: https://issues.apache.org/jira/browse/YARN-3006

                 https://issues.apache.org/jira/browse/YARN-1177


