Installing Hadoop on Linux (Cluster)


1. Download hadoop-2.7.3.tar.gz

2. Before installing Hadoop, make sure JDK 1.7 or later is installed
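
A quick check that a suitable JDK is already on the machine (later steps in this guide assume it lives at /app/jdk1.8.0_111):

java -version    #should report version 1.7 or newer
which java       #shows which java binary is on the PATH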

3. Configure hosts

vi /etc/hosts
#append the following
192.168.241.130 master
192.168.241.131 slave1
192.168.241.132 slave2
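
Once /etc/hosts has been updated on every node (each machine's own hostname should also be set to match its entry, e.g. with hostnamectl set-hostname master on CentOS 7), the names should resolve from any node:

ping -c 1 slave1
ping -c 1 slave2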

4. Set up passwordless SSH login

#generate a key pair
	ssh-keygen -t rsa #press Enter three times (or: ssh-keygen -t rsa -P '')
#copy the public key to each node
	Option 1:
		ssh-copy-id -i ~/.ssh/id_rsa.pub hostname
	Option 2:
		scp ~/.ssh/id_rsa.pub username@hostname:~/.ssh/hostname
		#then, on the target machine, inside ~/.ssh:
		cat hostname >> authorized_keys
		chmod 600 authorized_keys
		rm -f hostname
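
After the master's public key has been appended to authorized_keys on every node (including the master itself), logging in should no longer ask for a password:

ssh slave1 hostname    #should print "slave1" without a password prompt
ssh slave2 hostname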

5. Turn off the firewall

    CentOS 6.x

service iptables stop

    CentOS 7.x

systemctl stop firewalld.service
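
To keep the firewall from coming back after a reboot, it can also be disabled at startup:

#CentOS 6.x
chkconfig iptables off
#CentOS 7.x
systemctl disable firewalld.service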

6. Extract Hadoop to /app

mkdir /app && cd /app
tar -zxvf hadoop-2.7.3.tar.gz   #assumes the downloaded tarball has been copied to /app
mv hadoop-2.7.3 hadoop

7. Configure the Hadoop environment variables

vi /etc/profile
#append:
    export HADOOP_HOME=/app/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
source /etc/profile
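
If the PATH was updated correctly, the hadoop command is now available from any directory:

hadoop version    #should report Hadoop 2.7.3
which hadoop      #should print /app/hadoop/bin/hadoop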

8. Create the storage directories

mkdir -p /var/app/hadoop/tmp
mkdir -p /var/app/hadoop/dfs/name
mkdir -p /var/app/hadoop/dfs/data
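
The same directories are needed on the slave nodes as well (the name directory is only used by the NameNode, but creating all three everywhere does no harm). With passwordless SSH in place they can be created remotely, for example:

ssh slave1 "mkdir -p /var/app/hadoop/tmp /var/app/hadoop/dfs/name /var/app/hadoop/dfs/data"
ssh slave2 "mkdir -p /var/app/hadoop/tmp /var/app/hadoop/dfs/name /var/app/hadoop/dfs/data"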

9. Edit the configuration files (they live in /app/hadoop/etc/hadoop)

    1. Edit core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/app/hadoop/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>65536</value>
    </property>
</configuration>

    2. Edit hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/var/app/hadoop/dfs/name</value>
        <description>NameNode storage directory</description>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/var/app/hadoop/dfs/data</value>
        <description>DataNode storage directory</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description>Number of replicas; the default is 3, and it should not exceed the number of DataNodes</description>
    </property>
</configuration>

    3. Copy mapred-site.xml.template to mapred-site.xml, then edit mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

    4. Edit yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>

    5. Edit the slaves file (one DataNode/NodeManager hostname per line)

slave1
slave2

10. Edit the environment-specific settings

    1. Edit hadoop-env.sh

# The java implementation to use.
JAVA_HOME=/app/jdk1.8.0_111
export JAVA_HOME=${JAVA_HOME}

    2. Edit yarn-env.sh

# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/app/jdk1.8.0_111

11. Copy the configured Hadoop directory to /app on each slave machine

scp -r /app/hadoop root@slave1:/app
scp -r /app/hadoop root@slave2:/app
#also set up /etc/hosts and /etc/profile on the slaves (one way is shown below)
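
One way to push those two files to the slaves as well (a sketch, assuming root SSH access and that simply overwriting /etc/profile on the slaves is acceptable):

scp /etc/hosts root@slave1:/etc/hosts
scp /etc/hosts root@slave2:/etc/hosts
scp /etc/profile root@slave1:/etc/profile
scp /etc/profile root@slave2:/etc/profile
#then run "source /etc/profile" on each slave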

12. Format the NameNode

bin/hdfs namenode -format
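
Formatting only needs to happen once, on the master. Afterwards the name directory configured in hdfs-site.xml should contain a freshly created current/ subdirectory:

ls /var/app/hadoop/dfs/name/current/
#expect files such as VERSION, seen_txid and fsimage_*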

13. Start Hadoop

    1. Start the NameNode and DataNode daemons

sbin/start-dfs.sh

    2. Start the ResourceManager and NodeManager daemons

sbin/start-yarn.sh

    3. Alternatively, sbin/start-all.sh runs both of the above commands in one step (it is deprecated in Hadoop 2.x but still works)

14. Stop Hadoop

sbin/stop-dfs.sh
sbin/stop-yarn.sh

15. Verify that the installation succeeded

    1. Run jps on the master; it should list the NameNode, SecondaryNameNode, and ResourceManager processes (plus Jps itself).

       Run jps on each slave; it should list the DataNode and NodeManager processes.

    2. Open a browser and visit http://master:50070 (the HDFS web UI)

    3. Open a browser and visit http://master:8088 (the YARN web UI)
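
The same information is available from the command line; with both slaves up, the report should show two live DataNodes:

hdfs dfsadmin -report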

16. Basic Hadoop operations

#list the HDFS root directory
hdfs dfs -ls /
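
A few more everyday HDFS commands (the paths here are only examples):

hdfs dfs -mkdir -p /user/root          #create a directory in HDFS
hdfs dfs -put /etc/hosts /user/root    #upload a local file
hdfs dfs -cat /user/root/hosts         #print its contents
hdfs dfs -rm /user/root/hosts          #remove it again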

17. To run the secondary NameNode on a separate node, e.g. s3, edit hdfs-site.xml as follows

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/var/app/hadoop/dfs/name</value>
        <description>NameNode storage directory; multiple directories can be given, comma-separated</description>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/var/app/hadoop/dfs/data</value>
        <description>DataNode storage directory; multiple directories can be given, comma-separated</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description>Number of replicas; the default is 3, and it should not exceed the number of DataNodes</description>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>s3:50090</value>
        <description>Required when the secondary NameNode is moved to the s3 machine</description>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/var/app/hadoop/dfs/namesecondary</value>
        <description>Required when a secondary NameNode is configured</description>
    </property>
    <property>
        <name>fs.checkpoint.period</name>
        <value>60</value>
        <description>The number of seconds between two periodic checkpoints.
           Required when a secondary NameNode is configured.
        </description>
    </property>
    <property>
        <name>fs.checkpoint.size</name>
        <value>10240</value>
        <description>The size of the current edit log (in bytes) that triggers
           a periodic checkpoint even if the fs.checkpoint.period hasn't expired.
           Required when a secondary NameNode is configured.
        </description>
    </property>
</configuration>

18. Common problems

17/03/07 01:03:52 INFO mapreduce.Job: Task Id : attempt_1488819157168_0003_m_000000_2, Status : FAILED
Container [pid=4291,containerID=container_1488819157168_0003_01_000004] is running beyond virtual memory limits. Current usage: 53.8 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Solution: add the following property to yarn-site.xml
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
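
An alternative that keeps the check enabled is to raise the permitted ratio of virtual to physical memory (the default is 2.1), also in yarn-site.xml:
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>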


org.apache.hadoop.security.AccessControlException: Permission denied: user=xxx, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
Four possible fixes:
1. Change the local OS username to match the user on the Hadoop/Linux side (the crudest option).
2. Run hadoop fs -chmod 777 ... to make the path world-readable and world-writable (insecure, also crude).
3. Set the user explicitly, which is the more reliable approach: System.setProperty("HADOOP_USER_NAME", "root"); or pass the JVM option -DHADOOP_USER_NAME=root in the run configuration.
4. Add the following to hdfs-site.xml:
  <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
  </property>


2017-03-13 21:20:34,417 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: s0/192.168.137.130:9000
2017-03-13 21:20:40,424 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s0/192.168.137.130:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Symptom: no DataNodes show up on the port-50070 web UI. The localhost.localdomain entries in /etc/hosts must not be replaced with the machine's own hostname.
Solution:
Restore the following entries in /etc/hosts (and keep the real hostnames mapped to their LAN IPs as in step 3):
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6


  
