文档章节

hadoop应用程序实例distributedshell

mskk
 mskk
发布于 2017/05/04 21:19
字数 1018
阅读 6
收藏 0

本文介绍YARN自带的一个非常简单的应用程序实例—distributedshell的使用方法。它可以看做YARN编程中的“hello world”,主要功能是并行执行用户提供的shell命令或者shell脚本。

 

(1)运行参数介绍

DistributedShell的基本运行参数如下:

(2)运行方法

DistributedShell的运行方法如下:

在YARN安装目录下,执行以下命令:

bin/hadoop jar\

share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.1.jar\

org.apache.hadoop.yarn.applications.distributedshell.Client\

–jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.1.jar\

–shell_command ls\

–shell_script ignore.sh\

–num_containers 10\

–container_memory 350\

–master_memory 350\

–priority 10

需要注意的是,在hadoop-2.0.3-alpha(不包括该版本)和CDH 4.1.2版本(包括该版本)之前,DistributedShell存在BUG,具体如下:

1)    必须使用–shell_command参数

2)    当只有shell_command参数而没有shell_script参数时,在分布式模式下(伪分布式下可以)不能执行成功,具体说明和修复方法见:https://issues.apache.org/jira/browse/YARN-253,一直临时的解决方案是同时设置shell_command和shell_script两个参数,像上面给出的这个实例一样。

在这个实例中,ignore.sh中的内容就是“ls”

3)    内存设置一定要正确,不然会出现以下提示的错误:

Container [pid=4424,containerID=container_1359629844156_0004_01_000001] is running beyond virtual memory limits. Current usage: 90.1mb of 128.0mb physical memory used; 593.0mb of 268.8mb virtual memory used. Killing container.

【附】DistributedShell运行日志:

13/02/01 13:43:11 INFO distributedshell.Client: Initializing Client

13/02/01 13:43:11 INFO distributedshell.Client: Starting Client

13/02/01 13:43:11 INFO distributedshell.Client: Connecting to ResourceManager at c2-23/10.1.1.98:8032

13/02/01 13:43:12 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=3

13/02/01 13:43:12 INFO distributedshell.Client: Got Cluster node info from ASM

13/02/01 13:43:12 INFO distributedshell.Client: Got node report from ASM for, nodeId=c2-23:36594, nodeAddressc2-23:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: “”, last_health_report_time: 1359697377337,

13/02/01 13:43:12 INFO distributedshell.Client: Got node report from ASM for, nodeId=c2-25:41070, nodeAddressc2-25:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: “”, last_health_report_time: 1359697367180,

13/02/01 13:43:12 INFO distributedshell.Client: Got node report from ASM for, nodeId=c2-24:48383, nodeAddressc2-24:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: “”, last_health_report_time: 1359699033102,

13/02/01 13:43:12 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0

13/02/01 13:43:12 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS

13/02/01 13:43:12 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE

13/02/01 13:43:12 INFO distributedshell.Client: Got new application id=application_1359695803957_0003

13/02/01 13:43:12 INFO distributedshell.Client: Min mem capabililty of resources in this cluster 128

13/02/01 13:43:12 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 10240

13/02/01 13:43:12 INFO distributedshell.Client: Setting up application submission context for ASM

13/02/01 13:43:12 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment

13/02/01 13:43:13 INFO distributedshell.Client: Set the environment for the application master

13/02/01 13:43:13 INFO distributedshell.Client: Trying to generate classpath for app master from current thread’s classpath

13/02/01 13:43:13 INFO distributedshell.Client: Readable bytes from stream=9006

13/02/01 13:43:13 INFO distributedshell.Client: Setting up app master command

13/02/01 13:43:13 INFO distributedshell.Client: Completed setting up app master command ${JAVA_HOME}/bin/java -Xmx350m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster –container_memory 350 –num_containers 10 –priority 0 –shell_command ls 1>/AppMaster.stdout 2>/AppMaster.stderr

13/02/01 13:43:13 INFO distributedshell.Client: Submitting application to ASM

13/02/01 13:43:14 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1359697393467, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=c2-23:8088/proxy/application_1359695803957_0003/, appUser=rmss

13/02/01 13:43:15 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1359697393467, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=, appUser=rmss

13/02/01 13:43:16 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1359697393467, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=, appUser=rmss

13/02/01 13:43:17 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1359697393467, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=, appUser=rmss

13/02/01 13:43:18 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1359697393467, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=, appUser=rmss

13/02/01 13:43:19 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1359697393467, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, appTrackingUrl=, appUser=rmss

13/02/01 13:43:19 INFO distributedshell.Client: Application has completed successfully. Breaking monitoring loop

13/02/01 13:43:19 INFO distributedshell.Client: Application completed successfully

原创文章,转载请注明: 转载自董的博客

本文链接地址: http://dongxicheng.org/mapreduce-nextgen/how-to-run-distributedshell/

作者:Dong,作者介绍:http://dongxicheng.org/about/

本博客的文章集合:http://dongxicheng.org/recommend/

本文转载自:http://gaylord.iteye.com/blog/2083668

共有 人打赏支持
mskk
粉丝 3
博文 161
码字总数 3597
作品 0
昆山
程序员
私信 提问
Hadoop 2.5.0编译到Apache Hadoop Common失败

[INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] Apache Hadoop Main ................................. SUCCES......

尧雪
2018/04/19
89
1
Bluemix中的Apache Spark数据分析服务入门

Spark是一个基于内存计算的开源的集群计算系统,目的是让数据分析更加快速。Spark非常小巧玲珑,由加州伯克利大学AMP实验室的Matei为主的小团队所开发。使用的语言是Scala,项目的core部分的...

微wx笑
2016/05/22
0
0
Docker 将 Hadoop 带到云端

一周前我们发布并开源了Cloudbreak--首个基于hadoop的docker service API。本文将为您展示其技术细节和架构组成。 Cloudbreak 建立于Apache Ambari, Docker containers, Serf 和 dnsmasq ...

oschina
2014/07/28
4.8K
0
大数据教程(8.1)mapreduce核心思想

上一章介绍了hadoop的HDFS文件系统的原理及API使用。本章博主将继续对hadoop的mapreduce编程框架进行分享。 mapreduce原理篇 mapreduce是一个分布式运算程序的编程框架,是用户开发“基于had...

em_aaron
2018/11/19
0
0
分布式计算 MapReduce与yarn工作机制

一、第一代hadoop组成与结构 第一代Hadoop,由分布式存储系统HDFS和分布式计算框架MapReduce组成,其中,HDFS由一个NameNode和多个DataNode组成,MapReduce由一个JobTracker和多个TaskTrack...

南非蚂蚁
2016/11/07
0
0

没有更多内容

加载失败,请刷新页面

加载更多

Coding and Paper Letter(四十五)

资源整理。 1 Coding: 1.Python库gempy,一种基于Python的开源三维结构地质建模软件,它允许从界面和方向数据隐式(即自动)创建复杂的地质模型。 它还支持随机建模以解决参数和模型的不确定...

胖胖雕
44分钟前
1
0
golang 声明一个指定长度的数组,用于后续添加

很多时候我们需要声明一个指定长度的数组,用于后续添加.在使用go的时候要注意,下面的第一个例子会有报错 "non-constant array bound",应该使用第二个例子. Length 是动态的值 有报错的例子 ...

漫步海边小路
46分钟前
0
0
Java NIO示例

Server端 /** * 《构建高性能的大型分布式Java应用》 * 书中的示例代码 * 版权所有 2008---2009 */package book.chapter1.tcpnio;import java.net.InetSocketAddress;i...

月下狼
52分钟前
0
0
发布xxl-job executor dotnet core 执行器的实现

DotXxlJob [github][https://github.com/xuanye/DotXxlJob] xxl-job的dotnet core 执行器实现,支持XXL-JOB 2.0+ 1 XXL-JOB概述 [XXL-JOB][1]是一个轻量级分布式任务调度平台,其核心设计目标...

假正经哥哥
今天
5
0
mysql 查询当天、本周,本月,上一个月的数据

今天 select * from 表名 where to_days(时间字段名) = to_days(now()); 昨天 SELECT * FROM 表名 WHERE TO_DAYS( NOW( ) ) - TO_DAYS( 时间字段名) <= 1 近7天 SELECT * FROM 表名 wher......

BraveLN
今天
5
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部