
Running Hadoop MapReduce on Tachyon

Ryan-瑞恩
Published 2015/04/08 17:42

This guide describes how to get Tachyon running with Hadoop MapReduce, so that you can easily use your MapReduce programs with files stored on Tachyon.

Prerequisites

The prerequisite for this part is that you have Java installed. We also assume that you have set up Tachyon and Hadoop in accordance with these guides: Local Mode or Cluster Mode.

If running a Hadoop 1.x cluster, ensure that the core-site.xml file in your Hadoop installation’s conf directory has the following properties added:

<property>
  <name>fs.tachyon.impl</name>
  <value>tachyon.hadoop.TFS</value>
</property>
<property>
  <name>fs.tachyon-ft.impl</name>
  <value>tachyon.hadoop.TFSFT</value>
</property>

This will allow your MapReduce jobs to use Tachyon for their input and output files. If you are using HDFS as the underlying store for Tachyon, it may be necessary to add these properties to the hdfs-site.xml conf file as well.

If the cluster is a 2.x cluster, then these properties are not needed.
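
For a Hadoop 1.x cluster, one quick way to confirm that the properties are present is to search the configuration file for them. The following is a minimal sketch, not part of Tachyon or Hadoop; the helper name has_property is hypothetical, and the check is a plain text match on the name element rather than real XML parsing.

```shell
# Hypothetical helper: succeeds if the given Hadoop config file
# contains a <name> element for the named property.
# -F treats the pattern as a fixed string, so dots are not wildcards.
has_property() {
  conf_file="$1"
  prop_name="$2"
  grep -qF "<name>${prop_name}</name>" "$conf_file"
}

# Example (assumes HADOOP_HOME points at a Hadoop 1.x installation):
#   has_property "$HADOOP_HOME/conf/core-site.xml" fs.tachyon.impl \
#     && echo "fs.tachyon.impl is configured"
```

The same check can be pointed at hdfs-site.xml if the properties were added there as well.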

Distributing Tachyon Executables

In order for the MapReduce job to be able to use files via Tachyon, we will need to distribute the Tachyon jar amongst all the nodes in the cluster. This will allow the TaskTracker and JobClient to have all the requisite executables to interface with Tachyon.

There are three options for distributing the jars, as outlined by this guide from Cloudera.

Assuming that Tachyon will be used heavily, it is best to keep the Tachyon jar permanently on each node. That way we neither rely on the Hadoop DistributedCache, which incurs the network cost of distributing the jar for each job (Option 1), nor significantly increase the job jar size by packaging Tachyon with it (Option 2). For this reason, of the three options laid out, the third route, installing the Tachyon jar on each node, is highly recommended.

  • To install Tachyon on each node, place the tachyon-client-0.6.3-jar-with-dependencies.jar, located in the tachyon/client/target directory, in the $HADOOP_HOME/lib directory of each node, and then restart all of the TaskTrackers. One downside of this approach is that the jar must be installed again for each update to a new release.

  • You can also run a job by using the -libjars command line option when using hadoop jar..., and specifying /pathToTachyon/client/target/tachyon-client-0.6.3-jar-with-dependencies.jar as the argument. This will place the jar in the Hadoop DistributedCache, and is desirable only if you update the Tachyon jar frequently.

  • For those interested in the second option, please revisit the Cloudera guide for more assistance. One must simply package the Tachyon jar in the lib subdirectory of the job jar. This option is the least desirable: every change in Tachyon requires recreating the job jar, and the larger job jar incurs a network cost for every job.
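
Installing the jar on every node can be scripted. The following is a rough sketch under several assumptions that are not from the original guide: passwordless SSH to each node, a hosts file with one hostname per line (for example conf/slaves on Hadoop 1.x), and the same library path on every node. The function name distribute_jar and the COPY_CMD variable are hypothetical; COPY_CMD exists only so the loop can be dry-run with echo.

```shell
# Sketch: copy a jar into the same directory on every host listed in a file.
# Usage: distribute_jar <jar_path> <hosts_file> <remote_lib_dir>
distribute_jar() {
  jar_path="$1"
  hosts_file="$2"
  remote_lib="$3"
  # Read hosts on file descriptor 3 so the copy command cannot consume the list.
  while IFS= read -r host <&3; do
    # Skip blank lines and comment lines in the hosts file.
    case "$host" in ''|'#'*) continue ;; esac
    ${COPY_CMD:-scp} "$jar_path" "${host}:${remote_lib}/"
  done 3< "$hosts_file"
}

# Dry run (prints the source and destination for each host instead of copying):
#   COPY_CMD=echo distribute_jar \
#     /pathToTachyon/client/target/tachyon-client-0.6.3-jar-with-dependencies.jar \
#     "$HADOOP_HOME/conf/slaves" "$HADOOP_HOME/lib"
```

After a real copy, the TaskTrackers still need to be restarted before they pick up the jar.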

In order to make the Tachyon executables available to the JobClient, one can also install the Tachyon jar in the $HADOOP_HOME/lib directory, or set HADOOP_CLASSPATH by adding the following line to hadoop-env.sh:

export HADOOP_CLASSPATH=/pathToTachyon/client/target/tachyon-client-0.6.3-jar-with-dependencies.jar

This will allow the code that creates the Job and submits it to reference Tachyon if necessary.
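
If HADOOP_CLASSPATH already contains other entries, a plain assignment would discard them. The following is a small sketch of appending the Tachyon jar instead; the helper name append_classpath and the TACHYON_JAR variable are hypothetical, and the jar path is the placeholder used throughout this guide.

```shell
# Hypothetical helper: print classpath $1 with entry $2 appended.
# When $1 is empty, no leading colon is produced.
append_classpath() {
  printf '%s\n' "${1:+$1:}$2"
}

TACHYON_JAR=/pathToTachyon/client/target/tachyon-client-0.6.3-jar-with-dependencies.jar

# Keep whatever is already on the classpath and add the Tachyon client jar.
export HADOOP_CLASSPATH="$(append_classpath "${HADOOP_CLASSPATH:-}" "$TACHYON_JAR")"
```

Placed in hadoop-env.sh, this has the same effect as a plain assignment when the variable starts out empty.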

Example

For simplicity, we will assume a pseudo-distributed Hadoop cluster.

$ cd $HADOOP_HOME
$ ./bin/stop-all.sh
$ ./bin/start-all.sh

Because we have a pseudo-distributed cluster, copying the Tachyon jar into $HADOOP_HOME/lib makes the Tachyon executables available to both the TaskTrackers and the JobClient. We can now verify that it is working by running the following:

$ cd $HADOOP_HOME
$ ./bin/hadoop jar hadoop-examples-1.0.4.jar wordcount -libjars /pathToTachyon/client/target/tachyon-client-0.6.3-jar-with-dependencies.jar tachyon://localhost:19998/X tachyon://localhost:19998/X-wc

Here X is some file on Tachyon, and the results of the wordcount job are written to the X-wc directory.

For example, say you have text files in HDFS directory /user/hduser/gutenberg/. You can run the following:

$ cd $HADOOP_HOME
$ ./bin/hadoop jar hadoop-examples-1.0.4.jar wordcount -libjars /pathToTachyon/client/target/tachyon-client-0.6.3-jar-with-dependencies.jar tachyon://localhost:19998/user/hduser/gutenberg tachyon://localhost:19998/user/hduser/output

The above command tells wordcount to load the files from the HDFS directory /user/hduser/gutenberg/ into Tachyon and then save the output to /user/hduser/output/ in Tachyon.
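
Once the job finishes, the output can be inspected through the Hadoop filesystem shell using the same tachyon:// scheme. The following is a sketch only: the helper tachyon_uri and the variables TACHYON_MASTER_HOST and TACHYON_MASTER_PORT are hypothetical conveniences, with defaults taken from the localhost:19998 address used in the commands above. It also assumes the client jar and, on Hadoop 1.x, the fs.tachyon.impl property are in place as described earlier.

```shell
# Hypothetical helper: build a tachyon:// URI for an absolute path.
# Host and port default to the values used in this guide.
tachyon_uri() {
  printf 'tachyon://%s:%s%s\n' \
    "${TACHYON_MASTER_HOST:-localhost}" "${TACHYON_MASTER_PORT:-19998}" "$1"
}

# Example (run from $HADOOP_HOME after the wordcount job completes):
#   ./bin/hadoop fs -ls  "$(tachyon_uri /user/hduser/output)"
#   ./bin/hadoop fs -cat "$(tachyon_uri /user/hduser/output)/part-*"
```

The part-* glob is used because the exact names of the output files depend on the job.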


© Copyright belongs to the author