Spark API Hands-On Programming 02: Working with the Spark API in Cluster Mode (textFile, cach

stark_summer
Published 2015/01/28 13:56

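Working with HDFS: first make sure HDFS is running.

Start the Spark cluster.

Launch spark-shell against the Spark cluster.

Take a look at the "LICENSE.txt" file that was uploaded to HDFS earlier.

Read the file with Spark.

Use count to get the number of lines in the file.

The count took 0.239708 s.

Call cache on this RDD, then run count so that the cache is actually populated.

That count took 0.21132 s.

Run count once more.

This time it took only 0.029580 s, roughly seven times faster, because the job now works on the cached data instead of reading the file from HDFS again.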
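To repeat this comparison by hand, each action can be wrapped in a small timing helper. The following is a minimal sketch, not the code from the original session: `time` is a hypothetical helper, and the `rdd` mentioned in the comments is assumed to be the RDD obtained from `sc.textFile` in the earlier step. A plain Scala computation stands in for the Spark job so the snippet also runs outside spark-shell.

```scala
// Hypothetical helper: runs `body`, prints the elapsed wall-clock time, returns the result.
def time[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  val seconds = (System.nanoTime() - start) / 1e9
  println(f"$label: $seconds%.6f s")
  result
}

// Local stand-in so the sketch runs without a cluster.
val evens = time("local stand-in")((1 to 1000).count(_ % 2 == 0))

// In spark-shell, with `rdd` assumed to come from sc.textFile(...):
//   rdd.cache()                         // only marks the RDD; nothing is cached yet
//   time("first count")(rdd.count())    // reads the file and fills the cache
//   time("second count")(rdd.count())   // served from the cached partitions
```

Because cache is lazy, the first count after it still pays the full read cost; only later actions benefit, which matches the three timings reported above.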

Next, run a word count on the RDD from above.
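The exact commands from the original session are not reproduced here, so what follows is a sketch of a typical word count rather than the author's code. It assumes words are separated by a single space, which is consistent with the output listed further down (punctuation stays attached to words, and an empty-string key appears). Plain Scala collections stand in for the RDD so the snippet runs without a cluster; `lines`, `counts`, and `wordCounts` are illustrative names.

```scala
// Two sample lines; note the double space in the first one.
val lines = Seq("the Work  and the", "of the Work,")

// Split on a single space, pair each token with 1, then sum the 1s per token.
val counts: Map[String, Int] =
  lines
    .flatMap(_.split(" "))
    .map(word => (word, 1))
    .groupBy(_._1)
    .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

counts.foreach(println)

// The analogous pipeline in spark-shell (rdd assumed to come from sc.textFile):
//   val wordCounts = rdd.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
```

Splitting on a single space turns consecutive spaces into empty tokens and leaves `Work,` distinct from `Work`, so a more careful tokenizer would be needed for clean counts.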



Save the result to HDFS with saveAsTextFile.
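saveAsTextFile writes one line per element, using the element's string form, into part-NNNNN files under the target directory, one file per partition. Here is a minimal sketch of what that means for `(word, count)` pairs, using made-up sample pairs. The output path in the comment is taken from the `hadoop fs -cat` commands below, and `wordCounts` is an assumed name for the word-count RDD.

```scala
// Sample pairs (illustrative values only); in Spark these would be RDD elements.
val pairs = Seq(("under", 10), ("from,", 1), ("", 3))

// Each element is stored via toString, so a Tuple2 becomes "(key,value)".
val outputLines = pairs.map(_.toString)
outputLines.foreach(println)

// In spark-shell (wordCounts is an assumed name for the word-count RDD):
//   wordCounts.saveAsTextFile("/data/resultLicenseWordCount")
```

This is why the listing contains entries with two commas: the key itself ends in a comma, as in `from,`.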


Check the result of the run in the web console.


From the command line, look at the contents of part-00000 and part-00001:

[spark@S1PA222 ~]$ hadoop fs -cat /data/resultLicenseWordCount/part-00000
15/01/22 13:51:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(under,10)
(Unless,3)
(Contributions),1)
(offer,1)
(agree,1)
(BUSINESS,2)
(NON-INFRINGEMENT,,1)
(its,4)
(materials,2)
(event,1)
(intentionally,2)
(Grant,2)
(writing,1)
(include,3)
(responsibility,,1)
(have,2)
(MERCHANTABILITY,,1)
(Contribution,3)
(Massachusetts,1)
(express,2)
("Your"),1)
((i),1)
(However,,1)
(been,2)
(files;,1)
(This,1)
(stating,1)
(2-Clause,1)
(conditions.,1)
(non-exclusive,,2)
(appropriateness,1)
(marked,1)
(risks,1)
(any,28)
(IS",4)
(implementation,1)
(filed.,1)
(Sections,1)
(fee,1)
(losses),,1)
(out,1)
(contract,2)
(DISTRIBUTION,1)
(4.,1)
(file,6)
(documentation,,2)
(wherever,1)
(unless,1)
(below).,1)
(names,,1)
(verbal,,1)
(ANY,10)
(version,1)
(file.,2)
(are,10)
(no-charge,,2)
(2.,1)
(from,,1)
(reproduction,,3)
(2011-2014,,1)
(assume,1)
(licenses,1)
(DATA,,2)
(IS,2)
(recommend,1)
(prominent,1)
(revisions,,1)
("[]",1)
(FITNESS,3)
(otherwise,,3)
(distribution,,1)
(necessarily,1)
(Apache,5)
(grant,1)
(CONTRIBUTORS,4)
(as,15)
(irrevocable,2)
(inclusion,2)
(purpose,2)
(products,1)
(ARE,2)
(merely,1)
(File,1)
(Definitions.,1)
(form,10)
(IMPLIED,4)
(Warranty,1)
(Patent,1)
(incurred,1)
(8.,1)
(repository,1)
(contributors,1)
("printed,1)
(sell,,2)
(:,3)
(malfunction,,1)
(Version,2)
(origin,1)
(alongside,1)
(CRC,1)
(implied.,1)
(contract,,1)
(representatives,,1)
(warranty,1)
(offer,,1)
(org.apache.hadoop.util.bloom.*,1)
(KIND,,2)
(is,10)
(conspicuously,1)
(found,1)
(charge,1)
(make,,1)
(file,,1)
(associated,1)
(even,1)
(same,1)
((Don't,1)
(outstanding,1)
(link,1)
([name,1)
(Trademarks.,1)
(notice,2)
(endorse,1)
(shall,15)
(contact,1)
(Redistributions,4)
(using,1)
(class,1)
(name),1)
(behalf,5)
(form.,1)
(We,1)
(INTERRUPTION),2)
(responsible,1)
(annotations,,1)
(THIS,4)
(subject,1)
(acting,1)
(permitted,2)
(OUT,2)
(BASIS,,2)
(has,2)
(Accepting,1)
(defend,,1)
(University,1)
([yyyy],1)
((http://www.one-lab.org),1)
(EVENT,2)
(granting,1)
(portions,1)
(implied,,1)
(NOTICE,5)
(infringed,1)
(limitation,,1)
(names,2)
(electronic,,1)
(PURPOSE,2)
(licensable,1)
(section),1)
(conditions,14)
(EVEN,2)
(acts),1)
(law,3)
(licenses.,1)
(compression,1)
(readable,1)
(solely,1)
(configuration,1)
(information.,1)
(litigation,2)
(represent,,1)
(warranty,,1)
(shares,,1)
(supersede,1)
(governed,1)
(marks,,1)
(http://code.google.com/p/lz4/,1)
(modification,,2)
(fifty,1)
(sent,1)
(places:,1)
(means,2)
(identifying,1)
(this,22)
(Works",1)
(Louvain,1)
(prior,1)
(slicing-by-8,1)
(PROCUREMENT,2)
(changed,1)
(describing,1)
(only,4)
(contributory,1)
(normally,1)
(indirect,,2)
(WITHOUT,2)
(Works,12)
(documentation,3)
(agreement,1)
(otherwise,3)
("AS,4)
(damages,,1)
(patent,,1)
(APACHE,1)
(without,6)
("NOTICE",1)
(Limitation,1)
(SUBSTITUTE,2)
(Contribution(s),3)
(Subject,2)
(Submission,1)
(UCL,1)
(TITLE,,1)
(trademarks,,1)
((iii),1)
(2.0,1)
(Fast,1)
(exercise,1)
(accepting,2)
(example,1)
(distribution.,2)
(interfaces,1)
(conditions:,1)
(act,1)
(incorporated,2)
(provides,2)
(limited,4)
(LZ4,3)
(2008,2009,2010,1)
(can,2)
(contents,1)
(PURPOSE.,1)
(recipients,1)
("Contribution",1)
(failure,1)
(communication,3)
(commercial,1)
(works,1)
(language,1)
(permissions,3)
(WARRANTIES,4)
(media,1)
(reserved.,2)
(Works,,2)
(How,1)
(WARRANTIES,,2)
(controlled,1)
(Warranty.,1)
(2.0,,1)
((http://www.opensource.org/licenses/bsd-license.php),1)
(own,4)
(submit,1)
(SHALL,2)
(reasonable,1)
(reason,1)
(agreed,3)
(systems,1)
(patent,5)
(form,,4)
(Technology.,1)
(advised,1)
(systems,,1)
(classes:,1)
(HOWEVER,2)
(distribution,3)
(DAMAGES,2)
((c),2)
(src/main/native/src/org/apache/hadoop/util:,1)
(PROFITS;,2)
(perpetual,,2)
(applies,1)
(apply,2)
(subcomponents,2)
(modify,2)
(owner],1)
(one,1)
(modifying,1)
(counterclaim,1)
(January,1)
(discussing,1)
(CONTRACT,,2)
(with,16)
((C),1)
(infringement,,1)
(2004,1)
(lawsuit),1)
(specific,2)
(LZ,1)
(warranties,1)
(reproducing,1)
(promote,1)
(beneficial,1)
(ADVISED,2)
((a),1)
(other,9)
(date,1)
(met:,2)
(publicly,2)
(from,4)
(LIMITED,4)
(display,,1)
(MERCHANTABILITY,2)
(damages,3)
(SUBCOMPONENTS:,1)
(negligence),,1)
(remain,1)
(CONDITIONS,4)
(their,2)
(electronic,1)
(identification,1)
(determining,1)
(consistent,1)
(display,1)
(writing,,3)
(trade,1)
(third-party,2)
(,1299)
(description,1)
(REPRODUCTION,,1)
(attached,1)
(list,4)
(*,34)
(INDIRECT,,2)
(designated,1)
(Contribution.",1)
(complies,1)
(addendum,1)
(damages.,1)
(Yann,1)
(EXPRESS,2)
(License;,1)
(6.,1)
(GOODS,2)
(subsequently,1)
(included,2)
(replaced,1)
(notice,,5)
[spark@S1PA222 ~]$   hadoop fs -cat /data/resultLicenseWordCount/part-00001

15/01/22 13:52:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(For,6)
(reproduce,,1)
("Contributor",1)
((or,3)
(nothing,1)
(work.,1)
(content,1)
(HOLDERS,2)
(add,2)
(through,1)
(All,2)
(perform,,1)
(result,1)
(goodwill,,1)
(herein,1)
(direct,,1)
(used,1)
(To,1)
(harmless,1)
(9.,1)
(these,1)
(control,,1)
(INCIDENTAL,,2)
(indicated,1)
(part,4)
(alone,1)
(different,1)
(forms,,2)
(purposes,4)
(https://groups.google.com/forum/#!forum/lz4c,1)
(be,7)
(/**,2)
(carry,1)
(separable,1)
(including,5)
(contained,1)
(combination,1)
(calculation,1)
(license,7)
(FOR,6)
(thereof,,2)
(ARISING,2)
(constitutes,1)
(but,5)
(types.,1)
(stated,2)
(archives.,1)
(obligations,,1)
(5.,1)
(Works;,3)
(nor,1)
("Legal,1)
(Work,20)
(whole,,2)
(Copyright,5)
(at,3)
(copyright,,1)
(Redistribution,2)
(object,1)
(copy,3)
(indemnify,,1)
(asserted,1)
(HADOOP,1)
(attach,1)
("control",1)
(support,,1)
("Object",1)
(give,1)
(THEORY,2)
(may,10)
(except,2)
("Work",1)
(sublicense,,1)
(IF,2)
(granted,2)
(project,2)
(authorized,2)
(SPECIAL,,2)
(BY,2)
(retain,2)
(or,65)
(transfer,1)
(fields,1)
(Licensor,,1)
((b),1)
((ii),1)
(2005,,1)
(of,75)
(does,1)
(transformation,1)
((INCLUDING,2)
(DIRECT,,2)
(management,1)
(modified,1)
(Licensed,1)
(percent,1)
(Header,1)
(original,2)
(Contributor,,1)
(native,1)
((INCLUDING,,2)
(PARTICULAR,3)
(limitations,1)
(THE,10)
(INCLUDING,,2)
(power,,1)
(CAUSED,2)
(de,1)
(appropriate,1)
(against,,1)
(TORT,2)
("Source",1)
(each,4)
(1.,1)
(following,10)
(Liability.,2)
(acceptance,1)
("You",1)
(sole,1)
(from),1)
(See,1)
(tracking,1)
(for,19)
(cause,2)
(alleging,1)
(obtain,1)
(reproduce,3)
(source,,1)
(control,2)
(EXEMPLARY,,2)
(TERMS,2)
(terms,8)
(syntax,1)
(SERVICES;,2)
(made,,1)
(BUT,4)
(compiled,1)
(issue,1)
("submitted",1)
(OneLab,1)
(algorithm,1)
(was,1)
(While,1)
(entity,,1)
(do,3)
(PROVIDED,2)
(no,2)
(License,10)
(entity,3)
(Contributions.,2)
(mean,10)
(individual,3)
(Institute,1)
(computer,1)
(notices,9)
(Neither,1)
(Licensor,8)
(STRICT,2)
(made,1)
(authorship,,2)
(bind,1)
((the,1)
(indemnity,,1)
(distribute,3)
(You,24)
(grants,2)
(brackets,1)
(meet,1)
(for,,1)
(service,1)
(in,31)
(trademark,,1)
(boilerplate,1)
(WAY,2)
(LOSS,2)
(distributed,3)
(LIABILITY,,4)
(submitted,2)
(public,1)
(OF,19)
(managed,1)
(derived,2)
(Source,8)
(use,,4)
(name,2)
(definition,,2)
(that,25)
(src/main/native/src/org/apache/hadoop/io/compress/lz4/{lz4.h,lz4.c,lz4hc.h,lz4hc.c},,1)
(customary,1)
(BSD,1)
(thereof,1)
(claims,2)
(CONSEQUENTIAL,2)
(translation,1)
(format.,1)
(construed,1)
(DAMAGE.,2)
(applicable,3)
(binary,4)
(regarding,1)
(European,1)
(excluding,3)
(END,1)
((d),1)
(choose,1)
(NO,2)
(BE,2)
(direct,2)
(retain,,1)
(modifications,,3)
(forum,1)
(owner,4)
(USE,2)
(informational,1)
(The,3)
(legal,1)
((50%),1)
(document.,1)
(received,1)
(such,17)
(institute,1)
(distribute,,2)
(WHETHER,2)
(page",1)
((except,1)
(loss,1)
(common,1)
(additions,1)
(BSD-style,1)
(Appendix,1)
(Use,1)
(disclaimer,2)
(resulting,1)
(ON,2)
(hereby,2)
(License.,11)
(software,3)
(whom,1)
(along,1)
(lists,,1)
(required,4)
(OR,18)
(ownership,2)
(SOFTWARE,2)
(the,122)
(includes,1)
(obligations,1)
(import,,1)
(not,11)
(either,2)
(terminate,1)
(if,4)
(stoppage,,1)
(provided,9)
(submitted.,1)
(all,3)
(permission.,1)
("License");,1)
(written,2)
(generated,2)
(consequential,1)
(Derivative,17)
(AND,11)
(rights,3)
(http://www.apache.org/licenses/,1)
(terms.,1)
(Catholique,1)
(deliberate,1)
(entity.,2)
(Work,,4)
(special,,1)
(Additional,1)
(Legal,3)
(034819,1)
(least,1)
(text,4)
(on,11)
(editorial,1)
(redistributing,2)
("License",1)
(against,1)
(permission,1)
(9,1)
(separate,2)
(and/or,3)
(LICENSE,1)
(union,1)
((and,1)
(1,1)
(including,,1)
(Entity,3)
(negligent,1)
(LIABLE,2)
(IN,6)
(use,8)
(enclosed,2)
(contains,1)
(files,1)
(Entity",1)
(Work.,1)
(owner.,1)
(preferred,1)
(modifications,3)
(brackets!),1)
(available,1)
(code,5)
(http://www.apache.org/licenses/LICENSE-2.0,1)
(more,1)
(possibility,1)
(product,1)
(liable,1)
(SUCH,2)
(direction,1)
(must,8)
(making,1)
(Disclaimer,1)
(disclaimer.,2)
(Commission,1)
(OTHERWISE),2)
(Hadoop,1)
((an,1)
(APPENDIX:,1)
("Licensor",1)
(DISCLAIMED.,2)
("Derivative,1)
(elaborations,,1)
(incidental,,1)
(prepare,1)
(A,3)
(exercising,1)
(*/,3)
(which,2)
(pertain,2)
(explicitly,1)
(tort,1)
(3.,1)
(also,1)
(conversions,1)
(liability,2)
(whether,4)
(character,1)
(should,1)
(thereof.,1)
(of,,3)
(your,4)
(royalty-free,,2)
(entities,1)
(or,,1)
(NEGLIGENCE,2)
(author,1)
("Not,1)
(source,9)
(then,2)
((including,3)
(Redistribution.,1)
(attribution,4)
(by,21)
(TO,,4)
(defined,1)
(OWNER,2)
(If,2)
(an,6)
(/*,1)
(Collet.,1)
(improving,1)
(grossly,1)
(COPYRIGHT,4)
(above,,1)
(theory,,1)
(mailing,1)
(7.,1)
(Notwithstanding,1)
(code,,2)
(cross-claim,1)
(provide,1)
((such,1)
(arising,1)
(Object,4)
(In,1)
(-,7)
(those,3)
(work,,2)
(easier,1)
(based,1)
(medium,,1)
(within,8)
(worldwide,,2)
(authorship.,1)
(files.,1)
(inability,1)
(you,2)
(POSSIBILITY,2)
(cannot,1)
(copies,1)
(a,21)
(statement,1)
(above,4)
(state,1)
(work,5)
(by,,3)
(to,41)
(appear.,1)
(Your,9)
(where,1)
(liability.,1)
(governing,1)
(NOT,4)
(License,,6)
(hold,1)
(and,51)
(copyright,15)
(USE,,3)
(compliance,1)
(SOFTWARE,,2)
(comment,1)
(additional,4)
(executed,1)
(mechanical,1)
(Contributor,8)
[spark@S1PA222 ~]$
