
Spark API Programming Hands-On Practice 02: Using the Spark API in Cluster Mode with textFile, cach

stark_summer
Published 2015/01/28 13:56

Working with HDFS: first make sure that HDFS is running.


Start the Spark cluster.


Run spark-shell against the Spark cluster.



Check the "LICENSE.txt" file that was uploaded to HDFS earlier.


Read the file with Spark.


Use count to get the number of lines in the file.

We can see that this count took 0.239708 s.
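
The read and count steps above can be sketched in spark-shell as follows. This is a minimal sketch rather than the author's exact session: `sc` is the SparkContext that spark-shell creates, and the path `/data/LICENSE.txt` is an assumption (the article only shows an output directory under `/data`), so substitute the location the file was actually uploaded to.

```scala
// Paste into spark-shell; `sc` is the SparkContext the shell provides.
// Hypothetical path: the article does not show where LICENSE.txt was uploaded.
// A path without a scheme is resolved against the configured default filesystem.
val licensePath = "/data/LICENSE.txt"

// textFile is lazy: it only records how to read the file, one element per line.
val rdd = sc.textFile(licensePath)

// count is an action, so this is the point where the file is actually read.
val lineCount = rdd.count()
println(s"LICENSE.txt has $lineCount lines")
```

Because textFile is lazy, a wrong path typically only surfaces as an error when count runs, not when the RDD is defined.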

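Next, call cache on the RDD and run count so that the cache takes effect. cache is lazy: the data is only stored when an action such as count runs, so this count still has to read the file.

This count took 0.21132 s.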

Run count again.

This time it took 0.029580 s, because the operation now runs on the cached data.
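
The three measurements can be reproduced with a sketch along these lines. The helper `timed` is hypothetical and measures wall-clock time on the driver, which is an assumption about how to time the calls and not necessarily how the article's figures were obtained. The input path is also an assumption.

```scala
// Paste into spark-shell; `sc` is the SparkContext the shell provides.

// Hypothetical helper: runs `body`, prints the elapsed wall-clock time, returns the result.
def timed[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  val seconds = (System.nanoTime() - start) / 1e9
  println(f"$label took $seconds%.6f s")
  result
}

// Hypothetical path: the article does not show where LICENSE.txt was uploaded.
val rdd = sc.textFile("/data/LICENSE.txt")

timed("count without cache")(rdd.count())

// cache only marks the RDD for in-memory storage; nothing is stored yet.
rdd.cache()

// The first action after cache reads the file and fills the cache.
timed("first count after cache")(rdd.count())

// Later actions can be served from the cached partitions.
timed("second count after cache")(rdd.count())
```

Only the third call is expected to be noticeably faster, which matches the pattern of the timings reported above.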

Next, run a word count on the RDD above.
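
A word count over the lines can be sketched as below. The tokenizer is an assumption: the output later in the article, with an empty-string key and punctuation attached to words, is consistent with splitting each line on a single space, so the sketch does the same. The input path is hypothetical.

```scala
// Paste into spark-shell; `sc` is the SparkContext the shell provides.
// Hypothetical path: the article does not show where LICENSE.txt was uploaded.
val lines = sc.textFile("/data/LICENSE.txt")

// Assumed tokenizer: split every line on single spaces.
val tokens = lines.flatMap(line => line.split(" "))

// Pair each token with an initial count of 1.
val pairs = tokens.map(word => (word, 1))

// Sum the counts per token; this step shuffles data between partitions.
val wordCounts = pairs.reduceByKey(_ + _)

// Peek at a few pairs without collecting the whole result to the driver.
wordCounts.take(5).foreach(println)
```

All three transformations are lazy; the job only runs when an action such as take is called.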



Save the result to HDFS with saveAsTextFile.
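
A minimal sketch of the save step, assuming the same hypothetical input path and single-space tokenizer as before. The output directory `/data/resultLicenseWordCount` is the one that appears in the command-line listing in this article.

```scala
// Paste into spark-shell; `sc` is the SparkContext the shell provides.
// Hypothetical input path and assumed tokenizer (split on single spaces).
val wordCounts = sc.textFile("/data/LICENSE.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

// One part-NNNNN file is written per partition of the RDD.
println(s"partitions: ${wordCounts.partitions.length}")

// Output directory as shown in the article's listing. By default the save is
// rejected if this directory already exists.
wordCounts.saveAsTextFile("/data/resultLicenseWordCount")
```

The two files part-00000 and part-00001 seen in the listing would correspond to an RDD with two partitions.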


Check the result of the run in the web console.


From the command line, look at the contents of part-00000 and part-00001:

[spark@S1PA222 ~]$ hadoop fs -cat /data/resultLicenseWordCount/part-00000
15/01/22 13:51:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(under,10)
(Unless,3)
(Contributions),1)
(offer,1)
(agree,1)
(BUSINESS,2)
(NON-INFRINGEMENT,,1)
(its,4)
(materials,2)
(event,1)
(intentionally,2)
(Grant,2)
(writing,1)
(include,3)
(responsibility,,1)
(have,2)
(MERCHANTABILITY,,1)
(Contribution,3)
(Massachusetts,1)
(express,2)
("Your"),1)
((i),1)
(However,,1)
(been,2)
(files;,1)
(This,1)
(stating,1)
(2-Clause,1)
(conditions.,1)
(non-exclusive,,2)
(appropriateness,1)
(marked,1)
(risks,1)
(any,28)
(IS",4)
(implementation,1)
(filed.,1)
(Sections,1)
(fee,1)
(losses),,1)
(out,1)
(contract,2)
(DISTRIBUTION,1)
(4.,1)
(file,6)
(documentation,,2)
(wherever,1)
(unless,1)
(below).,1)
(names,,1)
(verbal,,1)
(ANY,10)
(version,1)
(file.,2)
(are,10)
(no-charge,,2)
(2.,1)
(from,,1)
(reproduction,,3)
(2011-2014,,1)
(assume,1)
(licenses,1)
(DATA,,2)
(IS,2)
(recommend,1)
(prominent,1)
(revisions,,1)
("[]",1)
(FITNESS,3)
(otherwise,,3)
(distribution,,1)
(necessarily,1)
(Apache,5)
(grant,1)
(CONTRIBUTORS,4)
(as,15)
(irrevocable,2)
(inclusion,2)
(purpose,2)
(products,1)
(ARE,2)
(merely,1)
(File,1)
(Definitions.,1)
(form,10)
(IMPLIED,4)
(Warranty,1)
(Patent,1)
(incurred,1)
(8.,1)
(repository,1)
(contributors,1)
("printed,1)
(sell,,2)
(:,3)
(malfunction,,1)
(Version,2)
(origin,1)
(alongside,1)
(CRC,1)
(implied.,1)
(contract,,1)
(representatives,,1)
(warranty,1)
(offer,,1)
(org.apache.hadoop.util.bloom.*,1)
(KIND,,2)
(is,10)
(conspicuously,1)
(found,1)
(charge,1)
(make,,1)
(file,,1)
(associated,1)
(even,1)
(same,1)
((Don't,1)
(outstanding,1)
(link,1)
([name,1)
(Trademarks.,1)
(notice,2)
(endorse,1)
(shall,15)
(contact,1)
(Redistributions,4)
(using,1)
(class,1)
(name),1)
(behalf,5)
(form.,1)
(We,1)
(INTERRUPTION),2)
(responsible,1)
(annotations,,1)
(THIS,4)
(subject,1)
(acting,1)
(permitted,2)
(OUT,2)
(BASIS,,2)
(has,2)
(Accepting,1)
(defend,,1)
(University,1)
([yyyy],1)
((http://www.one-lab.org),1)
(EVENT,2)
(granting,1)
(portions,1)
(implied,,1)
(NOTICE,5)
(infringed,1)
(limitation,,1)
(names,2)
(electronic,,1)
(PURPOSE,2)
(licensable,1)
(section),1)
(conditions,14)
(EVEN,2)
(acts),1)
(law,3)
(licenses.,1)
(compression,1)
(readable,1)
(solely,1)
(configuration,1)
(information.,1)
(litigation,2)
(represent,,1)
(warranty,,1)
(shares,,1)
(supersede,1)
(governed,1)
(marks,,1)
(http://code.google.com/p/lz4/,1)
(modification,,2)
(fifty,1)
(sent,1)
(places:,1)
(means,2)
(identifying,1)
(this,22)
(Works",1)
(Louvain,1)
(prior,1)
(slicing-by-8,1)
(PROCUREMENT,2)
(changed,1)
(describing,1)
(only,4)
(contributory,1)
(normally,1)
(indirect,,2)
(WITHOUT,2)
(Works,12)
(documentation,3)
(agreement,1)
(otherwise,3)
("AS,4)
(damages,,1)
(patent,,1)
(APACHE,1)
(without,6)
("NOTICE",1)
(Limitation,1)
(SUBSTITUTE,2)
(Contribution(s),3)
(Subject,2)
(Submission,1)
(UCL,1)
(TITLE,,1)
(trademarks,,1)
((iii),1)
(2.0,1)
(Fast,1)
(exercise,1)
(accepting,2)
(example,1)
(distribution.,2)
(interfaces,1)
(conditions:,1)
(act,1)
(incorporated,2)
(provides,2)
(limited,4)
(LZ4,3)
(2008,2009,2010,1)
(can,2)
(contents,1)
(PURPOSE.,1)
(recipients,1)
("Contribution",1)
(failure,1)
(communication,3)
(commercial,1)
(works,1)
(language,1)
(permissions,3)
(WARRANTIES,4)
(media,1)
(reserved.,2)
(Works,,2)
(How,1)
(WARRANTIES,,2)
(controlled,1)
(Warranty.,1)
(2.0,,1)
((http://www.opensource.org/licenses/bsd-license.php),1)
(own,4)
(submit,1)
(SHALL,2)
(reasonable,1)
(reason,1)
(agreed,3)
(systems,1)
(patent,5)
(form,,4)
(Technology.,1)
(advised,1)
(systems,,1)
(classes:,1)
(HOWEVER,2)
(distribution,3)
(DAMAGES,2)
((c),2)
(src/main/native/src/org/apache/hadoop/util:,1)
(PROFITS;,2)
(perpetual,,2)
(applies,1)
(apply,2)
(subcomponents,2)
(modify,2)
(owner],1)
(one,1)
(modifying,1)
(counterclaim,1)
(January,1)
(discussing,1)
(CONTRACT,,2)
(with,16)
((C),1)
(infringement,,1)
(2004,1)
(lawsuit),1)
(specific,2)
(LZ,1)
(warranties,1)
(reproducing,1)
(promote,1)
(beneficial,1)
(ADVISED,2)
((a),1)
(other,9)
(date,1)
(met:,2)
(publicly,2)
(from,4)
(LIMITED,4)
(display,,1)
(MERCHANTABILITY,2)
(damages,3)
(SUBCOMPONENTS:,1)
(negligence),,1)
(remain,1)
(CONDITIONS,4)
(their,2)
(electronic,1)
(identification,1)
(determining,1)
(consistent,1)
(display,1)
(writing,,3)
(trade,1)
(third-party,2)
(,1299)
(description,1)
(REPRODUCTION,,1)
(attached,1)
(list,4)
(*,34)
(INDIRECT,,2)
(designated,1)
(Contribution.",1)
(complies,1)
(addendum,1)
(damages.,1)
(Yann,1)
(EXPRESS,2)
(License;,1)
(6.,1)
(GOODS,2)
(subsequently,1)
(included,2)
(replaced,1)
(notice,,5)
[spark@S1PA222 ~]$   hadoop fs -cat /data/resultLicenseWordCount/part-00001

15/01/22 13:52:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(For,6)
(reproduce,,1)
("Contributor",1)
((or,3)
(nothing,1)
(work.,1)
(content,1)
(HOLDERS,2)
(add,2)
(through,1)
(All,2)
(perform,,1)
(result,1)
(goodwill,,1)
(herein,1)
(direct,,1)
(used,1)
(To,1)
(harmless,1)
(9.,1)
(these,1)
(control,,1)
(INCIDENTAL,,2)
(indicated,1)
(part,4)
(alone,1)
(different,1)
(forms,,2)
(purposes,4)
(https://groups.google.com/forum/#!forum/lz4c,1)
(be,7)
(/**,2)
(carry,1)
(separable,1)
(including,5)
(contained,1)
(combination,1)
(calculation,1)
(license,7)
(FOR,6)
(thereof,,2)
(ARISING,2)
(constitutes,1)
(but,5)
(types.,1)
(stated,2)
(archives.,1)
(obligations,,1)
(5.,1)
(Works;,3)
(nor,1)
("Legal,1)
(Work,20)
(whole,,2)
(Copyright,5)
(at,3)
(copyright,,1)
(Redistribution,2)
(object,1)
(copy,3)
(indemnify,,1)
(asserted,1)
(HADOOP,1)
(attach,1)
("control",1)
(support,,1)
("Object",1)
(give,1)
(THEORY,2)
(may,10)
(except,2)
("Work",1)
(sublicense,,1)
(IF,2)
(granted,2)
(project,2)
(authorized,2)
(SPECIAL,,2)
(BY,2)
(retain,2)
(or,65)
(transfer,1)
(fields,1)
(Licensor,,1)
((b),1)
((ii),1)
(2005,,1)
(of,75)
(does,1)
(transformation,1)
((INCLUDING,2)
(DIRECT,,2)
(management,1)
(modified,1)
(Licensed,1)
(percent,1)
(Header,1)
(original,2)
(Contributor,,1)
(native,1)
((INCLUDING,,2)
(PARTICULAR,3)
(limitations,1)
(THE,10)
(INCLUDING,,2)
(power,,1)
(CAUSED,2)
(de,1)
(appropriate,1)
(against,,1)
(TORT,2)
("Source",1)
(each,4)
(1.,1)
(following,10)
(Liability.,2)
(acceptance,1)
("You",1)
(sole,1)
(from),1)
(See,1)
(tracking,1)
(for,19)
(cause,2)
(alleging,1)
(obtain,1)
(reproduce,3)
(source,,1)
(control,2)
(EXEMPLARY,,2)
(TERMS,2)
(terms,8)
(syntax,1)
(SERVICES;,2)
(made,,1)
(BUT,4)
(compiled,1)
(issue,1)
("submitted",1)
(OneLab,1)
(algorithm,1)
(was,1)
(While,1)
(entity,,1)
(do,3)
(PROVIDED,2)
(no,2)
(License,10)
(entity,3)
(Contributions.,2)
(mean,10)
(individual,3)
(Institute,1)
(computer,1)
(notices,9)
(Neither,1)
(Licensor,8)
(STRICT,2)
(made,1)
(authorship,,2)
(bind,1)
((the,1)
(indemnity,,1)
(distribute,3)
(You,24)
(grants,2)
(brackets,1)
(meet,1)
(for,,1)
(service,1)
(in,31)
(trademark,,1)
(boilerplate,1)
(WAY,2)
(LOSS,2)
(distributed,3)
(LIABILITY,,4)
(submitted,2)
(public,1)
(OF,19)
(managed,1)
(derived,2)
(Source,8)
(use,,4)
(name,2)
(definition,,2)
(that,25)
(src/main/native/src/org/apache/hadoop/io/compress/lz4/{lz4.h,lz4.c,lz4hc.h,lz4hc.c},,1)
(customary,1)
(BSD,1)
(thereof,1)
(claims,2)
(CONSEQUENTIAL,2)
(translation,1)
(format.,1)
(construed,1)
(DAMAGE.,2)
(applicable,3)
(binary,4)
(regarding,1)
(European,1)
(excluding,3)
(END,1)
((d),1)
(choose,1)
(NO,2)
(BE,2)
(direct,2)
(retain,,1)
(modifications,,3)
(forum,1)
(owner,4)
(USE,2)
(informational,1)
(The,3)
(legal,1)
((50%),1)
(document.,1)
(received,1)
(such,17)
(institute,1)
(distribute,,2)
(WHETHER,2)
(page",1)
((except,1)
(loss,1)
(common,1)
(additions,1)
(BSD-style,1)
(Appendix,1)
(Use,1)
(disclaimer,2)
(resulting,1)
(ON,2)
(hereby,2)
(License.,11)
(software,3)
(whom,1)
(along,1)
(lists,,1)
(required,4)
(OR,18)
(ownership,2)
(SOFTWARE,2)
(the,122)
(includes,1)
(obligations,1)
(import,,1)
(not,11)
(either,2)
(terminate,1)
(if,4)
(stoppage,,1)
(provided,9)
(submitted.,1)
(all,3)
(permission.,1)
("License");,1)
(written,2)
(generated,2)
(consequential,1)
(Derivative,17)
(AND,11)
(rights,3)
(http://www.apache.org/licenses/,1)
(terms.,1)
(Catholique,1)
(deliberate,1)
(entity.,2)
(Work,,4)
(special,,1)
(Additional,1)
(Legal,3)
(034819,1)
(least,1)
(text,4)
(on,11)
(editorial,1)
(redistributing,2)
("License",1)
(against,1)
(permission,1)
(9,1)
(separate,2)
(and/or,3)
(LICENSE,1)
(union,1)
((and,1)
(1,1)
(including,,1)
(Entity,3)
(negligent,1)
(LIABLE,2)
(IN,6)
(use,8)
(enclosed,2)
(contains,1)
(files,1)
(Entity",1)
(Work.,1)
(owner.,1)
(preferred,1)
(modifications,3)
(brackets!),1)
(available,1)
(code,5)
(http://www.apache.org/licenses/LICENSE-2.0,1)
(more,1)
(possibility,1)
(product,1)
(liable,1)
(SUCH,2)
(direction,1)
(must,8)
(making,1)
(Disclaimer,1)
(disclaimer.,2)
(Commission,1)
(OTHERWISE),2)
(Hadoop,1)
((an,1)
(APPENDIX:,1)
("Licensor",1)
(DISCLAIMED.,2)
("Derivative,1)
(elaborations,,1)
(incidental,,1)
(prepare,1)
(A,3)
(exercising,1)
(*/,3)
(which,2)
(pertain,2)
(explicitly,1)
(tort,1)
(3.,1)
(also,1)
(conversions,1)
(liability,2)
(whether,4)
(character,1)
(should,1)
(thereof.,1)
(of,,3)
(your,4)
(royalty-free,,2)
(entities,1)
(or,,1)
(NEGLIGENCE,2)
(author,1)
("Not,1)
(source,9)
(then,2)
((including,3)
(Redistribution.,1)
(attribution,4)
(by,21)
(TO,,4)
(defined,1)
(OWNER,2)
(If,2)
(an,6)
(/*,1)
(Collet.,1)
(improving,1)
(grossly,1)
(COPYRIGHT,4)
(above,,1)
(theory,,1)
(mailing,1)
(7.,1)
(Notwithstanding,1)
(code,,2)
(cross-claim,1)
(provide,1)
((such,1)
(arising,1)
(Object,4)
(In,1)
(-,7)
(those,3)
(work,,2)
(easier,1)
(based,1)
(medium,,1)
(within,8)
(worldwide,,2)
(authorship.,1)
(files.,1)
(inability,1)
(you,2)
(POSSIBILITY,2)
(cannot,1)
(copies,1)
(a,21)
(statement,1)
(above,4)
(state,1)
(work,5)
(by,,3)
(to,41)
(appear.,1)
(Your,9)
(where,1)
(liability.,1)
(governing,1)
(NOT,4)
(License,,6)
(hold,1)
(and,51)
(copyright,15)
(USE,,3)
(compliance,1)
(SOFTWARE,,2)
(comment,1)
(additional,4)
(executed,1)
(mechanical,1)
(Contributor,8)
[spark@S1PA222 ~]$
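
Two features of the listing stand out: the entry `(,1299)` has an empty string as its key, and tokens such as `notice,` keep their punctuation. Both are what one would expect if the lines were split on a single space, which is an assumption here because the article's own code is not shown. The plain-Scala sketch below needs no Spark and uses a made-up sample line to show the effect, together with one possible alternative tokenizer.

```scala
// Plain Scala, no Spark required. The sample line is made up for illustration.
val line = "   notice, this list of  conditions"

// Splitting on a single space yields an empty token for every extra space
// and leaves punctuation attached to the words.
val naive = line.split(" ").toList
println(naive)

// A possible alternative: split on runs of non-word characters and drop empty tokens.
val cleaned = line.split("\\W+").filter(_.nonEmpty).toList
println(cleaned)
```

For this sample the naive split produces four empty tokens, while the alternative yields only the five words. The alternative has its own trade-off: it would also break apart hyphenated terms and URLs, so the right tokenizer depends on what should count as a word.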
