map and flatmap 区别
博客专区 > Faye_Cai 的博客 > 博客详情
map and flatmap 区别
Faye_Cai 发表于2年前
map and flatmap 区别
  • 发表于 2年前
  • 阅读 9
  • 收藏 0
  • 点赞 0
  • 评论 0

新睿云服务器60天免费使用,快来体验!>>>   

map vs flatMap in Spark

September 24, 2014Big Dataexample, spark

In the previous blogs around Spark examples, RDD.flatMap() has been used. In this blog we will look at the differences between RDD.map() and RDD.flatMap().

map and flatMap are similar, in the sense they take a line from the input RDD and apply a function on it. The way they differ is that the function in map returns only one element, while function in flatMap can return a list of elements (0 or more) as an iterator.

Also, the output of the flatMap is flattened. Although the function in flatMap returns a list of elements, the flatMap returns an RDD which has all the elements from the list in a flat way (not a list).

Sounds a bit confusing. In the below code snippet, on the input lines both map and flatMap are applied and output dumped in HDFS to wordsWithMap and wordsWithFlatMap folder.

from pyspark import SparkContext sc = SparkContext("spark://bigdata-vm:7077", "Map") lines = sc.parallelize(["hello world", "hi"]) wordsWithMap = lines.map(lambda line: line.split(" ")).coalesce(1) wordsWithFlatMap = lines.flatMap(lambda line: line.split(" ")).coalesce(1) wordsWithMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithMap") wordsWithFlatMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithFlatMap")

1

2

3

4

5

6

7

8

9

10

from pyspark import SparkContext

 

sc = SparkContext("spark://bigdata-vm:7077", "Map")

lines = sc.parallelize(["hello world", "hi"])

 

wordsWithMap = lines.map(lambda line: line.split(" ")).coalesce(1)

wordsWithFlatMap = lines.flatMap(lambda line: line.split(" ")).coalesce(1)

 

wordsWithMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithMap")

wordsWithFlatMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithFlatMap")

 

The output of the map function in HDFSmap-output
The output of the flatMap function in HDFSflatMap-output

Conclusion

The input function to map returns a single element, while the flatMap returns a list of elements (0 or more). And also, the output of the flatMap is flattened.

In the case of word count, where the input line is split into multiple words, flatMap can be used. Also, in the case of weather data set, the extractData nethod will validate the record and might or might not return a value. In this case also, flatMap can be used.

Share this:

  • 打赏
  • 点赞
  • 收藏
  • 分享
共有 人打赏支持
粉丝 0
博文 28
码字总数 5590
×
Faye_Cai
如果觉得我的文章对您有用,请随意打赏。您的支持将鼓励我继续创作!
* 金额(元)
¥1 ¥5 ¥10 ¥20 其他金额
打赏人
留言
* 支付类型
微信扫码支付
打赏金额:
已支付成功
打赏金额: