
RDD Transformations - map() v/s flatMap()

  • kumarnitinkarn
  • Oct 11, 2019
  • 1 min read

Updated: Nov 6, 2019

We have already covered the basics of transformations. In this section we will study two very important and frequently used transformations: map() and flatMap().


Map Transformation :


The map() operation applies a function to each element of an RDD and returns the result as a new RDD. Spark's map function takes one element as input, processes it according to custom code (specified by the developer), and returns exactly one element. Map therefore transforms an RDD of length N into another RDD of length N: the input and output RDDs always have the same number of records.


Examples :

(1) Suppose you load a text file as RDD in spark environment :

data = sc.textFile("INPUT-PATH")

Now you can apply a map transformation that takes each record (typically one line of the input file) as its input and outputs that line converted to upper case:

newData = data.map(lambda line : line.upper())

newData.collect()
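To see why map() is a one-in, one-out transformation, here is a minimal plain-Python sketch of the same example, with a hypothetical two-line file held in a list (no Spark cluster needed; a Python list comprehension plays the role of the RDD's map):

```python
# Plain-Python analogy of data.map(lambda line: line.upper()).
# The input lines below are a hypothetical stand-in for the file at INPUT-PATH.
lines = ["this is test", "hello test"]

# map(): exactly one output record per input record
upper_lines = [line.upper() for line in lines]

print(upper_lines)          # ['THIS IS TEST', 'HELLO TEST']
print(len(upper_lines))     # 2 -- same record count as the input
```

Note that no matter what the function does to each line, the output has the same number of records as the input; that is the defining property of map().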



FlatMap Transformation :


It is similar to map, but flatMap allows the function to return 0, 1, or more elements for each input element; the returned sequences are then flattened into a single RDD. So flatMap() transforms an RDD of length N into another RDD of length M, where M may differ from N.


Example :

Suppose you load a file as an RDD, as before:

data = sc.textFile("INPUT-PATH")

where the file contains two lines:

THIS IS TEST

HELLO TEST

Now if you apply flatMap() :

result = data.flatMap(lambda line : line.split(" "))

result.collect()

Output will be :

['THIS', 'IS', 'TEST', 'HELLO', 'TEST']


So for two input records we get five records as output: each line produced several words, and flatMap flattened them into a single RDD.
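The difference between the two transformations shows up clearly if you apply the same split function with map() instead of flatMap(). A plain-Python sketch of both, using the same hypothetical two-line file as above:

```python
lines = ["THIS IS TEST", "HELLO TEST"]

# map()-style: one output per input, so splitting yields nested lists
mapped = [line.split(" ") for line in lines]

# flatMap()-style: each input may yield many outputs, flattened one level
flattened = [word for line in lines for word in line.split(" ")]

print(mapped)     # [['THIS', 'IS', 'TEST'], ['HELLO', 'TEST']]
print(flattened)  # ['THIS', 'IS', 'TEST', 'HELLO', 'TEST']
```

With map() you would get 2 records (each a list of words); with flatMap() you get 5 records (the individual words). That is exactly the N-to-M behaviour described above.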


Keep learning keep growing !!



©2019 by Spark knack. Proudly created with Wix.com
