Transformations - Other important ones
- kumarnitinkarn
- Nov 6, 2019
So far we have studied two frequently used transformations, map() and flatMap(). Below are some other important transformations that you should know.
filter() :
The Spark RDD filter() function returns a new RDD containing only the elements that satisfy a predicate, that is, a function that returns True or False for each element.
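# Read the file as an RDD of lines (sc is the SparkContext created by the pyspark shell)
data = sc.textFile("/axp/rim/gamrespo/warehouse/test.txt")
# Upper-case every line, then keep only the lines equal to 'THIS IS TEST'
newData = data.map(lambda line: line.upper()).filter(lambda line: line == 'THIS IS TEST')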
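To make the per-element behaviour concrete, here is a plain-Python model (not Spark code) that represents an RDD as a list of partitions, each a list of elements. The helper names `rdd_map` and `rdd_filter` and the sample lines are hypothetical. The sketch shows that filter() works inside each partition: no element moves to another partition, so no shuffle is needed.

```python
# Model an RDD as a list of partitions; the sample lines are hypothetical
partitions = [
    ["this is test", "hello spark"],
    ["This Is Test", "this is not a test"],
]


def rdd_map(parts, f):
    # map(): apply f to every element, partition by partition
    return [[f(x) for x in part] for part in parts]


def rdd_filter(parts, pred):
    # filter(): keep matching elements inside each partition;
    # no element moves to another partition, so no shuffle is needed
    return [[x for x in part if pred(x)] for part in parts]


result = rdd_filter(rdd_map(partitions, str.upper), lambda line: line == "THIS IS TEST")
print(result)  # [['THIS IS TEST'], ['THIS IS TEST']]
```

The number of partitions is unchanged by filter(); only their contents shrink.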
union() :
The union() function returns a new RDD containing the elements of both RDDs. Duplicates are kept: an element present in both inputs appears twice in the result. The key rule of this function is that the two RDDs should be of the same type.
data = sc.parallelize([('John', 12), ('Jack', 32), ('Sana',81), ('Nahad', 34), ('Farhad', 23)])
data1 = sc.parallelize([('Jack', 22), ('Daniel', 27)])
data2 = sc.parallelize([('Chris', 42), ('Naneil', 37)])
# Chain union() calls to combine all three RDDs; duplicates are not removed
rddunion = data.union(data1).union(data2)
rddunion.collect()
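A plain-Python model of union() (not Spark's code) can again treat an RDD as a list of partitions; the sample data and the name `rdd_union` are hypothetical. Assuming neither input RDD has a partitioner, as with RDDs created by parallelize(), Spark's union simply places the partitions of its inputs side by side, which is why it needs no shuffle and why duplicates survive.

```python
# Model an RDD as a list of partitions (each partition is a list of elements)
left = [[('John', 12), ('Jack', 32)], [('Sana', 81)]]  # 2 partitions
right = [[('Jack', 32), ('Daniel', 27)]]  # 1 partition


def rdd_union(a, b):
    # union(): place the partitions of both inputs side by side.
    # Nothing is compared or moved, so duplicates survive and no shuffle occurs.
    return a + b


u = rdd_union(left, right)
print(len(u))  # 3
print([x for part in u for x in part])  # ('Jack', 32) appears twice
```

If duplicates must go, follow union() with distinct().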
intersection() :
The intersection() function returns a new RDD containing only the elements common to both RDDs. The result contains no duplicates, even if the inputs do, and computing it involves a shuffle. As with union(), the two RDDs should be of the same type.
data = sc.parallelize([('Jack', 22), ('Jack', 32), ('Sana',81), ('Nahad', 34), ('Farhad', 23)])
data1 = sc.parallelize([('Jack', 22), ('Daniel', 27)])
data2 = sc.parallelize([('Jack', 22), ('Naneil', 37)])
# Only ('Jack', 22) appears in all three RDDs
rddintersection = data.intersection(data1).intersection(data2)
rddintersection.collect()
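One way to see why intersection() deduplicates and needs a shuffle is to model it as grouping both inputs by element and keeping the elements seen on both sides. The following is a plain-Python sketch of that idea, not Spark's implementation; `rdd_intersection` is a hypothetical name, and the data is copied from the example above as ordinary lists.

```python
from collections import defaultdict


def rdd_intersection(left, right):
    # Group both inputs by element, remembering which side each copy came from.
    # In Spark, this grouping step is what requires a shuffle.
    seen = defaultdict(lambda: [False, False])
    for v in left:
        seen[v][0] = True
    for v in right:
        seen[v][1] = True
    # Emit each element present on both sides exactly once
    return [v for v, (in_left, in_right) in seen.items() if in_left and in_right]


data = [('Jack', 22), ('Jack', 32), ('Sana', 81), ('Nahad', 34), ('Farhad', 23)]
data1 = [('Jack', 22), ('Daniel', 27)]
data2 = [('Jack', 22), ('Naneil', 37)]

print(rdd_intersection(rdd_intersection(data, data1), data2))  # [('Jack', 22)]
print(rdd_intersection([1, 1, 2], [1, 1, 3]))  # [1]: duplicates collapse
```

Because every element is grouped once, repeated copies in either input collapse to a single output element.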
distinct() :
distinct() returns a new RDD containing the distinct elements of the source RDD, which makes it useful for removing duplicate data. It requires a shuffle, since copies of the same element may sit in different partitions.
data = sc.parallelize([('Jack', 22), ('Jack', 22), ('Sana',81), ('Sana',81), ('Farhad', 23)])
# ('Jack', 22) and ('Sana', 81) are kept once each; output order is not guaranteed
datadistinct = data.distinct()
datadistinct.collect()
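Because copies of the same element can live in different partitions, distinct() has to bring them together before it can drop the extras. The sketch below models that in plain Python (not Spark code). The name `rdd_distinct`, the use of Python's built-in `hash()` to route elements, and the two-way split of the sample data are all assumptions for illustration.

```python
def rdd_distinct(partitions, num_out=2):
    # Shuffle: route each element to an output partition chosen by its hash,
    # so copies of the same element from different inputs land together
    out = [[] for _ in range(num_out)]
    for part in partitions:
        for x in part:
            out[hash(x) % num_out].append(x)
    # Inside each output partition, keep one copy of every element
    return [list(dict.fromkeys(p)) for p in out]


# Hypothetical split of the sample data: duplicates sit in different partitions
parts = [
    [('Jack', 22), ('Sana', 81)],
    [('Jack', 22), ('Sana', 81), ('Farhad', 23)],
]
result = rdd_distinct(parts)
print(sorted(x for p in result for x in p))
# [('Farhad', 23), ('Jack', 22), ('Sana', 81)]
```

The result is sorted before printing because which output partition an element lands in depends on its hash, so the raw order is not fixed.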
groupByKey() :
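When called on a dataset of (K, V) pairs, groupByKey() returns a dataset of (K, Iterable<V>) pairs. To build these groups, Spark shuffles the data so that all values for the same key K end up in the same partition. Every value is sent over the network, so this transformation can move a lot of unnecessary data; if you only need a per-key aggregate such as a sum, reduceByKey() is usually the better choice because it combines values within each partition before the shuffle.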
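data = sc.parallelize([('s', 22), ('s', 12), ('s', 81), ('j', 81), ('j', 23)])
# collect() returns the grouped pairs to the driver as a plain Python list
group = data.groupByKey().collect()
# A Python list has no foreach(); loop over it, turning each group's values into a list
for key, values in group:
    print(key, list(values))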
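To see where the extra network traffic comes from, the plain-Python model below (not Spark code) counts how many records each approach would shuffle when summing values per key. It assumes the map-side combine that reduceByKey() performs, merging values locally on each partition before sending results on; the function names and the partition split are hypothetical.

```python
from operator import add

# Two hypothetical partitions holding the sample (key, value) pairs
partitions = [
    [('s', 22), ('s', 12), ('s', 81)],
    [('j', 81), ('j', 23)],
]


def shuffled_by_group_by_key(parts):
    # groupByKey(): every record crosses the network unchanged
    return [kv for part in parts for kv in part]


def shuffled_by_reduce_by_key(parts, func):
    # reduceByKey(): combine values per key inside each partition first,
    # so at most one record per key per partition is shuffled
    sent = []
    for part in parts:
        local = {}
        for k, v in part:
            local[k] = func(local[k], v) if k in local else v
        sent.extend(local.items())
    return sent


def combine(records, func):
    # Reduce side: merge whatever arrived for each key
    totals = {}
    for k, v in records:
        totals[k] = func(totals[k], v) if k in totals else v
    return totals


g = shuffled_by_group_by_key(partitions)
r = shuffled_by_reduce_by_key(partitions, add)
print(len(g), len(r))  # 5 2
print(combine(g, add) == combine(r, add))  # True
```

Both routes produce the same per-key sums, but the second shuffles 2 records instead of 5. In PySpark itself, the same sum on the RDD above would be written as data.reduceByKey(add).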
Keep Learning !!