Transformations on one pair RDD

Pair RDD transformations are of two types: those that are applied to a single pair RDD, and those that are applied to multiple pair RDDs. In this video, let's look at the transformations that can be applied to a single pair RDD.

Some of the basic pair RDD transformations that can be applied to a single pair RDD are:

1) reduceByKey,
2) groupByKey,
3) sortByKey,
4) mapValues,
5) flatMapValues,
6) keys(),
7) values().

1) reduceByKey is a pair RDD transformation that combines values with the same key. How it combines the values is based on the function that we pass to the reduceByKey transformation.

Say inputRDD contains the elements {(1,2),(2,4),(2,5)}. Then a reduceByKey transformation that sums the values will return {(1,2),(2,9)} as the resultant RDD.

Here the function that sums the values is passed to the reduceByKey transformation, which takes care of summing the corresponding values for each key: key 1 keeps its single value 2, and key 2 gets 4 + 5 = 9.
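
The summing example can be tried out in plain Python, without a Spark cluster. This is only a sketch of the semantics: `reduce_by_key` is a hypothetical helper, not a Spark API, and the result is sorted only to make the output predictable. In PySpark the corresponding call would be `inputRDD.reduceByKey(lambda a, b: a + b)`.

```python
from operator import add

def reduce_by_key(pairs, func):
    """Plain-Python model of reduceByKey (hypothetical helper, not Spark)."""
    result = {}
    for key, value in pairs:
        # The first value seen for a key is stored as-is;
        # every later value is folded in with the supplied function.
        result[key] = func(result[key], value) if key in result else value
    # Sorted only so that the printed output is deterministic.
    return sorted(result.items())

input_pairs = [(1, 2), (2, 4), (2, 5)]
summed = reduce_by_key(input_pairs, add)
print(summed)  # [(1, 2), (2, 9)]
```

Passing a different function, for example `max`, would keep the largest value per key instead of the sum.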

2) groupByKey is a pair RDD transformation that groups values with the same key. This transformation doesn't take any function.

For example, if inputRDD contains the elements {(1,2),(2,4),(2,5)}, then the groupByKey transformation will return {(1,[2]),(2,[4,5])} as the resultant RDD.

Here the values '4' & '5' are grouped together, since they both belong to the same key '2'.
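
The grouping can be modelled in a few lines of plain Python. This is a sketch of the behaviour, not Spark itself: `group_by_key` is a hypothetical helper. In PySpark the grouped values come back as an iterable, so something like `inputRDD.groupByKey().mapValues(list)` is commonly used to see them as lists.

```python
from collections import defaultdict

def group_by_key(pairs):
    """Plain-Python model of groupByKey (hypothetical helper, not Spark)."""
    groups = defaultdict(list)
    for key, value in pairs:
        # Collect every value under its key; no function is applied.
        groups[key].append(value)
    # Sorted by key only so that the printed output is deterministic.
    return sorted(groups.items())

grouped = group_by_key([(1, 2), (2, 4), (2, 5)])
print(grouped)  # [(1, [2]), (2, [4, 5])]
```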

3) sortByKey is a pair RDD transformation that sorts the RDD based on the key. This transformation doesn't require a function; by default it sorts in ascending order.

For example, if inputRDD contains the elements {(5,2),(2,4),(2,5)}, then the sortByKey transformation will return {(2,4),(2,5),(5,2)} as the resultant RDD. Here the elements are sorted based on the key.
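
As a rough illustration, sorting by key can be imitated with Python's built-in `sorted`. The helper `sort_by_key` is hypothetical and only models the result; in PySpark the call would be `inputRDD.sortByKey()`, or `inputRDD.sortByKey(ascending=False)` for descending order.

```python
def sort_by_key(pairs, ascending=True):
    """Plain-Python model of sortByKey (hypothetical helper, not Spark)."""
    # Only the key (first element of each pair) is compared;
    # the values play no part in the ordering.
    return sorted(pairs, key=lambda kv: kv[0], reverse=not ascending)

input_pairs = [(5, 2), (2, 4), (2, 5)]
ascending_result = sort_by_key(input_pairs)
descending_result = sort_by_key(input_pairs, ascending=False)
print(ascending_result)   # [(2, 4), (2, 5), (5, 2)]
print(descending_result)  # [(5, 2), (2, 4), (2, 5)]
```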

4) mapValues is a pair RDD transformation that applies a user function to each value of a pair RDD without altering the key.

Say inputRDD contains the elements {(5,2),(2,4),(2,5)}. Then passing an increment function to the mapValues transformation will return {(5,3),(2,5),(2,6)} as the resultant RDD.

Here the values of each key are incremented by 1. The key is unaltered.
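
The increment example can be sketched in plain Python as follows. `map_values` is a hypothetical helper that only models the behaviour; in PySpark the call would be `inputRDD.mapValues(lambda v: v + 1)`.

```python
def map_values(pairs, func):
    """Plain-Python model of mapValues (hypothetical helper, not Spark)."""
    # The function sees only the value; the key is passed through untouched.
    return [(key, func(value)) for key, value in pairs]

incremented = map_values([(5, 2), (2, 4), (2, 5)], lambda v: v + 1)
print(incremented)  # [(5, 3), (2, 5), (2, 6)]
```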

5) flatMapValues is a pair RDD transformation that applies a user function to each value of a pair RDD. The function returns an iterator of values for each input value.

The flatMapValues transformation then flattens the result: every value produced by the iterator becomes its own element, paired with the original key.

For instance, if inputRDD contains the elements {(5,2),(2,4),(2,5)}, then passing a function that expands each value into the values up to 5 will return one (key, value) pair per generated value. Here the key is unaltered, and the values for each key run up to 5.
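
A small plain-Python sketch may make the flattening easier to see. It rests on two assumptions that the description does not spell out: the function passed in is taken to be "every integer from the value up to 5", and 5 itself is taken to be included. `flat_map_values` is a hypothetical helper, not a Spark API; under the same assumptions the PySpark call would be `inputRDD.flatMapValues(lambda v: range(v, 6))`.

```python
def flat_map_values(pairs, func):
    """Plain-Python model of flatMapValues (hypothetical helper, not Spark)."""
    # func returns an iterable for each value; every item it yields
    # becomes a separate (key, item) pair with the original key.
    return [(key, out) for key, value in pairs for out in func(value)]

# Assumption: "values until 5" means the integers from v up to and including 5.
expanded = flat_map_values([(5, 2), (2, 4), (2, 5)], lambda v: range(v, 6))
print(expanded)
# [(5, 2), (5, 3), (5, 4), (5, 5), (2, 4), (2, 5), (2, 5)]
```

Note that the pair (2,5) appears twice: once generated from (2,4) and once from (2,5).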

6) keys() is a pair RDD transformation that returns only the keys of the RDD.

For example, if inputRDD contains the elements {(5,2),(2,4),(2,5)}, then applying the keys() transformation will return 5, 2 and 2, which are the keys in the RDD. One key is returned per element, so the repeated key 2 appears twice.

7) values() is a pair RDD transformation that returns only the values of the RDD.

Say inputRDD contains the elements {(5,2),(2,4),(2,5)}. Then applying the values() transformation will return 2, 4 and 5, which are the values in the RDD.
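
Both keys() and values() amount to picking one side of each pair, which can be sketched in plain Python as below. The variable names are illustrative only; in PySpark the calls would be `inputRDD.keys()` and `inputRDD.values()`. Neither removes duplicates, so a separate distinct step is needed if only the unique keys are wanted.

```python
input_pairs = [(5, 2), (2, 4), (2, 5)]

# keys(): the first element of every pair, duplicates included.
keys = [key for key, _ in input_pairs]

# values(): the second element of every pair.
values = [value for _, value in input_pairs]

# Unique keys require an explicit de-duplication step.
distinct_keys = sorted(set(keys))

print(keys)           # [5, 2, 2]
print(values)         # [2, 4, 5]
print(distinct_keys)  # [2, 5]
```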

In the next video, let's look at the combineByKey transformation, which is also applied to a single pair RDD.