56 - Spark RDD - Exercise 1 - Unique Word Count

@backstreetbrogrammer

--------------------------------------------------------------------------------
Chapter 11 - Exercise 1 - Unique Word Count
--------------------------------------------------------------------------------
Task: Count the occurrences of each unique English word in the given file. Numbers, punctuation, spaces, tabs, etc. should NOT be counted.
Also, the count should be case-insensitive, i.e. "Java" and "java" should be counted as the same word.

Example output (word, count) :

(someone,5)
(therefor,2)
(greater,5)
(ratification,2)
(full,14)
(secure,4)
(bailiffs,14)
(old,7)
(order,7)
(carried,2)

This means that the word "someone" appears 5 times in total in the given file.
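
The counting logic for the task can be sketched without a Spark cluster. The following is a minimal plain-Java streams version, not the Spark solution from the video; the class name UniqueWordCountSketch, the method countWords, and the sample lines are hypothetical. It assumes a "word" is a run of ASCII letters, so a contraction such as "don't" would split into two tokens. In the Spark Java API the same steps would typically map onto flatMap, filter, mapToPair and reduceByKey on an RDD of lines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; plain Java streams, no Spark dependency.
class UniqueWordCountSketch {

    // Counts case-insensitive words across the given lines.
    // Assumption: a word is a run of ASCII letters only.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Split on anything that is not a letter: digits, punctuation, spaces, tabs.
                .flatMap(line -> Arrays.stream(line.split("[^A-Za-z]+")))
                // A line that starts with a separator yields an empty first token; drop it.
                .filter(token -> !token.isEmpty())
                // Fold case so that "Java" and "java" share one key.
                .map(token -> token.toLowerCase(Locale.ROOT))
                // Sum the occurrences per word; a TreeMap keeps the keys sorted for printing.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> sample = List.of("Java and java, 42 times!", "\tAnd   JAVA again.");
        // Print in the (word,count) format used by the exercise.
        countWords(sample).forEach((word, count) ->
                System.out.println("(" + word + "," + count + ")"));
    }
}
```

On the two sample lines this counts "java" three times and "and" twice, and ignores "42" entirely.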

Bonus Task: Find the 10 words with the highest counts

Top 10 words with max count:
(the,945)
(of,772)
(and,593)
(to,406)
(shall,326)
(or,288)
(in,281)
(be,270)
(our,254)
(we,211)
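
The bonus step amounts to sorting the (word, count) pairs by count in descending order and keeping the first 10. Below is a minimal plain-Java sketch of that selection, again not the Spark solution itself; the class TopWordsSketch and the method topN are hypothetical names, and breaking ties alphabetically is an assumption added to make the order deterministic. With a pair RDD, one common approach is to swap each pair to (count, word), call sortByKey(false), and then take(10).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration class; plain Java streams, no Spark dependency.
class TopWordsSketch {

    // Returns up to n entries with the highest counts, highest first.
    // Assumption: ties on count are broken alphabetically by word.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    // Descending by count.
                    int byCount = Long.compare(b.getValue(), a.getValue());
                    // Ascending by word when counts are equal.
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A few counts taken from the example output in this description.
        Map<String, Long> counts = Map.of(
                "the", 945L, "of", 772L, "and", 593L, "full", 14L, "someone", 5L);
        for (Map.Entry<String, Long> entry : topN(counts, 3)) {
            System.out.println("(" + entry.getKey() + "," + entry.getValue() + ")");
        }
    }
}
```

Sorting the full map is fine for a single file; on a large dataset a bounded selection (for example a priority queue of size 10) would avoid sorting every word.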

#java #javadevelopers #javaprogramming #apachespark #spark
Comments

Hi Sir, how many videos are left for this course?

bhagesharora