CCA 175 Real Time Exam Scenario 18 | JOIN Multiple DataFrames, AGGREGATE and SORT Data | Save as ORC

Data Description
All order records are stored at /user/spark/dataset/retail_db/orders
All customer records are stored at /user/spark/dataset/retail_db/customers
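The description does not state the file format or column layout of these records. The sketch below assumes the commonly used retail_db layout: comma-delimited text, with orders as order_id, order_date, order_customer_id, order_status and customers starting with customer_id, customer_fname, customer_lname followed by further fields. Verify this against the downloaded sample data before relying on it. The sample lines are hypothetical, and a plain split on "," would break if any field itself contained a comma.

```python
from typing import NamedTuple


class Order(NamedTuple):
    order_id: int
    order_date: str  # assumed format, e.g. "2014-01-05 00:00:00.0"
    order_customer_id: int
    order_status: str


class Customer(NamedTuple):
    customer_id: int
    customer_fname: str
    customer_lname: str


def parse_order(line: str) -> Order:
    # Assumed layout: order_id,order_date,order_customer_id,order_status
    order_id, order_date, customer_id, status = line.strip().split(",")
    return Order(int(order_id), order_date, int(customer_id), status)


def parse_customer(line: str) -> Customer:
    # Assumed layout: customer_id,customer_fname,customer_lname,...
    # Only the first three fields are kept; the rest are ignored.
    fields = line.strip().split(",")
    return Customer(int(fields[0]), fields[1], fields[2])


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the real dataset
    print(parse_order("1,2014-01-05 00:00:00.0,42,COMPLETE"))
    print(parse_customer("42,Jane,Doe,XXXXXXXXX,XXXXXXXXX,1 Main St,Springfield,IL,62701"))
```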

Find the total number of orders placed by each customer in the year 2014
Order status should be COMPLETE
Use ORC format for the output files
Output files must not be compressed
Place the result data in the HDFS directory /user/spark/dataset/result/scenario18/solution
Output should contain only the columns customer_fname, customer_lname and orders_count
Output should be sorted by orders_count in descending order
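The requirements above boil down to four steps: keep only COMPLETE orders from 2014, count them per customer, join the counts to customer names, and sort by the count in descending order. The sketch below expresses that logic in plain Python on small hypothetical in-memory samples, which can help sanity-check the expected result before writing the Spark job. It is not Spark code. It assumes order_date is a string beginning with "YYYY-" and that rows are tuples in the retail_db column order; both are assumptions. Grouping by customer id rather than by name keeps two customers who share a name from being merged.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical sample data standing in for the HDFS records.
# orders: (order_id, order_date, order_customer_id, order_status)
ORDERS = [
    (1, "2014-01-05 00:00:00.0", 1, "COMPLETE"),
    (2, "2014-03-10 00:00:00.0", 1, "COMPLETE"),
    (3, "2014-06-01 00:00:00.0", 2, "COMPLETE"),
    (4, "2013-12-31 00:00:00.0", 2, "COMPLETE"),  # wrong year, filtered out
    (5, "2014-02-14 00:00:00.0", 2, "PENDING"),   # wrong status, filtered out
]
# customers: (customer_id, customer_fname, customer_lname)
CUSTOMERS = [(1, "Jane", "Doe"), (2, "John", "Roe"), (3, "Ann", "Poe")]


def orders_per_customer(
    orders: List[Tuple[int, str, int, str]],
    customers: List[Tuple[int, str, str]],
) -> List[Tuple[str, str, int]]:
    # Filter to COMPLETE orders in 2014, then count orders per customer id
    counts = Counter(
        cust_id
        for _, order_date, cust_id, status in orders
        if order_date.startswith("2014-") and status == "COMPLETE"
    )
    # Inner join on customer id; customers with no qualifying orders are omitted
    names = {cid: (fname, lname) for cid, fname, lname in customers}
    rows = [
        (names[cid][0], names[cid][1], n)
        for cid, n in counts.items()
        if cid in names
    ]
    # Sort by orders_count, highest first
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    for fname, lname, orders_count in orders_per_customer(ORDERS, CUSTOMERS):
        print(fname, lname, orders_count)
```

In a Spark DataFrame solution the same steps would typically map to a filter (for example on year(...) of the order date and on order_status), a join with the customers DataFrame, groupBy(...).count(), and orderBy on the count in descending order, followed by an uncompressed ORC write such as df.write.option("compression", "none").orc(path). Confirm the exact calls against the Spark documentation for the version used in the exam.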

Download the sample data from our GitHub repository.

Comments

anilmnt: The video title says the output is AVRO, but the description says ORC.