CCA 175 Real Time Exam Scenario 19 | Read CSV | AGGREGATE | RANK | Save as TEXT Pipe Delimited

Data Description
All the product records are stored at /user/spark/dataset/retail_db/products
All the category records are stored at /user/spark/dataset/retail_db/categories
All the Order Items records are stored at /user/spark/dataset/retail_db/order_items
Data is in text format

Output Requirement
Get the top five best-selling products in the "Accessories" category
Place the result data in HDFS directory /user/spark/dataset/result/scenario19/solution
Save the output as text file
Use "|" as field separator
Output data should contain columns category_name,product_name,product_revenue
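
The requirements above can be pinned down with a small stand-alone sketch. It is not Spark code: it uses plain Python on a few made-up rows to show what the task asks for, namely revenue summed per product, restricted to one category, sorted in descending order, and written as pipe-delimited lines. The sample rows, the column positions (taken from the usual retail_db layout), the comma delimiter, and the two-decimal formatting of product_revenue are all assumptions, not something the task states.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for the three comma-delimited files.
# Column order follows the usual retail_db layout (an assumption).
CATEGORIES = "1,2,Accessories\n2,2,Golf Shoes\n"
PRODUCTS = (
    "10,1,Water Bottle,,9.99,img\n"
    "11,1,Sun Hat,,14.99,img\n"
    "20,2,Spiked Shoe,,59.99,img\n"
)
ORDER_ITEMS = (
    "1,100,10,2,19.98,9.99\n"
    "2,100,11,1,14.99,14.99\n"
    "3,101,10,1,9.99,9.99\n"
    "4,102,20,1,59.99,59.99\n"
)


def rows(text):
    """Parse comma-delimited text into a list of field lists."""
    return list(csv.reader(io.StringIO(text)))


def top_products(category, n=5):
    # category_id values whose category_name matches
    cat_ids = {r[0] for r in rows(CATEGORIES) if r[2] == category}
    # product_id -> product_name for products in that category
    names = {r[0]: r[2] for r in rows(PRODUCTS) if r[1] in cat_ids}
    # Sum order_item_subtotal (field 4) per order_item_product_id (field 2)
    revenue = defaultdict(float)
    for r in rows(ORDER_ITEMS):
        if r[2] in names:
            revenue[r[2]] += float(r[4])
    ranked = sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)[:n]
    # category_name|product_name|product_revenue
    return ["|".join([category, names[pid], f"{rev:.2f}"]) for pid, rev in ranked]


lines = top_products("Accessories")
print("\n".join(lines))
```

In Spark the same steps would typically be a join, a grouped sum, a ranking step, and then combining the three columns into one "|"-separated string before writing, since the text writer expects a single string column.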

Download the sample data from our GitHub repository.

Comments
Author

Best resource to clear the CCA 175 exam in the first attempt. Thanks Proedu.

dasariprasad
Author

It definitely boosts confidence. It is very close to the real exam, and this is enough to clear it.

anilmnt
Author

Hi,

Is it necessary to use the rank() function?

I get the same result using GROUP BY, ORDER BY and LIMIT 5.

select c.category_name, p.product_name, sum(oi.order_item_subtotal) as product_revenue
from p join oi on p.product_id = oi.order_item_product_id join c on p.product_category_id = c.category_id
where c.category_name = 'Accessories'
group by c.category_name, p.product_id, p.product_name
order by product_revenue desc limit 5

rajantawade
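
On the question above: the two approaches return the same rows as long as no products tie at the cutoff. They differ when there is a tie for fifth place, where LIMIT 5 keeps exactly five rows and drops one of the tied products, while rank() <= 5 keeps every product ranked fifth. The sketch below shows this with Python's built-in sqlite3 as a stand-in for Spark SQL (window functions need SQLite 3.25 or later). The table and its values are hypothetical, and the task itself does not say how ties should be handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table revenue (product_name text, product_revenue real)")
# Hypothetical per-product totals; E and F tie for fifth place.
conn.executemany(
    "insert into revenue values (?, ?)",
    [("A", 900.0), ("B", 800.0), ("C", 700.0), ("D", 600.0),
     ("E", 500.0), ("F", 500.0), ("G", 400.0)],
)

# ORDER BY ... LIMIT 5: always five rows, one of the tied products is cut.
with_limit = conn.execute(
    "select product_name from revenue order by product_revenue desc limit 5"
).fetchall()

# rank() <= 5: tied products share rank 5, so both are kept.
with_rank = conn.execute(
    """
    select product_name, rnk
    from (
        select product_name,
               rank() over (order by product_revenue desc) as rnk
        from revenue
    ) t
    where rnk <= 5
    order by rnk, product_name
    """
).fetchall()

print(len(with_limit), "rows with limit")
print(len(with_rank), "rows with rank:", with_rank)
```

If the data has no ties at the boundary, either form should give the same five products.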
Author

Has anyone here taken the test recently? Will anyone monitor the exam while we are taking it? Do we need to share our screen?

rajareddy