CCA 175 Real Time Exam Scenario 2 | Read Parquet File | Write as JSON in HDFS with GZIP Compression

Data Description
1. All the order records are stored in the HDFS directory /user/spark/dataset/retail_db/orders_parquet
2. Data is in Parquet format

Output Requirement
1. Output all the completed orders, that is, the records where the order_status value is "COMPLETE"
2. Use JSON format for the output files
3. Place the result data in HDFS directory /user/spark/dataset/result/scenario2/solution
4. order_date should be in format yyyy-MM-dd
5. Compress the output using gzip compression
6. Output should contain only order_id, order_date, and status
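
The requirements above can be sketched in PySpark roughly as follows. This is one possible approach under stated assumptions, not the solution shown in the video. The function names `order_date_expr` and `write_completed_orders` are hypothetical, and the storage type of order_date is not given in the task, so the sketch assumes that a bigint column holds epoch milliseconds and that any other type is a date, timestamp, or timestamp-like string. It also keeps the column name order_status; if requirement 6 means a column literally named status, change the last select expression to "order_status AS status".

```python
# Paths are taken from the task description.
INPUT_PATH = "/user/spark/dataset/retail_db/orders_parquet"
OUTPUT_PATH = "/user/spark/dataset/result/scenario2/solution"


def order_date_expr(dtype):
    """Return a Spark SQL expression that renders order_date as yyyy-MM-dd."""
    if dtype == "bigint":
        # ASSUMPTION: a bigint order_date holds epoch milliseconds.
        return "from_unixtime(cast(order_date / 1000 as bigint), 'yyyy-MM-dd')"
    # ASSUMPTION: otherwise order_date is a date, a timestamp,
    # or a string that Spark can cast to a timestamp.
    return "date_format(order_date, 'yyyy-MM-dd')"


def write_completed_orders(spark, src=INPUT_PATH, dst=OUTPUT_PATH):
    """Read the Parquet orders, keep COMPLETE ones, write gzip-compressed JSON."""
    orders = spark.read.parquet(src)

    # Look up how order_date is stored so the right expression is used.
    dtype = dict(orders.dtypes)["order_date"]

    completed = (
        orders
        .where("order_status = 'COMPLETE'")
        .selectExpr(
            "order_id",
            order_date_expr(dtype) + " AS order_date",
            "order_status",
        )
    )

    # The default save mode raises an error if dst already exists.
    completed.write.json(dst, compression="gzip")
```

In the `pyspark` shell, where a `spark` session already exists, this would be run as `write_completed_orders(spark)`. The result can then be inspected with `hdfs dfs -text` on the part files in the output directory, which decompresses gzip files based on their `.gz` extension.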

Download the sample data from our GitHub repository.

Comments

What is the command to view the gzip-compressed data from the HDFS location?

mattanishanth