CCA 175 Real Time Exam Scenario 12 | Read PARQUET Data | Save as JSON with Snappy Compression

Data Description
All the order records are stored in the HDFS directory
/user/spark/dataset/retail_db/orders_parquet
The data is in Parquet format

Output Requirement
✔️ Output all the PENDING orders in July 2013
✔️ Use JSON format for the output files
✔️ Place the result data in HDFS directory /user/spark/dataset/result/scenario11/solution
✔️ Result should only contain records that have order_status value as "PENDING"
✔️ order_date should be in format yyyy-MM-dd
✔️ Compress the output using snappy compression; the output should contain only the order_date and order_status columns

Download the sample data from our GitHub repository.

Comments

Error while writing dataframe using snappy

abhigyapranshu