CCA 175 Real Time Exam Scenario 2 | Read Parquet File | Write as JSON in HDFS with GZIP Compression


Data Description
1. All order records are stored in the HDFS directory /user/spark/dataset/retail_db/orders_parquet
2. The data is in Parquet format

Output Requirement
1. Output all completed orders, that is, records where the order_status value is "COMPLETE"
2. Use JSON format for the output files
3. Place the result data in the HDFS directory /user/spark/dataset/result/scenario2/solution
4. order_date should be in the format yyyy-MM-dd
5. Compress the output using gzip compression
6. The output should contain only order_id, order_date, status
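
The requirements above can be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not the solution from the video: the function names order_date_sql and write_completed_orders are hypothetical, a SparkSession is passed in (the pyspark shell provides one as spark), a bigint order_date is assumed to hold epoch milliseconds, and "status" in requirement 6 is read as the order_status column kept under its original name.

```python
INPUT_PATH = "/user/spark/dataset/retail_db/orders_parquet"
OUTPUT_PATH = "/user/spark/dataset/result/scenario2/solution"


def order_date_sql(dtype):
    """Return a Spark SQL expression that renders order_date as yyyy-MM-dd.

    dtype is the column type string reported by DataFrame.dtypes.
    """
    if dtype in ("bigint", "long"):
        # Assumption: the value is epoch milliseconds, so convert to seconds first.
        return "from_unixtime(CAST(order_date / 1000 AS BIGINT), 'yyyy-MM-dd')"
    # Timestamp, date, or timestamp-like string columns can be formatted directly.
    return "date_format(order_date, 'yyyy-MM-dd')"


def write_completed_orders(spark, src=INPUT_PATH, dst=OUTPUT_PATH):
    """Read Parquet orders, keep COMPLETE ones, write gzip-compressed JSON."""
    orders = spark.read.parquet(src)
    # Look up how order_date is actually stored before choosing a conversion.
    date_type = dict(orders.dtypes)["order_date"]
    result = (
        orders
        .filter("order_status = 'COMPLETE'")
        .selectExpr(
            "order_id",
            order_date_sql(date_type) + " AS order_date",
            "order_status",
        )
    )
    # The JSON writer accepts a compression option; gzip yields .json.gz part files.
    result.write.option("compression", "gzip").json(dst)
    return result
```

In the pyspark shell this would be called as write_completed_orders(spark). Running orders.printSchema() first is a quick way to confirm which branch of the date conversion applies to the actual data.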

Download the sample data from our GitHub repository.



Comments

What is the command to view the gzip data from the HDFS location?

mattanishanth
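
On the question above: hdfs dfs -text <part-file> decompresses gzip part files and prints them as text, so piping it to head is usually enough for a quick look. For checking the records programmatically, the following is a minimal Python sketch; the helper name read_gzip_json_lines is hypothetical, and it assumes the bytes of one part file have already been obtained, for example from hdfs dfs -cat or a local copy made with hdfs dfs -get.

```python
import gzip
import io
import json


def read_gzip_json_lines(raw, limit=5):
    """Decompress gzip bytes and parse up to `limit` JSON-lines records."""
    records = []
    # Spark writes one JSON object per line, so the file can be read line by line.
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if len(records) >= limit:
                break
            if line.strip():
                records.append(json.loads(line))
    return records
```

Each returned record is a dict, so checks such as confirming that order_status is always COMPLETE or that order_date matches yyyy-MM-dd are simple to add.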
