CCA 175 Real Time Exam Scenario 1 | Read Tab Delimited File | Write as CSV in HDFS

Data Description
1. All the customer records are stored in the HDFS directory
/user/spark/dataset/retail_db/customers-tab-delimited
2. Data is in Text format
3. Data is Tab delimited

Output Requirement
1. Output all the customers who live in California
2. Use text format for the output files
3. Place the result data in /user/spark/dataset/result/scenario1/solution
4. Result should only contain records that have state value as "CA"
5. Output should only contain customer's full name
Example: Robert Hudson
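
One possible way to meet the requirements above is sketched below in PySpark, using the RDD API. This is only an illustration, not the solution shown in the video. The column positions (first name at index 1, last name at index 2, state at index 7) are an assumption based on the usual retail_db customers layout, since the description does not list the columns. The names full_name_if_ca and run_job are hypothetical helpers introduced for this example.

```python
from typing import Optional

# ASSUMPTION: column order follows the usual retail_db customers layout
# (0 id, 1 first name, 2 last name, 3 email, 4 password,
#  5 street, 6 city, 7 state, 8 zipcode). Verify against the actual data.
FNAME_IDX, LNAME_IDX, STATE_IDX = 1, 2, 7

# Paths taken from the scenario description.
INPUT_PATH = "/user/spark/dataset/retail_db/customers-tab-delimited"
OUTPUT_PATH = "/user/spark/dataset/result/scenario1/solution"


def full_name_if_ca(line: str) -> Optional[str]:
    """Return 'first last' if the record's state is CA, otherwise None."""
    fields = line.split("\t")
    # Skip short/malformed lines and every state other than CA.
    if len(fields) <= STATE_IDX or fields[STATE_IDX] != "CA":
        return None
    return f"{fields[FNAME_IDX]} {fields[LNAME_IDX]}"


def run_job(spark, input_path: str = INPUT_PATH, output_path: str = OUTPUT_PATH) -> None:
    """Read the tab-delimited text, keep CA customers, write full names as text.

    `spark` is an existing SparkSession, e.g. the one the pyspark shell provides.
    """
    names = (
        spark.sparkContext.textFile(input_path)
        .map(full_name_if_ca)
        .filter(lambda name: name is not None)
    )
    # saveAsTextFile fails if the output directory already exists.
    names.saveAsTextFile(output_path)
```

In a pyspark shell, where a SparkSession named spark is already available, the job would be started with run_job(spark). Keeping the per-line parsing in a plain function means that part can be checked without a cluster.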

Download the sample data from our GitHub repository.

Comments

Hi Proedu, is providing an alias name necessary for this question?

arindamnath

Thanks. Are the questions on the real exam similar, or more difficult?

lfriboulet