BigData | Parquet File Processing with SparkSQL by Suresh

DURGASOFT is India's No. 1 software training center, offering online training on various technologies such as Java, .NET, Android, Hadoop, testing tools, ADF, Informatica, Tableau, iPhone, OBIEE, AngularJS, SAP, and more. Courses are run from Hyderabad & Bangalore, India, with real-time experts. Call us so that our support team can arrange demo sessions.
Ph: +91-8885252627, +91-7207212428, +91-7207212427, +91-8096969696.
Comments

Great!
I am a beginner with Spark and I would like to compute some joins over five CSV files so that at the end I have one DataFrame. File1 is about 8 GB, File2 is 5 MB, File3 is 5 MB, File4 is 15 MB, and File5 is 150 MB.

- I loaded the files, saved them as Parquet, read them back, and finally created views (createOrReplaceTempView) of these DataFrames.

- I joined dataframe_File1 with dataframe_File2 as merge1.

- Now I would like to save merge1 as Parquet again, read it back, do the other merges, and so on; a sketch of the pipeline is below.
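Concretely, what I did looks roughly like this (the paths, file names, and the join key "id" are placeholders for my real ones):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("csv-to-parquet-joins")
  .getOrCreate()

// Load each CSV and persist it as Parquet ("file1", "file2" stand in
// for my real file names; the paths are placeholders).
Seq("file1", "file2").foreach { name =>
  spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(s"/data/csv/$name.csv")
    .write.mode("overwrite")
    .parquet(s"/data/parquet/$name")
}

// Read the Parquet files back and register temp views.
spark.read.parquet("/data/parquet/file1").createOrReplaceTempView("file1")
spark.read.parquet("/data/parquet/file2").createOrReplaceTempView("file2")

// Join file1 with file2 as merge1 ("id" and "value" are placeholder
// columns; the explicit list avoids duplicate column names on write).
val merge1 = spark.sql(
  """SELECT f1.id, f1.value AS value1, f2.value AS value2
    |FROM file1 f1 JOIN file2 f2 ON f1.id = f2.id""".stripMargin)

// Save merge1 as Parquet so it can be read again for the next merge.
merge1.write.mode("overwrite").parquet("/data/parquet/merge1")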


Problem:
- When I launch the Spark job, it hangs.


Any Suggestions?
PS: I am using a standalone cluster (1 master, 1 worker).
I applied some join optimization concepts and configurations (spark.sql.shuffle.partitions, etc.); a sketch of what I mean is below.
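The broadcast hint here is a standard Spark join optimization for small tables; the partition count, paths, and the key column "id" are only illustrative values, not what I claim is optimal:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("join-tuning").getOrCreate()

// The default of 200 shuffle partitions is often too many for a
// single-worker standalone cluster; 32 is just an illustrative value.
spark.conf.set("spark.sql.shuffle.partitions", "32")

val big   = spark.read.parquet("/data/parquet/file1")  // the ~8 GB table
val small = spark.read.parquet("/data/parquet/file2")  // a ~5 MB table

// Broadcasting the small table turns the shuffle join into a map-side
// join, so the 8 GB side is not shuffled ("id" is a placeholder key).
val merged = big.join(broadcast(small), Seq("id"))

merged.write.mode("overwrite").parquet("/data/parquet/merge1_broadcast")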

Best regards.

djibb.