Exploratory Data Analysis (EDA) using Apache Spark and Python

#apachespark #eda #python

In this video we will walk through exploratory data analysis using Apache Spark, Databricks and Python. We will see various plots for univariate and multivariate analysis and also understand which plots help in which scenario. We will explore charts and plots such as:

Bar chart
Box Plot
Scatter Plot
Q-Q Plot
Plotting in maps
Pivot Table
Histograms

Finally, we will discuss how to convert a Spark DataFrame to pandas and the challenges involved, and we will use seaborn and matplotlib to plot simple graphs.
Comments

Many have asked for the file I used in this video. You can download it from here:

Remove the last 2 lines from the CSV file.
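If you would rather strip those lines with code than by hand, here is a minimal streaming sketch in plain Python. It assumes the unwanted content is exactly the last two lines. The helper drop_last_lines and the file names are hypothetical.

```python
import tempfile
from collections import deque
from pathlib import Path


def drop_last_lines(src_path, dst_path, n=2):
    """Hypothetical helper: copy src_path to dst_path, leaving out its last n lines.

    Streams the file and keeps only n lines buffered, so it also works on
    files too large to load into memory at once.
    """
    buffer = deque()
    # newline="" keeps the original line endings (\n or \r\n) untouched.
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            buffer.append(line)
            if len(buffer) > n:
                # This line is at least n lines from the end, so it is safe to keep.
                dst.write(buffer.popleft())
    # Whatever is left in the buffer (the last n lines) is simply dropped.


# Usage with a made-up file: two data rows followed by two footer lines.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "loans.csv"
    clean = Path(tmp) / "loans_clean.csv"
    raw.write_bytes(b"id,amount\n1,100\n2,250\nfooter line 1\nfooter line 2\n")
    drop_last_lines(raw, clean, n=2)
    cleaned = clean.read_bytes().decode("utf-8")

print(cleaned)
```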

AIEngineeringLife

I would like to congratulate you, sir! I really like your passion for educating people about these cutting-edge technologies and for giving the whole picture rather than sticking to solving just one problem. Loved your work, and you inspired me to always share what I have with others, as it will only come back to you. Once again, kudos for your passion and humanity.

agammishra

Every piece of code and every scenario is well explained. Thanks a lot!

yasoram

Hats off to you, sir, for providing such a wonderful explanation and detailed analysis. :) Thank you once again.

pariksheetde

Very nice and useful pointers. Thank you very much.

ijeffking

Great videos! Looking forward to more videos. Keep up the good work! :)

priyalarunnile

Thank you so much, sir. I really respect what you are doing to help people who want to learn and make a career in data.

seemunyum

Sir, you are providing us great content, and free of cost at that. I sincerely want to thank you for all the hard work you are putting in.
Also, sir, could you please suggest some personal projects we could take up to apply this knowledge?

sachinsarathe

Very, very useful playlist; thanks for sharing in-depth knowledge. I have a question: how do I use Spark with Snowflake, and how do I connect to it?

chetanmundhe

Hello Sir,

Just wanted to confirm: Spark is a framework that works on the principle of distributed datasets, and here we are using the PySpark library in the Databricks notebook to perform the EDA and data cleaning. Right?

sankarshkadambari

Hello Sir,

It would be very helpful if you could make a dedicated video on how to prepare for a data engineering interview, with details of the topics and subtopics.
I am a beginner and want to move into the data engineering field. I have work experience with SQL and Python.

Sorry to trouble you with my silly doubt.
Thanks in advance

ankushojha

@AIEngineering, I would like to learn Spark, so I am following your "Mastering APACHE Spark" playlist. Am I right that the 30 videos in the playlist are in the proper order? As the playlist progresses, I also see some MLOps videos, so I just wanted your help to confirm whether the order is correct. Thanks for your help with these tutorials.

AkshayKumar-xosk

Sir, could you rearrange the video sequence in the Mastering Apache Spark playlist? It would be good to have every video in order, one after the other. Thank you.

karndeepsingh

Hi Sir,
Thanks for your time. At 18:32, when you create the "exposure" column, what is revol_bal (revolving balance)?
Is it (rev_util)% * Loan_amnt? I ask because the statement below throws an error:

from pyspark.sql.functions import col, when

# exposure = revol_bal for good loans, -10 * revol_bal for bad loans
lc_df = lc_df.withColumn("exposure", when(lc_df.bad_loan == "No", col("revol_bal")).otherwise(-10 * col("revol_bal")))
display(lc_df)  # Databricks notebook display helper

Error:
cannot resolve '`revol_bal`' given input

Please correct me if I'm wrong.

dineshvarma

Can you please explain why you chose Databricks Community Edition? Can we use it completely free, just like Colab?

royxss

Sir, can you recommend the best course to learn and apply all of these stages of data science using Apache Spark? That would be a great help!
Thank you!

karndeepsingh

Hi Sir,

If possible, please explain the usage of "-, -" in this line:

sns.countplot(pd_df.loc[pd_df['total_acc']<120, 'total_acc'], order=sorted(pd_df['total_acc'].unique(), saturation=1) -, - = plt.xticks(np.arange(0, 120, 10))

I am getting a syntax error
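plt.xticks(ticks) sets the tick positions on the current axes and returns a (locations, labels) pair. The "-, -" in the quoted line was most likely "_, _" (two underscores). Assigning to _ is the Python convention for discarding values you don't need. In a notebook it also stops the returned arrays from being echoed under the chart. Typed as "-, -", it is not a valid assignment target, which explains the syntax error. The quoted line has a second problem: the parenthesis of sorted(...) appears to close after saturation=1. It should close right after unique(), so that saturation=1 is passed to sns.countplot. Here is a small sketch using made-up bar data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.bar([5, 12, 30, 47, 95], [1, 2, 1, 3, 1])  # made-up counts per total_acc value

# plt.xticks returns (locations, labels); "_, _" unpacks and discards both.
_, _ = plt.xticks(np.arange(0, 120, 10))

# Called with no arguments, plt.xticks just reads the current ticks back.
locations, labels = plt.xticks()
print(list(locations))
```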

ankushojha

Sir, have you shared these notebooks anywhere, such as GitHub?
How can I access them?

shaikrasool

Can you make a video on how to use Snowpark?

chetanmundhe

If I'm preparing for a DE interview, should I use Spark or pandas/matplotlib for data cleaning? Which one do you suggest?

christineeee