Tutorial 3- Pyspark With Python-Pyspark DataFrames- Handling Missing Values

Apache Spark is written in the Scala programming language. To support Python with Spark, the Apache Spark community released a tool called PySpark. Using PySpark, you can also work with RDDs in Python. This is possible because of a library called Py4J.

Comments

walking into the weekend knowing PySpark, what a feeling

mbmathematicsacademic

Watching this in April 2024!
Thanks for making this unbeatable content Krish!! Also love your ongoing langchain series!

utkarshkapil

na.fill('Missing values') did not work for the integer columns such as Age/Experience/Salary.
Only if we read the dataset with inferSchema=False, so that all the columns are read as strings by default, can we fill the missing values with a string like 'Missing values'. The same goes for the string columns if we fill the missing values with a 0.
Just an input :)

tanishasharma

Thanks for this playlist. Please include deep learning with Spark and the Koalas library in it too. 🙏

prernakhurana

Thank you, sir. Expecting more for the PySpark playlist 🙏

bhargavikoti

@11:02 'Missing Values' could appear in Experience, Age and Salary because the data type of these columns was string. If it had been integer, it would not have appeared. Hope it helps someone!

khushboojain

Thanks Krish,

Note: I was getting a bad result: only the first column was being filled and not the other columns. Using df_pyspark.fillna({'Age': 0, 'Experience': 0}).show(), it worked correctly. I am not sure why it's happening on my system only.

tableauvizwithvineet

Hey Krish, can you please make a video about communication skills for data scientists and how I can develop them?

faisalmomin

Can we get the mean or median separately and apply it to a particular column?

ajaysinha

Can you make a Python-only series for data analyst roles?

sociallyviral

Hello Krish,
Can you please make a video on music genre classification using AI electronic projects?
It would be a great help.

paulasam

Hi Krish,
Can you please let us know when you will make a video on BERT?

syedalinaqi

I tried to fill the null values with df_spark = df_spark.na.fill('missing value', ['age', 'salary']), but it is not working.

paritoshprasad

Hi Krish, when I ran the Imputer part, it gave me an error saying "IllegalArgumentException: requirement failed: Column names must be of type numeric but was actually of type string."
How do I resolve this?

sankarazad

Krish, could you please explain Spark architecture?

gururajraykar

How do I impute a categorical feature? "mode" is not working with categorical features.

ramthiagu

Hi Krish, thanks for the videos.
One query: how come it is filling integer columns with 'Missing Value' when you set inferSchema=True?
Also, I think I found the reason: when I looked at your DataFrame (11:54), the column data types are all strings. Can you help explain why inferSchema did not take effect?

raviluminary

Hey, in this section, even though you wrote inferSchema=True, it is still taking all columns as strings, so when you fill with the 'Missing' keyword as the value, it gets applied to all rows.
But when I do it with inferSchema=True, the fill only happens for the string columns, not all of them. Please clarify this.

anirbaansaha

Can you also make a video on Spark architecture? It would be really helpful for us.

vigneshbalaji_

Jupyter command suggestions are not showing. Can you please help?

anand