Handling Missing Data | Part 1 | Complete Case Analysis

Handling missing data is an essential step in the data preprocessing pipeline. It helps ensure that ML models are trained on high-quality, representative datasets, which leads to more accurate and reliable predictions. Depending on the nature and impact of the missing data, you can drop the affected rows, impute the missing values, or use more advanced methods such as Multiple Imputation.
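As a rough illustration of the simplest of these strategies, complete case analysis (CCA), here is a minimal pandas sketch. The toy DataFrame and its column names are made up for this example and are not the dataset used in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the dataset from the video
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Pune", "Delhi", None, "Mumbai"],
    "salary": [50, 60, 70, 80],
})

# Complete case analysis: keep only the rows that have no missing value
complete_cases = df.dropna()

# Fraction of the original rows that survive the drop
retained = len(complete_cases) / len(df)
print(complete_cases)
print(f"rows retained: {retained:.0%}")
```

Checking the retained fraction before committing to CCA is a cheap way to see how much data the drop would cost.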

⌚Time Stamps⌚

00:00 - Intro
00:58 - Handling Missing Data
05:50 - Complete Case Analysis [CCA]
07:09 - Assumption for CCA
09:38 - Advantages and Disadvantages of CCA
11:39 - When to use CCA?
13:24 - Code Example
Comments
Author

You are the first YouTuber on YouTube with zero dislikes. It makes me happy.
Sir, your effort is praiseworthy!

ajaykushwaha-jemw
Author

I can't comprehend how much I've learned from your videos. Got my first silver medal on Kaggle today. All credit goes to you.

Feature engineering is so important, I'm focusing really hard on all these topics and you've done an amazing job at making these thorough tutorials. You're a great teacher. 🙏

akash.deblanq
Author

Sir, you are the first one who fully explained why and when to apply CCA. Otherwise, almost everyone just tells you to apply it without explaining why. Thank you so much again, Sir, for providing this knowledge.

GamerBoy-iijc
Author

A real guru. I have felt blessed ever since I watched your videos.

Sanjay_Singh_Bisht
Author

This was extremely helpful and exactly what I was looking for. Thank you

ayesha
Author

This is gold for new learners. Thanks, Nitish.

Shahad
Author

Use new_df = df.dropna(subset=cols) to drop the rows and keep the columns as they are, i.e. new_df.shape = (17182, 13).

AmbujRai-ftcx
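To make the point of the comment above concrete, here is a small sketch of how dropna(subset=cols) differs from selecting the columns first. The toy frame and the contents of cols are assumptions for illustration only; the (17182, 13) shape quoted above belongs to the video's dataset and is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "b" stands in for a column that is NOT part of `cols`
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 1.0, 2.0],
    "c": ["x", "y", None, "z"],
})
cols = ["a", "c"]

# Drops rows with a missing value in any column of `cols`, keeps every column
new_df = df.dropna(subset=cols)

# Selecting the columns first drops the same rows but also discards "b"
only_cols = df[cols].dropna()

print(new_df.shape)     # all three columns are still present
print(only_cols.shape)  # only the columns listed in `cols`
```

Note that new_df can still contain missing values in columns outside cols, as "b" does here.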
Author

During the code example, why did we remove the columns where the percentage of null values is greater than 5%?

yearsago
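One plausible reading of the 5% threshold (an assumption here, not necessarily the reasoning given in the video) is that CCA is only cheap when few rows are lost. The sketch below uses made-up columns to compare how many rows survive when the drop is restricted to lightly missing columns versus applied to every column.

```python
import numpy as np
import pandas as pd

n = 40
# Hypothetical columns with different amounts of missing data
df = pd.DataFrame({
    "low_missing": [np.nan] + [1.0] * (n - 1),          # 2.5% missing
    "high_missing": [np.nan] * 20 + [1.0] * (n - 20),   # 50% missing
    "complete": list(range(n)),                         # nothing missing
})

# Fraction of missing values per column
missing_frac = df.isnull().mean()

# Columns with some, but less than 5%, missing values
cols = [c for c in df.columns if 0 < missing_frac[c] < 0.05]

rows_kept_all = len(df.dropna()) / n              # drop on every column
rows_kept_cols = len(df.dropna(subset=cols)) / n  # drop on `cols` only
print(cols, rows_kept_all, rows_kept_cols)
```

With the heavily missing column included, half the rows disappear; restricting the drop to the lightly missing column loses a single row.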
Author

My target is to complete the playlist by 12th January 202. You deserve more views. You are doing a great job.

jitendratrivedi
Author

Guru ji, you teach amazingly well. It is a joy to learn from you.

ajaykushwaha
Author

Thank you so much, sir. Crystal clear 😇

Aestheticdeeps
Author

Hi! Thank you for the wonderful playlist. I have a query: can we handle missing values using XGBoost, or probabilistic methods like Bayesian statistics?

GAMEZONEX
Author

Hello sir. How can we tell whether the data is missing at random or not?

SumitKumar-uqdg
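A common heuristic for the question above is to compare an observed column between the rows where a value is missing and the rows where it is present. The sketch below uses made-up data and column names. A large gap suggests the missingness is related to the observed column, so the data is probably not missing completely at random; a small gap does not prove randomness, and missingness that depends on the unobserved values themselves cannot be detected this way.

```python
import numpy as np
import pandas as pd

# Hypothetical data: salary tends to be missing for more experienced people
df = pd.DataFrame({
    "experience": [1, 2, 3, 4, 10, 12, 15, 20],
    "salary": [30.0, 35.0, 40.0, 45.0, np.nan, np.nan, 90.0, np.nan],
})

# Boolean mask marking the rows where salary is missing
is_missing = df["salary"].isnull()

# Compare an observed column across the two groups
mean_when_missing = df.loc[is_missing, "experience"].mean()
mean_when_present = df.loc[~is_missing, "experience"].mean()
print(mean_when_missing, mean_when_present)
```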
Author

After CCA we are left with 17000 rows in new_df and 19000 rows in df. How do we concatenate them for modelling?

shubhankarsharma
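Assuming new_df was produced with df.dropna(...), as in the comments above, it is already a subset of df, so there is nothing to concatenate: the model would be trained on new_df directly. A minimal sketch with toy data (not the video's dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [10, 20, 30]})

# Complete case analysis keeps a subset of the original rows
new_df = df.dropna()

# Every row label in new_df already exists in df
is_subset = bool(new_df.index.isin(df.index).all())

# The rows that CCA removed can be recovered from df by index
dropped_rows = df.loc[df.index.difference(new_df.index)]
print(is_subset, list(dropped_rows.index))
```

Concatenating new_df back onto df would only duplicate the complete rows.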
Author

cols= [var for var in df.columns if and (df[var].isnull().mean>0))

niranjania
Author

What if the percentage of missing values in an attribute is exactly 5%? Should we still perform CCA?

gautamdinga
Author

@CampusX If factual data is missing, like the manufacture year of a vehicle, is it fine to impute it? (Dataset size: 20k)

rachanakotha
Author

How do we add this CCA data back to the main dataframe?

nikitha
Author

TypeError: '>' not supported between instances of 'method' and 'int'

niranjania
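This TypeError is what Python raises when a method object, rather than its return value, is compared with a number. The same commenter's earlier snippet writes .mean>0 without parentheses, which is the likely cause, though the full code is not shown. A minimal reproduction on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

try:
    # `mean` without () is the bound method itself, not a number
    s.isnull().mean > 0
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Calling the method returns the fraction of missing values
fraction_missing = s.isnull().mean()
has_missing = fraction_missing > 0
print(error_message)
print(fraction_missing, has_missing)
```

Adding the parentheses, as in df[var].isnull().mean() > 0, makes the comparison operate on a float.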