Handling Missing Data Easily Explained | Machine Learning

Data can have missing values for a number of reasons such as observations that were not recorded and data corruption.

Handling missing data is important as many machine learning algorithms do not support data with missing values.

In this tutorial, you will discover how to handle missing data for machine learning with Python.

Specifically, after completing this tutorial you will know:

How to mark invalid or corrupt values as missing in your dataset.
How to remove rows with missing data from your dataset.
How to impute missing values with mean values in your dataset.
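The three steps above can be sketched in pandas; the column names and the use of 0 as an invalid placeholder are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: 0 is an invalid placeholder in 'age'.
df = pd.DataFrame({
    "age": [22.0, 0.0, 35.0, 28.0, 0.0],
    "fare": [7.25, 71.28, 8.05, np.nan, 13.0],
})

# 1. Mark invalid or corrupt values as missing (NaN).
df["age"] = df["age"].replace(0.0, np.nan)

# 2. Remove rows with missing data...
dropped = df.dropna()

# 3. ...or impute missing values with the column mean instead.
imputed = df.fillna(df.mean())
```

`dropna` discards every row containing at least one NaN, while `fillna(df.mean())` fills each column with its own mean computed over the observed values.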

You can buy my book, where I have provided a detailed explanation of how we can use Machine Learning and Deep Learning in Finance using Python.

Comments

Your channel is awesome, please keep going! Can't tell you how valuable your videos are when starting to learn!

stevechops

Honestly, I really love your videos, simple and easy to understand. Always answering my machine learning and data science questions! I do have one question, though. I watched your video on standardisation and normalisation. I am trying to build a benchmark/index; would it be okay to standardize the data before creating it?

dishydez

Thank you Krish sir. I was following the Kaggle Learn course on machine learning but couldn't understand this topic even after so much hard work; now it's all clear. Keep it up.

raunasur

Thanks Krish. I can't think of an easier explanation of a tricky topic!!! Simply superb!!!👍

equiwave

Today I started working on the Titanic data. I tried to predict the missing age values but failed and was very tense. So I started watching your video in the hope of finding a way. When you opened the notebook I felt such relief: 'now it will surely work out'. Thank you for making this video.

himalayasinghsheoran

Your explanation is amazing and you're perfect as usual.

hanman

I think there is a quantitative justification for why we should fill the NaN values in 'Age' with the median grouped by 'Sex' and 'Pclass'. In the EDA step, we can print or visualize a heatmap of the correlations between columns (dataset.corr().abs()). We can see that the 'Age' column has a relatively high correlation with the 'Sex' and 'Pclass' columns.
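The approach this comment describes can be sketched in pandas; the tiny DataFrame and the numeric encoding of 'Sex' (needed for it to appear in `corr()`) are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Tiny Titanic-style sample; 'Sex' is numerically encoded.
df = pd.DataFrame({
    "Sex":    [0, 0, 1, 1, 0, 1],
    "Pclass": [1, 3, 1, 3, 1, 3],
    "Age":    [40.0, 22.0, 35.0, np.nan, np.nan, 20.0],
})

# Inspect absolute correlations to justify the grouping columns.
print(df.corr().abs())

# Fill each missing 'Age' with the median of its ('Sex', 'Pclass') group.
df["Age"] = df.groupby(["Sex", "Pclass"])["Age"].transform(
    lambda s: s.fillna(s.median())
)
```

`transform` keeps the original index, so each NaN is replaced by the median of only those passengers sharing its sex and class, rather than one global median.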

radifantaufik

Nice explanation. The conclusion depends on your end goal and on whether dropping rows or replacing with the mean will affect your analysis; in his example he needed the age but didn't need the cabin.

strangereview

Hi Krish,


Your videos are quite useful and simple to understand. My request: could you create a video on how to deploy an ML model with Flask? That would be very useful.

dilipgawade

I thought that you would also implement a regression model for synthetic imputation. But the content is great!!

finance_tamil

Thank you for making life so much easier for us!

aimenbaig

Cleared all my doubts! Great..Thank you so much!!

bhaktibailurkar

Thank you Krish, you have explained the second option very well. Wondering how we do this for categorical columns, and when values are missing from multiple fields.

amarendrakolukula

Thanks a lot for sharing your knowledge with us. Kindly address one confusion: do we need to impute missing values in the test dataset the same way you taught in the video?

gaziya

Thanks a lot for the detailed explanation. It really helps.

coolsun-lifestyle

Well, I appreciate the video that Mr. Krish Naik made, I love his videos, and I really want to discuss how we can handle missing values. Using a separate model on the complete rows to learn the relation between variables is not great, though, because the imputed value, being generated by machine learning, is not real data and may be statistically far from the center of the population, since it comes from another equation. I would rather use a statistical method like mean, median, or mode and, I don't know whether this will work or not, check the range of the population mean and make sure the imputed value does not go far from it.
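The sanity check this comment proposes can be sketched as: fill with a simple statistic, then verify that the fill value sits near the mean of the observed data. The synthetic data and the lenient one-standard-deviation tolerance below are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic 'age' column with some values knocked out.
rng = np.random.default_rng(0)
age = pd.Series(rng.normal(30.0, 5.0, 500))
age.iloc[:25] = np.nan

# Statistical imputation: fill with the median of the observed values.
fill_value = age.median()
imputed = age.fillna(fill_value)

# Sanity check: the fill value should not sit far from the observed
# mean; here "far" is (arbitrarily) one standard deviation.
observed = age.dropna()
assert abs(fill_value - observed.mean()) < observed.std()
```

A model-generated imputation could be screened the same way before being accepted, which is the check the commenter is asking for.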

channelfisikaasik

Beautifully explained, with great detail!

kukulaarohi

Great way of explaining things. I like it very much.

konradpyrz

Thanks for the video. You said that option 2 (model-based imputation) is less preferred for huge datasets; does that mean that in general it is good to go with statistics-based imputation over model-based imputation on real-world datasets, since we get a lot of data in the real world? I am working on the Home-Credit-Default-Risk Kaggle competition dataset; I'd appreciate your comment on which imputation method to use.
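For context, the two options this comment contrasts can be sketched side by side. The toy data and the choice of a simple linear fit via `np.polyfit` are assumptions for illustration, not the video's exact code:

```python
import numpy as np
import pandas as pd

# Toy data with a roughly linear Age~Fare relationship (an assumption).
df = pd.DataFrame({
    "Fare": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "Age":  [20.0, 25.0, 30.0, np.nan, 40.0, np.nan],
})

# Option 1 - statistical imputation: one cheap pass, scales easily.
stat = df["Age"].fillna(df["Age"].median())

# Option 2 - model-based imputation: fit on complete rows, predict the rest.
known = df.dropna(subset=["Age"])
slope, intercept = np.polyfit(known["Fare"], known["Age"], deg=1)
model = df["Age"].fillna(slope * df["Fare"] + intercept)
```

The statistical fill gives every missing row the same value, while the model-based fill varies with `Fare`; the trade-off is that fitting and predicting costs more on huge datasets.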

gopie

A really good idea to create a separate model, thanks for sharing.

saurabhtripathi