(Code) KNN Imputer for imputing missing values | Machine Learning

#knn #imputer #python

In this tutorial, we'll be implementing the KNN Imputer in Python, a technique that lets us impute missing values in a dataset by looking at neighboring values.

Machine learning models can't inherently work with missing data, so it's important to learn how to choose between different imputation techniques to achieve the best possible model for our use case.

KNN imputation works on the intuition that a missing value is best filled in using rows that resemble the one it belongs to. Mathematically, it finds the points (other rows in the dataset) that are closest to that row in feature space and uses them as a benchmark for the value to impute.
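The idea above can be sketched with scikit-learn's `KNNImputer` (a minimal toy example, not the dataset from the video):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix: rows are samples, columns are features.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [7.0, 6.0],
])

# Each missing entry is replaced by the mean of that feature over the
# n_neighbors closest rows that do have the feature observed.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
# Row 1's missing first feature becomes the mean of 1.0 and 7.0, i.e. 4.0.
```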

I've uploaded all the relevant code and datasets used here (and in all my other tutorials, for that matter) to my GitHub page, which is accessible here:

Link:

If you like my content, please don't forget to like this video and subscribe to my channel.

If you have any questions regarding any of the content here, please feel free to comment below and I'll be happy to assist you in whatever capacity possible.

Thank you!
Comments
This is the best video (of any kind) I have watched all week... I'll rewatch it to get a better understanding, but even the first pass was informative and useful. Thank you!

faizikramulla
This video came in clutch for me on a project. Had to subscribe. Thank you so much!!!

edidiongesu
Rachit, you explain the KNN imputation code nicely. Great!

ranjitgawande
You explained it in such a great way 🤩🤩, thanks for this amazing video

Sinister_Rewind
Great video, sir. Thank you very much for your clear and valuable explanation.

muthierry
Great video as always, Rachit. Can you make videos on target encoding, mean encoding, weight-of-evidence encoding, probability encoding, etc., and how we can use them in a pipeline?
That would be very helpful. Also, how can we use these techniques with cross-validation?

chiragsharma
Thanks for the video. I converted categorical data into numerical data and then applied the KNN imputer. Some values were imputed as floats like 2.37, 1.89, etc., which doesn't make any sense. Can you suggest the best way to impute categorical data without ending up with float values?

vamseesworld
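One hedged way to address the question above (not from the video): encode the categories as integer codes, impute, then round back to the nearest valid code. The `quality` column and its values here are purely illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({
    "quality": ["A", "B", None, "A", "B", None],   # hypothetical categorical column
    "alcohol": [10.1, 12.3, 11.0, 10.2, 12.5, 12.4],
})

# Encode categories as integer codes; -1 marks missing, so map it to NaN.
cats = pd.Categorical(df["quality"])
codes = pd.Series(cats.codes, dtype="float").replace(-1, np.nan)

X = np.column_stack([codes, df["alcohol"]])
filled = KNNImputer(n_neighbors=2).fit_transform(X)

# Round imputed codes to the nearest valid category index and decode.
imputed_codes = np.clip(np.round(filled[:, 0]), 0,
                        len(cats.categories) - 1).astype(int)
df["quality_imputed"] = pd.Categorical.from_codes(imputed_codes, cats.categories)
```

Rounding works reasonably for a small number of categories, but imputing the mode of the neighbors (or a model-based imputer) is arguably cleaner for nominal data with many levels.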
Nice one! Sorry, I didn't understand what you said at 8:15 about the imputer being fit on the training set even when we apply it to X_test. Could you elaborate? Thanks.

rohitjagdale
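The general pattern being asked about — fit the imputer on the training data only, then apply it to the test data so test rows borrow neighbors from the training set and no test information leaks into the fit — can be sketched like this (toy data, not the video's dataset):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy split; the training rows are complete here for simplicity.
X_train = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 5.0], [8.0, 9.0]])
X_test = np.array([[np.nan, 4.0]])

imputer = KNNImputer(n_neighbors=2)
X_train_filled = imputer.fit_transform(X_train)  # learn only from training rows
X_test_filled = imputer.transform(X_test)        # neighbors are searched in X_train
# The test row's two nearest training rows are [2.0, 4.0] and [3.0, 5.0],
# so its missing first feature becomes (2.0 + 3.0) / 2 = 2.5.
```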
Thank you, thank you... ❤ you... well explained...

sriraj
Thanks for the video. Now, after imputing, how do we export the data with the missing values filled in?

shahedabdulhadi
Hi Rachit, your way of teaching is excellent. Can you please make a coding video on the MICE imputer?

Chinmayluv
Brilliant one, thank you. Quick question: the dataset after the KNN transformation didn't have column names. Any tips on retaining the variable names?

vish
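One common fix for the column-name question (a sketch, not from the video): wrap the imputer's NumPy output back into a DataFrame with the original columns and index. On newer scikit-learn versions (1.2+), `imputer.set_output(transform="pandas")` should achieve the same thing.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({"age": [25.0, np.nan, 40.0],
                   "hours": [40.0, 35.0, np.nan]})

imputer = KNNImputer(n_neighbors=2)
# fit_transform returns a plain NumPy array, dropping the labels;
# rebuild the DataFrame with the original columns and index.
filled = pd.DataFrame(imputer.fit_transform(df),
                      columns=df.columns, index=df.index)
```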
Thanks for the video. I have a small doubt: here you have treated numerical columns, but what about categorical ones? Can we use 'cat' in place of 'num' to treat those values?

prernajha
Another awesome video... I have the following queries after watching:
1. Since KNN is a distance-based algorithm, should we perform outlier treatment and standardization before doing the imputation?
2. You mentioned an indicator column, saying that "missingness — the fact that a row's value is missing — can itself be a feature for us". Could you tell us how that is useful?
3. Is there any rule of thumb for when to use KNN vs. MICE?
Thanks!

abrahammathew
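On the first two questions above, one hedged sketch: scale before imputing so each feature contributes comparably to the neighbor distances, and let the imputer append binary "was missing" indicator columns. This is an illustration, not the video's code, and note the output stays in scaled space:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import KNNImputer

# Two features on very different scales, each with one missing value.
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 600.0],
              [np.nan, 400.0]])

# StandardScaler ignores NaNs when computing mean/std; scaling first keeps
# the large-valued feature from dominating the distances.
# add_indicator=True appends one binary column per feature that had NaNs.
pipe = make_pipeline(StandardScaler(),
                     KNNImputer(n_neighbors=2, add_indicator=True))
X_out = pipe.fit_transform(X)  # shape (4, 4): 2 imputed features + 2 indicators
```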
Thank you for your video, this helps a lot!
I have a few questions:

1) It is not clear to me what values you are imputing. The original data has missing values only in categorical columns.
Then you introduce some NaNs in 'age' and 'hours-per-week'. Is that to create a training set?
2) Before you do the KNN, you create X_train, X_test, y_train, and y_test. Is that always necessary? What if your data is like your movies example (but larger) — would you still do this?
3) You are using income as your target column. But what happens if you don't have that kind of binary column?
4) After you do the imputation, you lose the labels on the columns and rows. How do you recover them? More importantly, how do you display the original data including the imputed values?

Thanks again for your videos!

ivanrazu
How do we impute missing data in categorical variables?

nandinisharma
Please make a video on the hot-deck and expectation-maximization imputation methods in Python.

bhushantayade
Can anyone help fix a layout issue with these graphs?
All the graphs are coming out in one column and three rows, but what I'm looking for is a single row with three columns.
ax=plt.subplots()
ax=sns.distplot(data['LotFrontage'], hist=False)
ax=sns.distplot(X_train['LotFrontage'], hist=False)

ax=plt.subplots()
ax=sns.distplot(data['MasVnrArea'], hist=False)
ax=sns.distplot(X_train['MasVnrArea'], hist=False)

ax=plt.subplots()
ax=sns.distplot(data['GarageYrBlt'], hist=False)
ax=sns.distplot(X_train['GarageYrBlt'], hist=False)

ajaykushwaha-jemw
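For the layout question above, one hedged sketch: create a single figure with `plt.subplots(1, 3)` and pass each axis to the plotting call, instead of calling `plt.subplots()` once per plot (with seaborn that would be e.g. `sns.kdeplot(..., ax=axes[i])`; plain histograms with random data stand in for the distribution plots here, and the column names are just the ones from the comment):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
cols = ["LotFrontage", "MasVnrArea", "GarageYrBlt"]

# One figure, one row, three columns; each plot goes on its own axis.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, col in zip(axes, cols):
    ax.hist(rng.normal(size=200), bins=20, density=True)  # stand-in data
    ax.set_title(col)
fig.tight_layout()
```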
Hi,

Can you help me with the doubt below?

Here in the video, we take the numeric columns and impute their values, as selected by this statement:

num = [col for col in X_train.columns if X_train[col].dtypes != 'O']

My question is: how should we impute categorical values? I'm using a wine dataset in which I need to impute values in the 'quality' column, which contains Quality A, Quality B, and nulls. If I leave this column out and carry on with the other numerical columns, the null values in it never get imputed. How do we handle such columns that hold categorical values with dtype object? Please explain. TIA!

pradeebhabenildus
I have a question: I noticed that you did a train-test split. Isn't that only necessary when you want to build a predictive model? Is it necessary to use a train-test split for imputation purposes? Thank you!

stoicsimulation