Finding an outlier in a dataset using Python

In this video, we will see how to find outliers in a dataset using Python.

ref: #medium articles

#Outlierdetection

You can buy my book on Finance with Machine Learning and Deep Learning from the below url

Comments

Clustering techniques are also widely used in industry to detect outliers, as is the isolation forest algorithm in particular.

yourkarma

The tutorial offers a lucid explanation of the complex problem of outliers. It is well presented, with examples that made it easier to follow. However, threshold = 3 isn't working for me. I modified it to threshold = 3+std to make it work properly. Moreover, declaring outliers = [] outside the function causes problems if you want to use this function on another dataset in the same notebook. So declaring the outlier list inside the function would be a better approach, I think.

shujashakir
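
A minimal sketch of the function shape suggested in the comment above, assuming a plain list of numbers and the common rule of flagging points whose absolute z-score exceeds 3. The name `detect_outliers_zscore` and the use of the standard-library `statistics` module are illustrative assumptions, not the video's actual code.

```python
from statistics import mean, pstdev


def detect_outliers_zscore(data, threshold=3.0):
    """Return the values in `data` whose absolute z-score exceeds `threshold`."""
    outliers = []  # created on every call, so results never leak between datasets
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation
    if sigma == 0:
        return outliers  # all values are identical, so nothing can be flagged
    for value in data:
        z_score = (value - mu) / sigma
        if abs(z_score) > threshold:
            outliers.append(value)  # append the data point itself
    return outliers


first = detect_outliers_zscore([10] * 20 + [100])
second = detect_outliers_zscore([1, 2, 3, 4, 5])
print(first, second)
```

Because the list is built inside the function, calling it on a second dataset in the same notebook starts from an empty result. On very small samples no point may reach a z-score of 3, which could be one reason a fixed threshold appears not to work.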

You have explained things well. Just one correction: it's "interquartile range", not "inter-quantile range".

mridulagarwal

Amazing Krish, now I understand the concept of outliers, thanks

doubando

Here is the correction: lower bound = Q1 - 1.5 * IQR and upper bound = Q3 + 1.5 * IQR.

shadrul
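
The corrected bounds can be sketched as follows, assuming a plain list of numbers. The function name `detect_outliers_iqr` is hypothetical, and the `method="inclusive"` setting of `statistics.quantiles` is an assumption chosen here because it interpolates linearly between data points; other quartile conventions give slightly different bounds.

```python
from statistics import quantiles


def detect_outliers_iqr(data):
    """Return (lower_bound, upper_bound, outliers) using the 1.5 * IQR rule."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_bound or x > upper_bound]
    return lower_bound, upper_bound, outliers


lower, upper, flagged = detect_outliers_iqr([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(lower, upper, flagged)
```

Note that the lower bound subtracts from Q1 and the upper bound adds to Q3; swapping the quartiles would make the accepted range far too narrow.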

Superb explanation, in a very simple way.

AmitSharma-pozb

Very clear and crisp explanation, loved it

srijeetful

13:57 Correction:
Lower bound = Q1 - IQR * 1.5
Upper bound = Q3 + IQR * 1.5

vamsinadh

Thank you so much sir, I understood everything

adityapradhan

I have a couple of questions.
1. Is it always better to remove outliers, or could that be a big mistake? You gave the example of a fraudulent transaction. An outlier is indeed a hint that a transaction was fraudulent. If I remove all such transactions in the first place, how am I going to achieve my results?

2. You did not explain how to perform outlier checks on a multivariate dataset, for example the Iris dataset. I have seen a couple of videos here and there, but no proper method emerges from them. What is the proper way to identify outliers in multivariate datasets?

Thanks

smalirizvi

Nice work, mate. I also tried something similar, but with upper and lower bounds on the return.

thedatascientist_me

Hi Krish, thank you so much for the tutorial. Very clear and crisp explanation, loved it :)

satheeshswaminathan

Just a correction: when calculating the z-score, you are subtracting an array from i. You should enumerate over the datasets and then use the mean and std at the current index for i.

kaka

Great video, sir. Great content, explained in the cleanest way possible. Thanks.

yuktikhantwal

Well explained. It would be great if you could add some plots for visualization.

ryando

Nice content, and you explained it very well. Thank you so much.

gyapti-fctfinder

Hi Krish, well explained. Can you please post a video on how to equate the outliers using any dataset? Thanks in advance.

Ashokkumar-scvt

I have been following your videos and have learned many things, Krish Naik. Could you please tell me whether you have written any data science or machine learning books? I would like to buy your books and follow your videos to clinch a data science job as soon as possible.

ksoftqatutorials

Hi Krish,
Thanks for the excellent explanation. But if we find outliers in a feature, should we remove the records containing them (in which case we lose some data)? If not, how can we handle outliers? Please cover this as well :)

niveshtayal

Thanks for sharing this video.
One correction: in the loop it should be outliers.append(i), not outliers.append(y).

otroleonarbe