DBSCAN Clustering for Identifying Outliers Using Python - Tutorial 22 in Jupyter Notebook

In this Python for data science tutorial, you will learn how to use DBSCAN (density-based spatial clustering of applications with noise) to detect outliers in Python. You will learn how to use two important DBSCAN model parameters, eps and min_samples. The coding environment is Jupyter Notebook (Anaconda).
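
As a rough illustration of the two parameters, here is a minimal sketch using scikit-learn's DBSCAN. The data is synthetic and invented for this example (it is not the tutorial's data set), and the eps and min_samples values are picked only to suit it. eps is the neighbourhood radius, and min_samples is how many points (the point itself included) must fall inside that radius for a point to count as part of a dense region. Points that belong to no dense region get the label -1, which is what the tutorial treats as an outlier.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data (an assumption for illustration): a dense 5 x 5 grid of
# points spaced 0.1 apart, plus one point far away from the grid.
ticks = np.arange(5) * 0.1
grid = np.array([(x, y) for x in ticks for y in ticks])
points = np.vstack([grid, [[5.0, 5.0]]])

# eps: neighbourhood radius; min_samples: points needed within eps
# (counting the point itself) for a point to be a core point.
model = DBSCAN(eps=0.25, min_samples=4).fit(points)
labels = model.labels_

# DBSCAN marks noise points with the label -1.
outliers = points[labels == -1]
print(outliers)
```

Making eps smaller or min_samples larger makes the model stricter, so more points end up labelled -1.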

This is the 22nd video of the Python for Data Science course. In this series I explain Python and data science. Python is widely regarded as one of the best programming languages for data analysis because of its libraries for manipulating, storing, and gaining understanding from data. Watch this video to learn about the libraries that make Python a data science powerhouse. Jupyter Notebooks have become very popular in the last few years, and for good reason. They allow you to create and share documents that contain live code, equations, visualizations and markdown text, all of which can be run directly in the browser. It is an essential tool to learn if you are getting started in data science, and it has plenty of benefits outside that field too. Harvard Business Review named data scientist "the sexiest job of the 21st century." Python pandas is a commonly used tool in the industry for cleaning, analyzing, and visualizing data of varying sizes and types. We'll learn how to use pandas, SciPy, scikit-learn and matplotlib to extract meaningful insights and recommendations from real-world datasets.

Download Link for Cars Data Set:

Download Link for Enrollment Forecast:

Download Link for Iris Data Set:

Download Link for Snow Inventory:

Download Link for Super Store Sales:

Download Link for States:

Download Link for Spam-base Data Base:

Download Link for Parsed Data:

Download Link for HTML File:

Comments
Author

For future viewers: if you get the DBSCAN error when using Python 3, replace the ax.scatter line with this one:
ax.scatter(data.iloc[:, 2].values, data.iloc[:, 1].values, c=colors, s=120)

mateofriedman
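
To show why the line in the comment above helps, here is a small sketch. The DataFrame is a made-up stand-in for the tutorial's four measurement columns, and the column names are assumptions. NumPy-style indexing such as data[:, 2] is not supported on a pandas DataFrame, while .iloc selects by position and .values hands back a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tutorial's four measurement columns.
data = pd.DataFrame(
    np.arange(20, dtype=float).reshape(5, 4),
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)

# Select columns by position with .iloc, then take the underlying array.
x = data.iloc[:, 2].values  # third column, all rows
y = data.iloc[:, 1].values  # second column, all rows

print(x)
print(y)
```

These two arrays are what the comment passes to ax.scatter as the x and y coordinates.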
Author

Thank you so much for the video. How can we determine epsilon? Could you please advise on that?

sukhendutarafder
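
One common heuristic for picking epsilon, which is not covered on this page, is the k-distance plot: for every point, compute the distance to its k-th nearest neighbour, sort those distances, and look for the "elbow" where they start rising sharply. The sketch below assumes scikit-learn's bundled iris data as a stand-in for the tutorial's CSV, and reuses min_samples=19 from the model output quoted further down.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors

# Assumption: scikit-learn's bundled iris data stands in for the tutorial's file.
X = load_iris().data
min_samples = 19  # value from the model output quoted in the comments

# Querying the training points themselves returns each point as its own
# nearest neighbour (distance 0), which matches how DBSCAN counts
# min_samples (the point itself is included).
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)

# Distance from each point to its farthest of the min_samples neighbours,
# sorted in ascending order.
k_dist = np.sort(distances[:, -1])
print(k_dist[-10:])  # the largest values; candidates for outliers
```

Plotting k_dist (for example with plt.plot(k_dist)) and reading off the value at the bend gives a starting point for eps. It is a heuristic, so it is worth trying a few values around it.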
Author

Thanks for this series of videos. They are very useful.

One question I have about this video: when you created the scatter plot, why did you choose Petal Length and Sepal Width? Why not any other combination of columns?

muralimayhem
Author

Thank you, sir, for your reply. I have tried the same in Jupyter but am still getting the same error.

DBSCAN(algorithm='auto', eps=0.8, leaf_size=30, metric='euclidean',
min_samples=19, n_jobs=1, p=None)
Counter({1: 94, 0: 50, -1: 6})

TypeError Traceback (most recent call last)
in <module>()
48 print Counter(model.labels_)
49
---> 50 print
51
52

TypeError: 'DataFrame' object is not callable


Your advice is highly appreciated. Thanks a lot,

sukhendutarafder
Author

Hello, I am getting this issue while running the steps: print
Error:
File "<ipython-input-51-eb4c9f22ec8e>", line 1, in <module>
print

TypeError: 'DataFrame' object is not callable.
I have run the code in Spyder (2.7). Could you please advise me?

sukhendutarafder
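
The failing print line is cut off in the comments above, so the exact cause cannot be confirmed. As a guess only: this error message appears whenever a DataFrame is followed by parentheses, which Python reads as a function call. The sketch below reproduces the message with a made-up DataFrame and shows square-bracket indexing as the alternative.

```python
import pandas as pd

# Made-up DataFrame for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

try:
    df("a")  # parentheses try to *call* the DataFrame like a function
except TypeError as err:
    message = str(err)
    print(message)

# Square brackets index the DataFrame instead of calling it.
column = df["a"]
print(column.tolist())
```

The same error can also occur if a variable holding a DataFrame has been given the name of a function that is called later.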
Author

Hello, I'm getting an issue when trying to apply your code to my data set. I get a "TypeError: unhashable type: 'slice'" response when trying to run the scatterplot for outlier detection

kmillanr
Author

Hi, this video series is very good. Do you have these slides somewhere to refer to?

devenderchaudhary
Author

Good video! I want to ask you a question:

In the video, you say that outliers make up < 5% of total observations. Where did the '5%' come from? Is there a reference for that?

ramosjanoah
Author

I really don't understand the numpy index notation, such as data[ :, 2] or data[ data :, 1]. I really have no idea how to interpret this. And when it comes to plotting, everyone seems to be using this notation, but it is never explained. I'm relatively new to Python, so maybe this notation is common knowledge. I have plots to make, and it's urgent. But I just can't seem to understand this notation. I really don't get it.

StormWolf
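
For anyone with the same question, here is a small sketch of the notation on a made-up array. Inside the square brackets, the part before the comma selects rows and the part after it selects columns. A bare colon means "all of them", and numbering starts at 0.

```python
import numpy as np

# Made-up 3 x 4 array: 3 rows (observations), 4 columns (measurements).
data = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
])

third_col = data[:, 2]        # all rows, column index 2 (the third column)
second_col = data[:, 1]       # all rows, column index 1 (the second column)
first_row = data[0, :]        # row index 0, all columns
first_two_rows = data[:2, :]  # rows 0 and 1, all columns

print(third_col)
print(second_col)
```

So ax.scatter(data[:, 2], data[:, 1]) plots the third column on the x-axis against the second column on the y-axis. Note that data[ data :, 1] is not valid notation; it looks like a typo for data[:, 1].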
Author

Hi, your video series is wonderful, especially the topic selection and ordering. But in this video I am facing some issues.
I did exactly what you did, but at the step below I get the error "TypeError: unhashable type: 'slice'" when I run
fig = plt.figure()
ax = fig.add_axes([.1, .1, 1, 1])

colors = model.labels_
ax.scatter(data[:, 2], data[data:, 1], c=colors, s=120)

ax.set_xlabel('Petal Length')
ax.set_ylabel('Sepal Width')
plt.title('DBSCAN for outlier')

When I convert this code
data = df.iloc[:, 0:4]
target = df.iloc[:, 4]

to
data = df.iloc[:, 0:4].value
target = df.iloc[:, 4].value

I am getting the error "TypeError: only integer scalar arrays can be converted to a scalar index"
Please note I am using the same data set, and until this step all the outputs were the same as yours.
Please help

vmohakrish
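
A possible explanation for both errors in the comment above, offered as a sketch rather than a confirmed diagnosis. The first error comes from using NumPy-style indexing on a DataFrame. After converting to an array, the second error is what NumPy raises for data[data:, 1], where the array itself has slipped into the slice; the intended expression is presumably data[:, 1]. The attribute is also .values (plural), not .value. The code below assumes scikit-learn's bundled iris data in place of the tutorial's CSV, so the cluster counts may differ slightly from the video.

```python
from collections import Counter

import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled iris data stands in for the tutorial's CSV.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target

# .values (plural) converts the selection to a NumPy array.
data = df.iloc[:, 0:4].values
target = df.iloc[:, 4].values

# Same parameters as the model output quoted earlier in the comments.
model = DBSCAN(eps=0.8, min_samples=19).fit(data)
print(Counter(model.labels_))

# With a NumPy array, [:, n] selects column n for all rows.
x = data[:, 2]  # petal length
y = data[:, 1]  # sepal width; note data[:, 1], not data[data:, 1]
colors = model.labels_
```

With these arrays, the plotting call from the comment becomes ax.scatter(x, y, c=colors, s=120).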
Author

Frankly speaking, none of your code actually works, at least not on my system. I always have to find code from different places.

-no-handle