How do I find and remove duplicate rows in pandas?

During the data cleaning process, you will often need to figure out whether you have duplicate data, and if so, how to deal with it. In this video, I'll demonstrate the two key methods for finding and removing duplicate rows, as well as how to modify their behavior to suit your specific needs.
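In brief, the two key methods the video covers are `duplicated()` and `drop_duplicates()`. A minimal sketch (the sample data is made up for illustration):

```python
import pandas as pd

# Made-up sample with one exact duplicate row (index 2 repeats index 0).
df = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Ann', 'Cal'],
    'city': ['NYC', 'LA',  'NYC', 'SF'],
})

# duplicated() marks rows that repeat an earlier row.
print(df.duplicated())                   # True only at index 2

# drop_duplicates() removes them; keep= controls which copy survives.
print(df.drop_duplicates())              # keeps the first 'Ann'/'NYC'
print(df.drop_duplicates(keep='last'))   # keeps the last copy instead
print(df.drop_duplicates(subset='city')) # duplicates judged on 'city' only
```

Both methods also accept `keep=False` to flag or drop *all* copies of a duplicated row rather than sparing one.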

SUBSCRIBE to learn data science with Python:

JOIN the "Data School Insiders" community and receive exclusive rewards:

== RESOURCES ==

== LET'S CONNECT! ==
Comments

I spent hours trying to figure this stuff out through reading chapters and chapters in Python books. Then I come here, and everything I was trying to figure out was explained in 9 minutes. This was IMMENSELY helpful, thanks!

fredcalo

lol, just when I felt you wouldn't handle the exact subject I was looking for: there came the bonus! Thanks!

jordyleffers

Thanks so much for this! You helped me combine 629 files and remove 250k duplicate rows!
You're the man! *Subscribed*

reubenwyoung

I like your concise and precise videos. I really appreciate your efforts.

mea

wow! you were already teaching data science in 2014, when it wasn't even popular! Btw, your videos are really good: you speak slowly and clearly, easy to understand and for me to catch. Kudos to you!

hongyeegan

just found your channel, watched this as my first of your videos, and pressed subscribe!!! because your explanation of the idea as a whole is remarkable 😃 thanks a lot.

minaha

Very much appreciated efforts. Thanks a million for sharing your Python knowledge with us. It has been a wonderful journey with your precise explanations. Keep up the hard work! Warm regards.

oeb

I have watched a lot of your videos, and I must say that the way you explain is really good. Just to inform you, I am new to programming, let alone Python.
I want to learn a new thing from you. Let me give you a brief. I am working on a dataset to predict App Rating from the Google Play Store. There is an attribute named "Rating" which has a lot of null values. I want to replace those null values using a median based on another attribute named "Reviews". But I want to categorize the attribute "Reviews" into multiple categories like:
1st category would be for reviews less than 100,000,
2nd category for reviews between 100,001 and 1,000,000,
3rd category for reviews between 1,000,001 and 5,000,000, and
4th category for anything more than 5,000,000.
Although I tried a lot, I failed to create multiple categories. I was able to create only 2 categories using the command below:
gps['Reviews Group'] = [1 if x <= 100000 else 2 for x in gps['Reviews']]
gps is the Data Set.
I replaced the null values using the command below:
gps['Rating'] = gps.groupby('Reviews Group')['Rating'].transform(lambda x: x.fillna(x.median()))

Please help me create multiple categories for "Reviews" as mentioned above and replace all the Null Values in "Rating".
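A hedged sketch of one way to do what the comment above asks: `pd.cut` can assign the four review-count categories in one step. The column names follow the comment; the sample data here is invented, and the bin edges are my reading of the intended boundaries:

```python
import pandas as pd
import numpy as np

# Invented stand-in for the commenter's gps dataset.
gps = pd.DataFrame({
    'Reviews': [50_000, 500_000, 2_000_000, 8_000_000, 120_000],
    'Rating':  [4.1, np.nan, 3.9, 4.0, 4.5],
})

# pd.cut bins each review count; intervals are right-inclusive by default,
# so (0, 100000] is category 1, (100000, 1000000] is category 2, etc.
bins = [0, 100_000, 1_000_000, 5_000_000, np.inf]
gps['Reviews Group'] = pd.cut(gps['Reviews'], bins=bins, labels=[1, 2, 3, 4])

# Fill missing ratings with the median rating within each review-count group.
gps['Rating'] = gps.groupby('Reviews Group', observed=True)['Rating'] \
                   .transform(lambda x: x.fillna(x.median()))

print(gps)
```

`transform` returns a result aligned to the original index, which is what makes the group-wise fill assign back cleanly; a group whose ratings are all null would still be left as NaN.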

shashwatpaul

I didn't find much elsewhere on duplicates. Thanks so much, sir. I can't thank you enough.

dhananjaykansal

I always find what I need in your channel.. and more... Thank you

rashayahya

love u brother. u r changing so many lives, thank u.... the best teacher award goes to Data School.

ranveersharma

You are the greatest teacher in the world

emanueleco

That's exactly what I was looking for, great explanation, thanks for sharing!

narbigogul

THANK YOU for the `keep` tip, that's exactly what I was looking for!

supa.scoopa

Really, your teaching method is very good; your videos give so much knowledge. Thanks, Data School

cablemaster

Kevin your videos are super helpful! thank you!!!

Kristina_Tsoy

You have done a very good job explaining DataFrames and making them easy to understand, especially for people who work in Excel.
Best wishes from me

balajibhaskarraokondhekar

Thanks a lot. It was a great help. Much appreciated!

imad_uddin

Really great job. Thank you very much!!

jamesdoone

Great video. This helped me tremendously.
How would you go about finding duplicates "case insensitive" with a certain field?
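One hedged approach to the question above: lower-case the field before checking for duplicates, then use the resulting boolean mask to index back into the original frame. The column name `name` here is hypothetical:

```python
import pandas as pd

# Hypothetical data where 'Ann' and 'ANN' should count as duplicates.
df = pd.DataFrame({'name': ['Ann', 'ANN', 'Bob']})

# duplicated() on a lower-cased copy gives a case-insensitive mask,
# aligned to df's index, so it can filter the original rows directly.
mask = df['name'].str.lower().duplicated()
print(df[~mask])   # drops the second, upper-cased 'ANN' row
```

The original casing is preserved because only the comparison, not the stored data, is normalized.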

cafdo