Data Science Tools: Working with Large Datasets (CSV Files) in Python [2019]

In this tutorial we will learn how to work with large datasets [100 MB to 1 TB+] in Python using several data science tools.


If you liked the video don't forget to leave a like or subscribe.
If you need any help, just message me in the comments; you never know, it might help someone else too.
J-Secur1ty JCharisTech

Comments

Man I totally learned so much just now. And I’ve been working with pandas for almost a decade. Thank you, man!!

MrTigerstyle

I liked this video very much. One small question: how do I chunk a big CSV file by rows using pandas, for example the first 1000 rows first and the next 1000 rows later?

beginnerInvestor
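
One way to approach the row-chunking question above is the `chunksize` argument of `pandas.read_csv`, which returns an iterator of DataFrames instead of one large DataFrame. This is only a minimal sketch: the helper name `iter_row_chunks` and the file name `big.csv` are made up for illustration and are not from the video.

```python
import pandas as pd


def iter_row_chunks(csv_path, rows_per_chunk=1000):
    """Yield successive DataFrames holding at most rows_per_chunk rows each.

    Hypothetical helper for illustration; only one chunk is in memory at a time.
    """
    # With chunksize set, read_csv returns an iterator rather than a DataFrame.
    for chunk in pd.read_csv(csv_path, chunksize=rows_per_chunk):
        yield chunk


# Example usage (assumes a file called big.csv exists):
# for number, chunk in enumerate(iter_row_chunks("big.csv", 1000)):
#     print(number, len(chunk))   # rows 0-999 first, then rows 1000-1999, ...
```

Each chunk is an ordinary DataFrame, so it can be filtered or aggregated before the next one is read.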

With a big file, doing len(file.readlines()) will read the whole file into memory. Instead, it's much better to iterate over it line by line and count the lines. You can do this by looping over the file object itself (or calling readline() repeatedly) instead of readlines(), since that yields one line at a time, so there's no danger of running out of memory. Just a hint... Also, it's best to store big files in a format other than CSV. The best by far would be Parquet (which pandas can read and write, given a Parquet engine such as pyarrow), but there are other alternatives too. If you read from Parquet, you'll usually get much faster reads and writes than with CSV and, thanks to its columnar compression, much smaller files.

dariuszspiewak
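
The line-counting hint above can be sketched with the standard library alone. The function name `count_lines` is an assumption for illustration. Note that this counts physical lines, which can differ from the number of CSV records when quoted fields contain line breaks.

```python
def count_lines(path):
    """Count the lines in a text file without loading it all into memory.

    Hypothetical helper for illustration.
    """
    count = 0
    # Looping over the file object reads one buffered line at a time.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for _ in handle:
            count += 1
    return count


# Example usage (assumes a file called big.csv exists):
# print(count_lines("big.csv"))
```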

But how do you find one specific word in a 5 GB document?

perry
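
For the word-search question above, one possible approach is to stream the document line by line and test each line with a regular expression. This is a sketch under assumptions: the helper name `find_word` and its `max_hits` parameter are invented for illustration, and it presumes the document is plain text broken into lines (a file that is one enormous line would still be read in one piece).

```python
import re


def find_word(path, word, max_hits=10):
    """Return up to max_hits (line_number, line) pairs containing the whole word.

    Hypothetical helper for illustration; the file is streamed, not loaded.
    """
    # \b anchors keep "beta" from matching inside "betamax".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    hits = []
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            if pattern.search(line):
                hits.append((line_number, line.rstrip("\n")))
                if len(hits) >= max_hits:
                    break  # stop early once enough matches are found
    return hits


# Example usage (assumes a file called big.txt exists):
# for line_number, line in find_word("big.txt", "pandas"):
#     print(line_number, line)
```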

This is a great video! It was very informative and I feel like I was able to retain this better

arielsoto

Brother I could barely hear you, but I still got a lot from this so thanks.

davr

this video really helped!! great topic covered.

anantdeora

How do I convert a CSV file into an HDF5 file? Can you post a script or some code, please?

tekishmain
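
A possible sketch for the CSV-to-HDF5 question above, using `pandas.HDFStore` and appending one chunk at a time so the whole CSV never has to fit in memory. It assumes the optional PyTables package (`tables`) is installed. The function name `csv_to_hdf5`, the key `"data"`, and the chunk size are illustrative choices, not anything shown in the video.

```python
import pandas as pd


def csv_to_hdf5(csv_path, hdf_path, key="data", rows_per_chunk=100_000):
    """Copy a CSV into an HDF5 table chunk by chunk; return the rows written.

    Hypothetical helper for illustration; requires PyTables to be installed.
    """
    total = 0
    # mode="w" starts a fresh HDF5 file, replacing any existing one.
    with pd.HDFStore(hdf_path, mode="w") as store:
        for chunk in pd.read_csv(csv_path, chunksize=rows_per_chunk):
            # append() writes in the appendable "table" format;
            # index=False skips building a PyTables index on every append.
            store.append(key, chunk, index=False)
            total += len(chunk)
    return total


# Example usage (assumes a file called big.csv exists):
# csv_to_hdf5("big.csv", "big.h5")
# frame = pd.read_hdf("big.h5", "data")
```

Two caveats: an append can fail if a later chunk contains longer strings than the first one (the `min_itemsize` argument of `append` reserves more room), or if a column's inferred dtype changes between chunks (passing `dtype=` to `read_csv` avoids that).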

How do I sort a big file? I am getting a server crash error for sort_values('').

dgenerationxparvez
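
When `sort_values` crashes because the file does not fit in memory, one alternative is an external merge sort: sort small chunks, write each to a temporary file, then merge the sorted runs. The sketch below uses only the standard library rather than pandas, so it is a swapped-in technique and not the method from the video. The function name `external_sort_csv` and its parameters are assumptions for illustration.

```python
import csv
import heapq
import os
import tempfile


def external_sort_csv(src_path, dst_path, column, key_type=str,
                      rows_per_chunk=100_000):
    """Sort a CSV (with a header row) by one column using bounded memory.

    Hypothetical helper for illustration; key_type converts the column
    text before comparing (for example int for numeric ids).
    """
    run_paths = []
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        key_index = header.index(column)

        def sort_key(row):
            return key_type(row[key_index])

        def flush(rows):
            # Sort one chunk in memory and write it out as a sorted run.
            rows.sort(key=sort_key)
            fd, run_path = tempfile.mkstemp(suffix=".csv")
            with os.fdopen(fd, "w", newline="", encoding="utf-8") as run:
                csv.writer(run).writerows(rows)
            run_paths.append(run_path)

        buffer = []
        for row in reader:
            buffer.append(row)
            if len(buffer) >= rows_per_chunk:
                flush(buffer)
                buffer = []
        if buffer:
            flush(buffer)

    # Merge the sorted runs lazily; only one row per run is held at a time.
    run_files = [open(p, newline="", encoding="utf-8") for p in run_paths]
    try:
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.writer(dst)
            writer.writerow(header)
            readers = [csv.reader(f) for f in run_files]
            writer.writerows(heapq.merge(*readers, key=sort_key))
    finally:
        for f in run_files:
            f.close()
        for p in run_paths:
            os.remove(p)


# Example usage (assumes big.csv has a numeric "id" column):
# external_sort_csv("big.csv", "sorted.csv", "id", key_type=int)
```

Every run file stays open during the merge, so a very small `rows_per_chunk` on a very large file could hit the operating system's limit on open files.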

Can't figure out what is being said.

wmleng

I think this could be a great tutorial, but I just couldn't understand the accent... what a shame

hanabisenju

Thanks. So what do you recommend? Modin and Dask are fast, but they probably do not offer all the options of normal pandas DataFrames?

DanielWeikert

wtf I can't even understand what he is saying?!?!?!

movemeplease