Handling Large Datasets in Pandas | #42 of 53: The Complete Pandas Course
In this video, we start learning some small but very useful techniques for handling very large datasets in Python pandas.
One commonly used trick for handling a very large dataset is to compress it into a different file format. Pandas supports several efficient file formats. Let's see how it works.
Join ML+ membership for exclusive Data science content
🔹 Tips and Tricks on Handling Very Large Datasets in Python Pandas.
Here we have a large dataset in a CSV file. Loading the CSV file itself took 2.84 seconds. It is not very large, but it is comparatively large next to the other datasets we have seen in this course.
Gzip is a very effective compression method that reduces the file size significantly. Running the export takes longer than a regular CSV export, because pandas is compressing the data as it writes. So let it do its job. I'm going to fast forward this a bit. That took 39.5 seconds, a considerable amount of time, but it will save a lot of space.
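As a rough illustration of the export step described above, here is a minimal sketch that writes the same DataFrame as a plain CSV and as a gzip-compressed CSV, then compares the file sizes. The DataFrame, column names, and file names are made-up stand-ins, since the dataset from the video is not available; the sizes and timings you see will differ from the ones quoted in the video.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in data; not the dataset used in the video.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "id": np.arange(100_000),
    "category": rng.choice(["a", "b", "c"], size=100_000),
    "value": rng.integers(0, 1000, size=100_000),
})

# Write into a temporary directory so the example leaves no files behind
# in the working directory.
out_dir = tempfile.mkdtemp()
plain_path = os.path.join(out_dir, "data.csv")
gzip_path = os.path.join(out_dir, "data.csv.gz")

# Regular CSV export.
df.to_csv(plain_path, index=False)

# Gzip-compressed CSV export; slower to write, smaller on disk.
df.to_csv(gzip_path, index=False, compression="gzip")

plain_size = os.path.getsize(plain_path)
gzip_size = os.path.getsize(gzip_path)
print(f"plain CSV: {plain_size / 1e6:.2f} MB")
print(f"gzip CSV:  {gzip_size / 1e6:.2f} MB")
```

How much space is saved depends heavily on the data: repetitive text and low-cardinality columns usually compress far better than random-looking numbers.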
Now, the storage that this file takes on disk will be much smaller: 14.5 MB. That is about the best we can do. Such a big file, with this much content, compressed down to 14.5 MB. That's significant. Now let's see how long it takes to load this file back into pandas. To load it, you can use read_csv itself.
You also need to specify the compression as gzip so that pandas knows how to decompress it. Loading took 3.02 seconds. Earlier, the direct CSV file took 2.5 seconds, so this one takes about half a second more. That's okay, because the compressed file is a major saving in disk space.
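The load step can be sketched as follows. This is only an illustration under assumptions: the small DataFrame and the file name are hypothetical, and the timing printed on your machine will not match the numbers from the video.

```python
import os
import tempfile
import time

import pandas as pd

# Hypothetical small dataset, written as a gzip-compressed CSV first
# so that there is something to read back.
df = pd.DataFrame({
    "id": list(range(1000)),
    "value": [i % 7 for i in range(1000)],
})
path = os.path.join(tempfile.mkdtemp(), "data.csv.gz")
df.to_csv(path, index=False, compression="gzip")

# Read the compressed file back, telling pandas explicitly that it is gzip.
start = time.perf_counter()
loaded = pd.read_csv(path, compression="gzip")
elapsed = time.perf_counter() - start

print(f"loaded {len(loaded)} rows in {elapsed:.4f} seconds")
```

Note that the `compression` parameter of `read_csv` defaults to `"infer"`, so pandas can usually detect gzip from a `.gz` file extension on its own. Passing `compression="gzip"` explicitly is still useful when the file name does not carry that extension.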
Let me know in the comments section if you have any questions!
🤝 Like, Share, Subscribe for more!
Follow us on our social media handles for all updates, events and live sessions-
If you enjoyed this video, be sure to throw it a like and make sure to subscribe to not miss any future videos!
Thanks for watching!
#machinelearningplus #python #pandas #datascience