Read Large Dataset Quickly - Feather vs Parquet vs Jay vs CSV #python #pandas #coding #programming

We generally prefer CSV files for reading data in data science work. But with large datasets, such as millions of rows, the CSV format underperforms: it takes quite a long time to load the data. Other file formats exist that work well with large datasets.

Some efficient file formats:
Feather format - It is fast, lightweight, and uses a binary file format for storing data. It takes 122 milliseconds to read one million rows.
Parquet format - Parquet is efficient in terms of storage and performance because data is stored in a column-oriented layout. It takes 159 milliseconds to read one million rows.
Jay format - It also uses a binary format for storing data frames, which makes it fast, lightweight, and easy to use. It takes only 235 microseconds, the least of all.
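
The timings above depend on hardware, library versions, and the shape of the data, so it is worth measuring them yourself. Below is a minimal sketch of how such a comparison could be set up, not the exact method used in the video. The helper names (`write_sample_csv`, `read_csv_stdlib`, `time_reader`, `available_readers`) and the sample columns are hypothetical. It assumes pandas with pyarrow for Feather and Parquet, and the datatable package for Jay; any library that is not installed is simply skipped. Note that datatable may memory-map a Jay file rather than load it eagerly, which could explain a microsecond-scale figure.

```python
import csv
import os
import tempfile
import time


def write_sample_csv(path, n_rows):
    """Write a small synthetic CSV (hypothetical columns `id` and `value`)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        for i in range(n_rows):
            writer.writerow([i, i * 0.5])


def read_csv_stdlib(path):
    """Baseline reader using only the standard library; returns the data rows."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return rows[1:]  # drop the header row


def time_reader(reader, path, repeats=3):
    """Return the best wall-clock time in seconds of reader(path) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        reader(path)
        best = min(best, time.perf_counter() - start)
    return best


def available_readers():
    """Map file extensions to reader functions, skipping libraries that are missing."""
    readers = {}
    try:
        import pandas as pd
        readers[".csv"] = pd.read_csv
        try:
            import pyarrow  # noqa: F401  (assumed engine for Feather and Parquet)
            readers[".feather"] = pd.read_feather
            readers[".parquet"] = pd.read_parquet
        except ImportError:
            pass
    except ImportError:
        pass
    try:
        import datatable as dt
        readers[".jay"] = dt.fread  # fread can also open Jay files
    except ImportError:
        pass
    return readers


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "sample")
        write_sample_csv(base + ".csv", 100_000)
        readers = available_readers()

        # Convert the CSV into each format whose library is installed.
        if ".feather" in readers:
            import pandas as pd
            df = pd.read_csv(base + ".csv")
            df.to_feather(base + ".feather")
            df.to_parquet(base + ".parquet")
        if ".jay" in readers:
            import datatable as dt
            dt.fread(base + ".csv").to_jay(base + ".jay")

        print(f"stdlib csv: {time_reader(read_csv_stdlib, base + '.csv') * 1000:.2f} ms")
        for ext, reader in readers.items():
            print(f"{ext}: {time_reader(reader, base + ext) * 1000:.2f} ms")
```

Taking the best of several runs reduces noise from disk caching and other processes. Increase the row count to one million to approximate the setup described above.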

#leetcode #codingchallenge #technology #tech

Show your support by subscribing to my channel.

Thank you
Comments
Author

To read 1 million rows:
Feather - 122 ms (binary format)
Parquet - 159 ms (row-oriented or column-oriented approach?)
Jay - 235 microseconds (binary format)

amitrou