Row Group Size in Parquet: Not too big, not too small

In this video, we'll learn all about row group sizes in Apache Parquet. Using DuckDB, we'll query files written with different row group sizes and see how each one performs across different queries.
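The "not too big, not too small" trade-off can be made concrete with simple arithmetic. Below is a minimal sketch that assumes a reader which hands whole row groups to worker threads. As far as we understand, DuckDB parallelizes Parquet scans this way. The helper `row_group_layout` and the row counts are hypothetical and are not the files used in the video.

```python
def row_group_layout(num_rows: int, row_group_size: int, threads: int, num_columns: int) -> dict:
    """Hypothetical helper: estimate how a file splits into row groups.

    Assumes each worker thread scans whole row groups, so a file with fewer
    row groups than threads cannot keep every thread busy.
    """
    if min(num_rows, row_group_size, threads, num_columns) <= 0:
        raise ValueError("all arguments must be positive")
    # Integer ceiling division: the last row group may be partially filled.
    num_groups = (num_rows + row_group_size - 1) // row_group_size
    return {
        "row_groups": num_groups,
        # Threads that can work at the same time under the assumption above.
        "busy_threads": min(num_groups, threads),
        # Every row group holds one column chunk (with its own metadata) per column.
        "column_chunks": num_groups * num_columns,
    }


# Illustrative numbers only: 10 million rows, 8 threads, 5 columns.
for size in (1_000, 100_000, 10_000_000):
    print(size, row_group_layout(10_000_000, size, threads=8, num_columns=5))
```

With a single 10-million-row group, only one thread has work to do. With 1,000-row groups, all threads stay busy, but the reader must handle 10,000 row groups and 50,000 column chunks of metadata. Sizes in between avoid both extremes.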
#apacheparquet #dataengineering #duckdb
00:20 Intro to different files
00:31 Import DuckDB
00:43 Run query function
00:53 Basic count query
01:33 Performance of all files for count query
02:05 Looking into row group metadata for each file
02:35 Average salary query
02:53 Performance of all files for average salary query
03:10 htop output for each query to analyze CPU usage
05:28 Average value query
06:03 Performance of all files for average value query
06:20 Compute the number of row groups matched for each file for the average value query
06:56 Conclusion
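The 06:20 chapter counts how many row groups each file matches for the average value query. The underlying idea is that Parquet stores min/max statistics for each column chunk in the row group metadata. A reader can skip any row group whose range cannot satisfy the filter. Below is a minimal pure-Python model of that pruning step. It assumes a single numeric column, a simple range predicate, and sorted synthetic data. The names `RowGroupStats`, `build_stats` and `prune` are hypothetical and do not belong to DuckDB or any Parquet library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical model of one row group's statistics for a single column."""
    num_rows: int
    min_value: float
    max_value: float


def build_stats(values, row_group_size):
    """Split a column into consecutive row groups and record min/max per group."""
    stats = []
    for start in range(0, len(values), row_group_size):
        chunk = values[start:start + row_group_size]
        stats.append(RowGroupStats(len(chunk), min(chunk), max(chunk)))
    return stats


def prune(stats, low, high):
    """Keep only row groups whose [min, max] overlaps the predicate low <= x <= high.

    Every other group can be skipped using its metadata alone.
    """
    return [s for s in stats if s.max_value >= low and s.min_value <= high]


values = [float(i) for i in range(1_000)]  # synthetic, sorted column
for size in (10, 100, 1_000):
    stats = build_stats(values, size)
    kept = prune(stats, 245, 264)
    rows_scanned = sum(s.num_rows for s in kept)
    print(f"row_group_size={size}: {len(kept)}/{len(stats)} groups kept, {rows_scanned} rows scanned")
```

On sorted data, smaller row groups give finer-grained skipping. In this example, 10-row groups scan 30 rows, while one 1,000-row group scans everything. The cost is more statistics to read and check. If the values are not clustered, every group's min/max range tends to overlap the filter, and pruning skips little at any row group size.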