Onehouse at Trino Fest - Turbocharge Trino queries with Hudi’s multi modal indexing subsystem
The growth of analytical applications is expected to continue, driven by the increasing need for data-driven insights. With a continuous explosion of activity, these applications need to be efficient in how they ingest, process and analyze petabytes and exabytes of data. Many lakehouse technologies lack index support and perform full table scans, which is slow and resource intensive for these large datasets. Apache Hudi is a transactional data lake platform with full mutability support, including streaming upserts, and provides a powerful incremental processing framework. Apache Hudi powers the largest transactional data lakes in the industry and delivers faster write transactions on huge/wide tables and faster query performance with a multi-modal indexing subsystem.
Trino’s massively parallel processing, coupled with Hudi’s multi-modal indexing subsystem, unlocks queries that are orders of magnitude faster. With Hudi, Trino can now use the metadata table to improve file listing performance: Hudi reads partition details stored in its metadata table instead of making expensive file system calls. In addition, Trino can take advantage of advanced data-skipping techniques by using the column stats index to improve query performance. In this talk, we’ll cover:
- The current challenges of writing and querying data at low latency with data lakes
- How multi-modal indexing and the metadata table operate in Hudi
- How Trino unlocks orders-of-magnitude faster queries by leveraging Hudi’s metadata table and multi-modal index
- How you can build compute-efficient large-scale data applications with Trino and Hudi
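To make the data-skipping idea concrete, here is a minimal sketch of how per-file min/max column statistics can rule out files before any data is read. It is a toy illustration only, not Hudi's or Trino's actual implementation or API: the `ColumnStats` class, the `files_to_scan` function, and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-file statistics for one column (not a Hudi type)."""
    file_name: str
    min_value: int
    max_value: int


def files_to_scan(stats, lower, upper):
    """Return files whose [min, max] range overlaps the query range [lower, upper].

    Files whose range cannot contain a matching value are skipped entirely.
    """
    return [
        s.file_name
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Assumed example statistics for three data files.
stats = [
    ColumnStats("file_a.parquet", 0, 99),
    ColumnStats("file_b.parquet", 100, 199),
    ColumnStats("file_c.parquet", 200, 299),
]

# A predicate such as "value BETWEEN 120 AND 150" only needs one file.
print(files_to_scan(stats, 120, 150))
```

In this sketch only `file_b.parquet` survives pruning, so the other two files are never opened. A real column stats index applies the same overlap test, but looks the statistics up from an index rather than an in-memory list.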