Onehouse at Trino Fest - Turbocharge Trino queries with Hudi’s multi modal indexing subsystem

The growth of analytical applications is expected to continue, driven by the increasing need for data-driven insights. As activity keeps growing, these applications need to ingest, process, and analyze petabytes and even exabytes of data efficiently. Many lakehouse technologies lack index support and fall back to full table scans, which are slow and resource-intensive on datasets of this size. Apache Hudi is a transactional data lake platform with full mutability support, including streaming upserts, and it provides a powerful incremental processing framework. Apache Hudi powers the largest transactional data lakes in the industry and delivers faster write transactions on huge and wide tables, as well as faster query performance through its multi-modal indexing subsystem.

Trino’s massively parallel processing, coupled with Hudi’s multi-modal indexing subsystem, unlocks queries that are orders of magnitude faster. With Hudi, Trino can now use the metadata table to improve file listing performance: Hudi reads the partition details stored in its metadata table instead of making expensive file system calls. In addition, Trino can take advantage of advanced data-skipping techniques by using the column stats index to improve query performance. In this talk, we’ll cover:
- The current challenges of writing and querying data at low latency with data lakes
- How multi-modal indexing and the metadata table operate in Hudi
- How Trino unlocks orders of magnitude faster queries by leveraging Hudi’s metadata table and multi-modal index
- How you can build compute-efficient large-scale data applications with Trino and Hudi
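
To make the data-skipping idea concrete, here is a minimal sketch of how per-file column statistics can rule out files before any data is read. It is not Hudi's or Trino's actual API: the `FileStats` class, the `files_to_scan` function, and the file names are all hypothetical, and the index is modeled as a plain in-memory list holding one min/max range per file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical column-stats entry: value range of one column in one file."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(index, lower, upper):
    """Return paths of files whose [min, max] range overlaps [lower, upper].

    Files whose range cannot contain a matching value are skipped
    without being opened.
    """
    return [
        f.path
        for f in index
        if f.max_value >= lower and f.min_value <= upper
    ]


# Illustrative index for a single integer column (made-up values).
column_stats_index = [
    FileStats("part-0001.parquet", 1, 100),
    FileStats("part-0002.parquet", 101, 200),
    FileStats("part-0003.parquet", 201, 300),
]

# Equivalent of: WHERE col BETWEEN 150 AND 180
selected = files_to_scan(column_stats_index, 150, 180)
print(selected)  # ['part-0002.parquet']
```

In this toy case only one of three files needs to be read. The same comparison applied to a table with many files is what lets a query engine avoid a full table scan when the statistics are selective.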

Trino

Related recommendations

Comments

Whoa, this was awesome, guys! Loved how you showed off Hudi's multi-modal indexing. But first, I've got to upgrade my current Hudi 0.10.1 lakehouse setup. After that, I'm super excited to try this out with Trino. Really looking forward to it. Great stuff! Thanks!

istvandarvas

I think this recording lacks clarity on the connector part.

balamaheshjampani

Alluxio at Trino Fest - Trino optimization with distributed caching on data lake

Redis at Trino Fest - Real-time indexed SQL queries (and a new connector!)

Tabular at Trino Fest - CDC patterns in Apache Iceberg

Stripe at Trino Fest - Inspecting Trino on Ice

Trino at Quora: Speed, Cost, Reliability Challenges and Tips

Enhancing Trino's query performance and data management with Hudi: innovations and future

Fast results using Iceberg and Trino

Iceberg + Spark + Trino a modern opensource data stack for blockchain

Real time Analytics with Trino and Apache Pinot

Journey to Iceberg with SK Telecom

41: Trino puts on its Hudi

Lab with Trino Co-Creators: Tuning Queries on your Trino Cluster

Centralized Delta Lake Using Trino

[DET Webinar] Demystifying Apache Hudi

Different table types in Apache Hudi | MOR and COW | Deep Dive | By Sivabalan Narayanan

Trino Community Broadcast 55: Commander Bun Bun peeks at Peaka

Trino Community Broadcast 60: Trino AI functions

Presto SQL Trino Course Part 1 why presto

Utilizando o Apache Iceberg com Spark, Trino e Snowflake | Live #73

Presto Tech Talk: Optimizing table layout for Presto using Apache Hudi

EP22 - Dremio and Data Lakehouse Table Formats (Apache Iceberg, Delta Lake and Apache Hudi & Dre...

EP33 - The Who, What and Why of Data Lakehouse Table Formats