Why You Shouldn’t Care About Iceberg | Tabular


ABOUT THE TALK:

Ryan Blue, co-creator of the Apache Iceberg project, will try to convince you not to care about Iceberg: if you’re thinking about your table format, then it isn’t doing a good enough job.

This session will show how Iceberg solves real-world problems that used to take hours or days of time from data engineers and analysts:

Safe schema changes — no more zombie data columns
Layout evolution — update table partitioning without rewriting any queries
Hidden partitioning — safe and fast queries without being a DBA
Future work — current frustrations and how we’re making them disappear

ABOUT THE SPEAKER:

Ryan is the co-creator of Apache Iceberg and has spent the last decade working on big data formats and infrastructure at Netflix, Cloudera, and now Tabular. He is an ASF member and a committer in the Apache Parquet, Avro, and Spark communities.

COMMENTS:

I went through all the problems you mention. When I first started using Hadoop / Spark / Flink, I was very frustrated by all the low-level aspects you were required to master before being able to read or write a single piece of data.
I had the feeling of being the only one asking for data portability and security.
Having a common format for describing input/output data (and metadata as well) is the fundamental point of any big or small data solution.

FlavioPompermaier

A really great talk! Looking forward to filling the box between storage and compute layers over the next few years.

npestrov