What is a data lakehouse? | Starburst Academy

0:00 Introduction
0:11 The difference between data lakes and data lakehouses
1:48 Lakehouse reporting structure
3:15 Example of Iceberg
What's the difference between data lakes and data lakehouses? This video explains how Iceberg, Delta Lake, and Hudi use a different architecture than Hive, and how this makes the enhanced features of a data lakehouse possible.
This is an architectural deep dive showing you how Iceberg makes use of metadata differently from traditional data lakes, and how these differences mark it out as a modern data lake or data lakehouse. Although Iceberg data lakes/lakehouses make use of the same underlying cloud object storage found in traditional data lakes (on AWS, Azure, or GCP), the handling of metadata is so different that it marks a new era.
This includes many new features like time travel, schema evolution, partition evolution, and ACID compliance. Overall, this helps blend the functionality of data lakes with something approximating a data warehouse or transactional database (OLTP). For many organizations, these new features are so different that they make the case for data lakes over other technologies. Since data lakes are often the least expensive storage option, the modern data lake is a compelling example of a modern data stack: it is suitable for all data types (including semi-structured and unstructured data) and saves organizations money.
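To make the metadata idea concrete, here is a minimal Python sketch of how a log of immutable snapshots can support time travel and schema evolution. It is a toy model, not Iceberg's actual file format or API: the names `Snapshot`, `ToyTable`, `commit`, `files_as_of`, and `schema_as_of`, as well as the file names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable table version: a schema plus the data files it lists."""
    snapshot_id: int
    schema: Tuple[str, ...]
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical table whose state lives entirely in a snapshot log."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit(self, schema, added_files) -> int:
        # A commit never rewrites old data files. It appends a new snapshot
        # listing the previous files plus the newly added ones.
        previous = self.snapshots[-1].data_files if self.snapshots else ()
        snapshot = Snapshot(
            snapshot_id=len(self.snapshots) + 1,
            schema=tuple(schema),
            data_files=previous + tuple(added_files),
        )
        self.snapshots.append(snapshot)
        return snapshot.snapshot_id

    def _lookup(self, snapshot_id: Optional[int]) -> Snapshot:
        # None means "latest"; otherwise find the requested older version.
        if snapshot_id is None:
            return self.snapshots[-1]
        for snapshot in self.snapshots:
            if snapshot.snapshot_id == snapshot_id:
                return snapshot
        raise KeyError(f"unknown snapshot {snapshot_id}")

    def files_as_of(self, snapshot_id: Optional[int] = None) -> Tuple[str, ...]:
        # Time travel: read the file list recorded by a given snapshot.
        return self._lookup(snapshot_id).data_files

    def schema_as_of(self, snapshot_id: Optional[int] = None) -> Tuple[str, ...]:
        # Schema evolution: each snapshot remembers the schema it was written with.
        return self._lookup(snapshot_id).schema


table = ToyTable()
v1 = table.commit(["id", "name"], ["file-a.parquet"])
v2 = table.commit(["id", "name", "email"], ["file-b.parquet"])

print(table.files_as_of(v1))   # files visible at the first version
print(table.files_as_of())     # files visible at the latest version
print(table.schema_as_of(v1))  # schema before the column was added
```

Because every change only appends a new snapshot, an older version of the table stays readable for as long as its snapshot and data files are retained. That is the general idea behind time travel in this sketch; real table formats add further layers that the toy model leaves out.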
#data #datalakehouse #datalakehouses #icehouse #dataicehouse #datalake #datalakes #dataengineering #dataengineer #apacheiceberg #hudi #deltalake #cloud #cloudobjectstorage #objectstorage #datawarehouse #dataanalytics #trino #aws #azure #gcp #techeducation #dataeducation #s3 #azureblob #googlecloud #googlecloudstorage