Apache Iceberg Tutorial for Beginners: Understanding Copy-on-write and Merge-on-read
This video, #7 in the Apache Iceberg 101 course, focuses on Copy-on-Write (COW) and Merge-on-Read (MOR) - two essential concepts in data lakehouse table formats. In this lesson, you will learn what Copy-on-Write and Merge-on-Read are, what a delete file is, and when to use MOR or COW.
Copy-on-Write is a strategy used in data lakehouse table formats where changes are written as new files instead of modifying existing ones. In Apache Iceberg, when rows are updated or deleted under COW, every data file that contains an affected row is rewritten as a new file, and a new snapshot of the table points to the rewritten files. The files behind the previous snapshot remain unchanged, so the original version of the table stays readable while new changes are applied. For example, if a user deletes a single row, the whole data file holding that row is copied without it, even though the rest of the file is untouched.
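The file-level copying behind COW can be sketched in a few lines of plain Python. This is a toy model, not Iceberg's implementation: the table is assumed to be a dictionary from file name to rows, and `cow_delete`, the file names, and the `-rewritten-s2` suffix are all hypothetical.

```python
from typing import Callable

Row = dict
DataFiles = dict[str, list[Row]]  # hypothetical model: file name -> rows


def cow_delete(files: DataFiles, predicate: Callable[[Row], bool], snapshot_id: int) -> DataFiles:
    """Build a new snapshot; any file holding a matching row is rewritten whole."""
    new_files: DataFiles = {}
    for name, rows in files.items():
        if any(predicate(row) for row in rows):
            kept = [row for row in rows if not predicate(row)]
            if kept:  # a file left with no rows is simply dropped
                new_files[f"{name}-rewritten-s{snapshot_id}"] = kept
        else:
            new_files[name] = rows  # untouched files are shared between snapshots
    return new_files


snapshot_1 = {
    "file-a": [{"id": 1}, {"id": 2}],
    "file-b": [{"id": 3}, {"id": 4}],
}
snapshot_2 = cow_delete(snapshot_1, lambda row: row["id"] == 2, snapshot_id=2)

print(sorted(snapshot_2))    # ['file-a-rewritten-s2', 'file-b']
print(snapshot_1["file-a"])  # [{'id': 1}, {'id': 2}]
```

Deleting one row rewrites all of `file-a`, while `file-b` is reused as-is and the first snapshot is left intact.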
Merge-on-Read is the other key strategy used in data lakehouse table formats. Instead of rewriting data files at write time, an update or delete leaves the existing data files in place and records the change in separate delete files (an update also writes a new data file containing the updated rows). When an application or query engine reads the table, it merges the data files with the delete files to produce a single view that reflects all changes. For example, if a row has been deleted, the reader loads the original data file together with the delete file and filters the row out on the fly.
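A minimal sketch of the MOR write and read paths, assuming a toy model in which a delete is recorded as a (file name, row position) pair. The names `data_files`, `delete_file`, `mor_delete`, and `mor_read` are hypothetical and do not come from any Iceberg API.

```python
# Toy model: data files map a file name to its rows.
data_files = {
    "file-a": [{"id": 1}, {"id": 2}],
    "file-b": [{"id": 3}, {"id": 4}],
}
delete_file = []  # (file name, row position) entries; data files are never modified


def mor_delete(row_id):
    """Write path: record where the row lives instead of rewriting its file."""
    for name, rows in data_files.items():
        for position, row in enumerate(rows):
            if row["id"] == row_id:
                delete_file.append((name, position))


def mor_read():
    """Read path: scan every data file and skip positions marked as deleted."""
    deleted = set(delete_file)
    return [
        row
        for name, rows in data_files.items()
        for position, row in enumerate(rows)
        if (name, position) not in deleted
    ]


mor_delete(2)
print(delete_file)  # [('file-a', 1)]
print(mor_read())   # [{'id': 1}, {'id': 3}, {'id': 4}]
```

The write only appends one small entry, but every read now has to do the merge.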
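In addition to Copy-on-Write and Merge-on-Read, this Apache Iceberg 101 course also covers delete files - files that record which rows have been deleted from a table. Delete files are what make MOR work: they let writers keep track of deleted rows without having to rewrite entire data files each time something needs to be removed. Iceberg defines two kinds: position delete files, which identify a row by its data file path and position in that file, and equality delete files, which identify rows by column values. COW does not need delete files, because deleted rows are dropped when the data files are rewritten.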
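The Iceberg table spec describes two kinds of delete files: position deletes (data file path plus row position) and equality deletes (column values to match). The sketch below shows how a reader might apply each kind to the rows of one data file. It is a simplification that ignores details such as sequence numbers, and the function names and sample rows are hypothetical.

```python
def apply_position_deletes(file_path, rows, position_deletes):
    """Drop rows whose (file path, position) pair appears in the delete entries."""
    deleted = {pos for path, pos in position_deletes if path == file_path}
    return [row for pos, row in enumerate(rows) if pos not in deleted]


def apply_equality_deletes(rows, equality_deletes):
    """Drop rows that match every column value of any delete entry."""
    def matches(row, entry):
        return all(row.get(col) == val for col, val in entry.items())

    return [row for row in rows if not any(matches(row, e) for e in equality_deletes)]


rows = [
    {"id": 1, "region": "eu"},
    {"id": 2, "region": "us"},
    {"id": 3, "region": "eu"},
]

# The entry for "file-b" belongs to another data file and is ignored here.
by_position = apply_position_deletes("file-a", rows, [("file-a", 0), ("file-b", 1)])
by_value = apply_equality_deletes(rows, [{"region": "eu"}])

print(by_position)  # [{'id': 2, 'region': 'us'}, {'id': 3, 'region': 'eu'}]
print(by_value)     # [{'id': 2, 'region': 'us'}]
```

Position deletes are cheap to apply but require the writer to know where each row lives; equality deletes are cheap to write but force the reader to compare values.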
When deciding whether to use Copy-on-Write or Merge-on-Read, it's important to consider how often data in the table is modified relative to how often it is read. If updates and deletes are frequent, MOR is usually more suitable, since writes are fast and avoid rewriting entire data files; the cost is slower reads, because delete files must be merged at query time. On the other hand, if the table is read often and modified rarely, COW is usually more suitable, since reads need no merging and the slower, rewrite-heavy writes happen infrequently.
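In Iceberg the choice is made per table and per operation through the table properties `write.delete.mode`, `write.update.mode`, and `write.merge.mode`, each set to `copy-on-write` or `merge-on-read`. As a rough sketch, the helper below builds those properties and a Spark SQL statement that applies them. The helper names and the table `db.events` are hypothetical, and it assumes a table that supports row-level deletes (format version 2).

```python
VALID_MODES = {"copy-on-write", "merge-on-read"}


def write_mode_properties(mode: str) -> dict[str, str]:
    """Use the same mode for DELETE, UPDATE, and MERGE operations."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown write mode: {mode!r}")
    return {f"write.{op}.mode": mode for op in ("delete", "update", "merge")}


def alter_table_sql(table: str, props: dict[str, str]) -> str:
    """Render the properties as an ALTER TABLE ... SET TBLPROPERTIES statement."""
    assignments = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({assignments})"


# Hypothetical table that receives frequent row-level updates.
props = write_mode_properties("merge-on-read")
statement = alter_table_sql("db.events", props)
print(statement)
```

The three properties are independent, so a table could, for example, use MOR for deletes while keeping COW for merges.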
Connect with us!