Tech Chat | Slowly Changing Dimensions (SCD) Type 2

We will discuss a popular online analytical processing (OLAP) fundamental, slowly changing dimensions (SCD), and specifically Type 2. As we have discussed in various other Delta Lake tech talks, the reliability that Delta Lake brings to data lakes has led to a resurgence of many data warehousing fundamentals, such as Change Data Capture, in data lakes. Type 2 SCD within data warehousing allows you to keep track of both the historical and the current data over time. We will discuss how to apply these concepts to your data lake in the context of market segmentation for a climbing eCommerce site.
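
To make the Type 2 idea concrete, here is a minimal plain-Python sketch (not Delta Lake code, and not the notebook from the talk). It assumes a hypothetical customer dimension with `customer_id`, `segment`, `start_date`, `end_date`, and `is_current` columns; the segment values are invented for illustration. A changed attribute closes the old row and opens a new current one, so history and current state live side by side.

```python
from datetime import date


def new_version(customer_id, segment, effective_date):
    """Build an open-ended, current row for a customer (hypothetical schema)."""
    return {
        "customer_id": customer_id,
        "segment": segment,
        "start_date": effective_date,
        "end_date": None,
        "is_current": True,
    }


def apply_scd2(dimension, updates, effective_date):
    """Return a new dimension table with Type 2 history applied.

    dimension: list of row dicts in the schema built by new_version
    updates:   dict mapping customer_id -> latest segment value
    """
    result = []
    has_current = set()
    for row in dimension:
        if not row["is_current"]:
            result.append(row)  # historical rows are never touched
            continue
        has_current.add(row["customer_id"])
        latest = updates.get(row["customer_id"])
        if latest is not None and latest != row["segment"]:
            # Close the old version instead of overwriting it ...
            result.append({**row, "end_date": effective_date, "is_current": False})
            # ... and open a new current version.
            result.append(new_version(row["customer_id"], latest, effective_date))
        else:
            result.append(row)  # unchanged, stays current
    # Customers with no current row are inserted as brand-new members.
    for customer_id, segment in updates.items():
        if customer_id not in has_current:
            result.append(new_version(customer_id, segment, effective_date))
    return result


dimension = [new_version(1, "boulderer", date(2020, 1, 1))]
updates = {1: "alpinist", 2: "sport climber"}
result = apply_scd2(dimension, updates, date(2020, 6, 1))
```

After this run, customer 1 has two rows (one closed, one current) and customer 2 has a single current row.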

Speaker:
Douglas Moore, Solution Architect

Comments

Great material again. Another session on generating surrogate keys would be great. I have seen folks add dummy members to dimensions, initially mark those records as invisible, and then just update the values as new members come in. It ain't pretty, but it works.

funwithazure

The link to the notebook leads back to this YouTube video. Can we get the notebook for this demo?

dipeshvora

Could you please share the notebook you used?

prasanthperumalla

Is there some logic that avoids using this mergeKey as an extra column? I also want schema evolution along with SCD2, but when I use this approach with schema evolution, the mergeKey column ends up in the output, which I don't want. Is there any way around this?

omkarshirsat
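
For readers unfamiliar with the mergeKey column this comment refers to, the following is a plain-Python sketch of the general pattern (it is not Delta Lake code and not the talk's notebook). The `address` attribute, the date strings, and all function names are hypothetical. A changed customer is staged twice: once with a null merge key, which can never match and is therefore inserted as the new version, and once with the real key, which matches the current row so it can be closed. The insert step lists its target columns explicitly, so the helper key is used only for matching and never written.

```python
def stage_updates(target, updates):
    """Build staged rows; merge_key is a helper used only for matching."""
    current = {r["customer_id"]: r for r in target if r["is_current"]}
    staged = []
    for u in updates:
        existing = current.get(u["customer_id"])
        if existing is not None and existing["address"] != u["address"]:
            # Null key never matches, so this copy becomes the new version.
            staged.append({"merge_key": None, **u})
        # Real key: matches the current row (to close it) or inserts a new customer.
        staged.append({"merge_key": u["customer_id"], **u})
    return staged


def merge_scd2(target, staged):
    """Apply staged rows to a copy of target, emulating a keyed merge."""
    out = [dict(r) for r in target]
    # Index only the rows that were current before the merge started.
    current_idx = {r["customer_id"]: i for i, r in enumerate(out) if r["is_current"]}
    for s in staged:
        i = current_idx.get(s["merge_key"]) if s["merge_key"] is not None else None
        if i is not None:
            # Matched: close the current row only if the attribute changed.
            if out[i]["address"] != s["address"]:
                out[i]["is_current"] = False
                out[i]["end_date"] = s["effective_date"]
        else:
            # Not matched: insert with an explicit column list, leaving merge_key out.
            out.append({
                "customer_id": s["customer_id"],
                "address": s["address"],
                "start_date": s["effective_date"],
                "end_date": None,
                "is_current": True,
            })
    return out


target = [{"customer_id": 1, "address": "A St", "start_date": "2020-01-01",
           "end_date": None, "is_current": True}]
updates = [
    {"customer_id": 1, "address": "B Ave", "effective_date": "2020-06-01"},
    {"customer_id": 2, "address": "C Rd", "effective_date": "2020-06-01"},
]
staged = stage_updates(target, updates)
merged = merge_scd2(target, staged)
```

The sketch only shows that the helper key can stay out of the written rows when the inserted columns are named explicitly. Whether that combines with schema evolution in a particular Delta Lake version is not something it settles.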

Is there any performance difference between implementing SCD2 with MERGE INTO and implementing it by UNIONing the changed, unchanged, and new datasets and then writing the result to the target with an INSERT OVERWRITE TABLE statement?

debanjanray

So we have history-handled records in the Gold zone, which holds data in a dimensional model. What about the Silver zone? Do we have historical records there as well, or do we keep only the current records from the source? What if I had to offload reporting to the Silver zone, and that involved historical reporting as well? I am offloading reports to the Silver zone because its table structure is very similar to that of the source systems, so the reports currently running on the source system tables can be offloaded there more easily than to the Gold zone.

omairkhan

MERGE INTO performance needs to be watched closely, and the partition pruning strategy is not clear.
Could you please cover partitioning techniques in this context?

ahmedbedhiaf

This link is for the folder; there is a README, and the notebook is the .html file. Enjoy.

dmoore