Data Modeling Tutorial: Star Schema (aka Kimball Approach)

preview_player
Показать описание

It's hard to last as a data engineer without understanding basic data modeling.

In this video we'll cover the basics of one of the most common approaches: Star Schema data modeling.

...Aka Kimball modeling.

...Aka Dimensional modeling.

We'll discuss the high level concepts, I'll show how to build one from scratch and we'll end with a review of the benefits & future topics to explore.

Thank you for watching!

Timestamps:
00:00 - Intro
00:31 - High-level Overview
01:35 - Intro to Fact Tables
03:56 - Create a Fact Table
07:25 - Intro to Dimension Tables
08:43 - Create Dimension Tables
11:53 - Join to Create Marts
14:55 - Benefits & Future Topics

Title & Tags:
Data Modeling Tutorial: Star Schema (aka Kimball Approach)
#kahandatasolutions #dataengineering #datamodeling
Рекомендации по теме
Комментарии
Автор

I've read the whole "The data warehouse toolkit" book by Kimball twice and this video explains the most important part of the 500 page book as clear as possible.
Well done!

vrta
Автор

I would agree with joshi. It would be cool for a future video to see how take data from conformed layer then create a star schema using dbt.

apibarra
Автор

Please make a video like this on Conformed Dimensions please!

Also how do you handle the lack of primary key and foreign key constraints in Snowflake?

JimRohn-uc
Автор

This is a very clear explanation with a precise example of the star schema. Thanks a lot for the good video.

AkhmadMizkat
Автор

Thanks! Great and simple overview for someone who is pretty new to this. What I would recommend is to explain pitfalls like - how to load dimensions for values there to be unique, why you have to care about having unique values in dimensions, what will happen if you left join to the dimension which has duplicate records.

ruslandr
Автор

Thank you for the great and to the point presentation, especially this is great that you showed a simple live code to execute it since most tutorials just repeat the books and present the theory and don't show how to code it. It would be great though to cover SCD implementation in the dimension tables and what would happen to the natural keys when we will have to use surrogate keys for the dimension and fact tables, because in almost all real-world DWH examples there is always need for keeping the history of dimensions inside the dim tables, which of course the OLTP primary keys will not be applicable.

mojtabapeyrovi
Автор

Thanks for this video. I have a few questions:

1) in the star schema, we won't do any joins between the dimension tables right, but why did you create a table by joining all the dimension tables to flatten all the dimensions in a single table?
2) since we are creating the mart tables by joining with other tables, How these tables get refreshed because those are not views or materialized views?

varuntirupati
Автор

Brilliant video
So helpful in basic understanding of star schema design
Keep up the great work bro

sievpaobun
Автор

Typo on the chapter header #7. Great video tho on a hard to teach and under-covered topic. I strongly agree about the usability case for Star schemas even in a modern stack. It creates a highly ordered, easy to reason about, junction point between the chaos of sources, ingestion, data lakes, etc and the complexity of ~infinite data consumer use cases. The payoff in downstream development efficiency is huge.

AdamSmith-lgvn
Автор

Hi, love your channel, I'm learning a lot. What are your thoughts on Star-schema vs One Big Table (OBT)? Would you make a video comparing pros and cons of each other?

nlopedebarrios
Автор

Good in-depth data engineering video for professionals!

jpzhang
Автор

Great video. One recommendation would be to change the font color of the text that is commented out, it is nearly impossible to read.

wingnut
Автор

Wow..very beautifully explained. Loved it ❤
Makes me explore more of your content.
Thank you.

saheenahzan
Автор

Really great video!!!! Thank you!!! Hope to see new video concerning SCD, indexing!!!!

lingerhu
Автор

Such a cool data concept man! Thanks for introducing me to it. Cheers!

lucashoww
Автор

@Kahan Data Solutions, I wish you covered the concepts of Surrogate Keys with SCD Type 2, , while in this video you have conveniently skipped that and made it look like it is a simple task, , by joining multiple entities which Ralph Kimball strongly advocates to Avoid. I really want to see your approach for some of the most difficult questions, when there are many to many relationships in the real world.

mzzvukr
Автор

Great tutorial, Thanks for putting this together.

mehdinazari
Автор

Hi, could you kindly make a video on how to load the fact table incrementally? What if we have an OLTP system and the dimension tables get big really quickly

karangupta_DE
Автор

Hey, thanks for the video! It gives a good overview how to model a star schema. But how could new staging data be ingested in the tables of a star schema? For example, an easy but inefficient approach would be to create the tables on a daily basis. But to be more optimal, you would need a process to ingest new data into the tables. Do you have an idea how that could be done in modern warehouses like Snowflake? Or some resources on that? I think it would be helpful to add some technical columns to the raw data layer, like a load timestamp, to track the records that need to be ingested. Furthermore, a valid_to and valid_from timestamp in dimension tables could be added where changes can occur (changed address of a customer).

freshjulian
Автор

Not related to the topic.. But, @Kahan, what do you think about Meltano? Will you add it to your Modern Data Stack?

kalyanben