Data Modeling Tutorial: Star Schema (aka Kimball Approach)

Показать описание

It's hard to last as a data engineer without understanding basic data modeling.

In this video we'll cover the basics of one of the most common approaches: Star Schema data modeling.

...Aka Kimball modeling.

...Aka Dimensional modeling.

We'll discuss the high level concepts, I'll show how to build one from scratch and we'll end with a review of the benefits & future topics to explore.

Thank you for watching!

Timestamps:
00:00 - Intro
00:31 - High-level Overview
01:35 - Intro to Fact Tables
03:56 - Create a Fact Table
07:25 - Intro to Dimension Tables
08:43 - Create Dimension Tables
11:53 - Join to Create Marts
14:55 - Benefits & Future Topics

Title & Tags:
Data Modeling Tutorial: Star Schema (aka Kimball Approach)
#kahandatasolutions #dataengineering #datamodeling

Рекомендации по теме

Комментарии

I've read the whole "The data warehouse toolkit" book by Kimball twice and this video explains the most important part of the 500 page book as clear as possible.
Well done!

vrta

I would agree with joshi. It would be cool for a future video to see how take data from conformed layer then create a star schema using dbt.

apibarra

Please make a video like this on Conformed Dimensions please!

Also how do you handle the lack of primary key and foreign key constraints in Snowflake?

JimRohn-uc

This is a very clear explanation with a precise example of the star schema. Thanks a lot for the good video.

AkhmadMizkat

Thanks! Great and simple overview for someone who is pretty new to this. What I would recommend is to explain pitfalls like - how to load dimensions for values there to be unique, why you have to care about having unique values in dimensions, what will happen if you left join to the dimension which has duplicate records.

ruslandr

Thank you for the great and to the point presentation, especially this is great that you showed a simple live code to execute it since most tutorials just repeat the books and present the theory and don't show how to code it. It would be great though to cover SCD implementation in the dimension tables and what would happen to the natural keys when we will have to use surrogate keys for the dimension and fact tables, because in almost all real-world DWH examples there is always need for keeping the history of dimensions inside the dim tables, which of course the OLTP primary keys will not be applicable.

mojtabapeyrovi

Thanks for this video. I have a few questions:

1) in the star schema, we won't do any joins between the dimension tables right, but why did you create a table by joining all the dimension tables to flatten all the dimensions in a single table?
2) since we are creating the mart tables by joining with other tables, How these tables get refreshed because those are not views or materialized views?

varuntirupati

Brilliant video
So helpful in basic understanding of star schema design
Keep up the great work bro

sievpaobun

Typo on the chapter header #7. Great video tho on a hard to teach and under-covered topic. I strongly agree about the usability case for Star schemas even in a modern stack. It creates a highly ordered, easy to reason about, junction point between the chaos of sources, ingestion, data lakes, etc and the complexity of ~infinite data consumer use cases. The payoff in downstream development efficiency is huge.

AdamSmith-lgvn

Hi, love your channel, I'm learning a lot. What are your thoughts on Star-schema vs One Big Table (OBT)? Would you make a video comparing pros and cons of each other?

nlopedebarrios

Good in-depth data engineering video for professionals!

jpzhang

Great video. One recommendation would be to change the font color of the text that is commented out, it is nearly impossible to read.

wingnut

Wow..very beautifully explained. Loved it ❤
Makes me explore more of your content.
Thank you.

saheenahzan

Really great video!!!! Thank you!!! Hope to see new video concerning SCD, indexing!!!!

lingerhu

Such a cool data concept man! Thanks for introducing me to it. Cheers!

lucashoww

@Kahan Data Solutions, I wish you covered the concepts of Surrogate Keys with SCD Type 2, , while in this video you have conveniently skipped that and made it look like it is a simple task, , by joining multiple entities which Ralph Kimball strongly advocates to Avoid. I really want to see your approach for some of the most difficult questions, when there are many to many relationships in the real world.

mzzvukr

Great tutorial, Thanks for putting this together.

mehdinazari

Hi, could you kindly make a video on how to load the fact table incrementally? What if we have an OLTP system and the dimension tables get big really quickly

karangupta_DE

Hey, thanks for the video! It gives a good overview how to model a star schema. But how could new staging data be ingested in the tables of a star schema? For example, an easy but inefficient approach would be to create the tables on a daily basis. But to be more optimal, you would need a process to ingest new data into the tables. Do you have an idea how that could be done in modern warehouses like Snowflake? Or some resources on that? I think it would be helpful to add some technical columns to the raw data layer, like a load timestamp, to track the records that need to be ingested. Furthermore, a valid_to and valid_from timestamp in dimension tables could be added where changes can occur (changed address of a customer).

freshjulian

Not related to the topic.. But, @Kahan, what do you think about Meltano? Will you add it to your Modern Data Stack?

kalyanben

Data Modeling Tutorial: Star Schema (aka Kimball Approach)

Data Modeling Tutorial: Star Schema (aka Kimball Approach)

DIMENSIONAL DATA MODELING TUTORIALS - STAR SCHEMA | Based on a case study | Part 1

What is STAR schema | Star vs Snowflake Schema | Fact vs Dimension Table

Data Modeling (Star Schema 🌟) in Power BI – Creating Dimension Tables

How to setup a Star Schema Data Model in Power BI - Easy guide

Comparing 3 Types of Data Modeling (Normalized vs Star Schema vs Data Vault)

Basic Star Schema design

Master Dimensional Modeling Lesson 01 - Why Use a Dimensional Model?

Dimensional Modeling

Data Modeling - Walking Through How To Data Model As A Data Engineer - Dimensional Modeling 101

DIMENSIONAL DATA MODELING TUTORIALS - STAR SCHEMA | Based on a case study | Part 2

Data Modeling in the Modern Data Stack

Fact & Dimension tables in Power BI | Data Modelling

Data modeling best practices - Part 1 - in Power BI and Analysis Services

Star Schema vs Snowflake Schema I Data Modelling I Data Warehousing I Schema I SQL I Power BI

Data Modeling Basics | #Tableau Course #32

Data Modelling: From Single Table To Star Schema - Chris Barber

Power BI Data Modelling Tutorial 1 - Star Schema?

Dimensional Data Model Tutorial - A Kimball Style Data Model

Why Power BI loves a Star Schema

Data Model Prep - STAR schema Basics - Season 5 Ep.1

Power BI Star schema from one large table

Don't Make These DATA MODELING MISTAKES | Data Modeling Best Practices in Power BI | Goodly

Header-detail vs star schema models in Tabular and Power BI