Data Modeling Where Theory Meets Reality - How Different Companies I Worked At Modeled Their Data

Data modeling varies at different companies.

At Facebook we had plenty of storage, and we often treated historical data modeling very differently from how it was handled when I worked at an enterprise.

The concept of slowly changing dimensions wasn't as prevalent; instead, we simply stored a snapshot of the data every day.

So let's talk about modeling historical data and how it varied.
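As a rough illustration of the daily-snapshot approach described above, here is a minimal Python sketch. The customer names, cities, and dates are made up for the example, and the in-memory dictionary stands in for date-partitioned tables. It shows that a point-in-time lookup only needs one day's partition, and that the same snapshots can be collapsed into start/end rows similar to a slowly changing dimension.

```python
from datetime import date

# Hypothetical daily snapshots: one full copy of the customer dimension per day.
snapshots = {
    date(2024, 1, 1): {"john": "Seattle", "ana": "Denver"},
    date(2024, 1, 2): {"john": "Seattle", "ana": "Denver"},
    date(2024, 1, 3): {"john": "Portland", "ana": "Denver"},
}


def city_as_of(customer, day):
    """Answer a point-in-time question by reading only that day's partition."""
    return snapshots[day][customer]


def collapse_to_ranges(customer):
    """Collapse daily snapshots into (city, start_date, end_date) rows.

    Assumes the customer appears in every snapshot; gaps are not handled.
    """
    rows = []
    for day in sorted(snapshots):
        city = snapshots[day].get(customer)
        if city is None:
            continue
        if rows and rows[-1]["city"] == city:
            # Same value as the previous day: extend the current range.
            rows[-1]["end_date"] = day
        else:
            # Value changed (or first sighting): start a new range.
            rows.append({"city": city, "start_date": day, "end_date": day})
    return rows
```

The trade-off is storage for simplicity: every day repeats unchanged rows, but no update logic is needed to keep history correct.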

If you enjoyed this video, check out some of my other top videos.

Top Courses To Become A Data Engineer

What Is The Modern Data Stack - Intro To Data Infrastructure Part 1

If you would like to learn more about data engineering, then check out Google's GCP certificate

If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.

Or check out my blog

And if you want to support the channel, then you can become a paid member of my newsletter

Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio

_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.

*I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
Comments

Thank you for this. Using this for my analytics engineer interview and data modeling.

karszn

Speaking from the analyst side of things, I was totally happy to create derived views/tables to be able to get historical data accurate if the data engineers could provide a change-record table with a certain minimal set of fields - the value (City in the earlier example), the relevant record (Customer in this case), and the date that value started applying.

So if you don't have time to create something cleaner, providing at least those will give analysts enough to figure the rest out. Thanks for talking about this!

Lunarisage
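The minimal change-record table described in the comment above (the value, the relevant record, and the date the value started applying) is enough to derive a full history. The following Python sketch shows one way an analyst might do that. The sample rows, the field names `valid_from` and `valid_to`, and both helper functions are hypothetical, and the end date is treated as exclusive.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Hypothetical change records: (customer, city, date the value started applying).
changes = [
    ("john", "Seattle", date(2023, 3, 1)),
    ("john", "Portland", date(2024, 1, 1)),
    ("ana", "Denver", date(2022, 6, 15)),
]


def build_history(change_rows):
    """Derive validity intervals: each row ends where the next change begins."""
    history = []
    ordered = sorted(change_rows, key=itemgetter(0, 2))
    for customer, group in groupby(ordered, key=itemgetter(0)):
        rows = list(group)
        # Pair every change with the following one; the last pairs with None.
        for current, following in zip(rows, rows[1:] + [None]):
            history.append({
                "customer": customer,
                "city": current[1],
                "valid_from": current[2],
                # Exclusive end; None means the row is still current.
                "valid_to": following[2] if following else None,
            })
    return history


def city_on(history, customer, day):
    """Return the city that applied to a customer on a given day, if any."""
    for row in history:
        if (row["customer"] == customer
                and row["valid_from"] <= day
                and (row["valid_to"] is None or day < row["valid_to"])):
            return row["city"]
    return None
```

In SQL the same derivation is typically done with a window function over the change dates, which is why those three fields are sufficient.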

Would be great to hear you talk more about data architecture design! Also would be nice to hear your experience around the ‘build or buy’ dilemma within data teams.

ivanooo

I was just trying to figure this out for a DE portfolio project regarding an employees table, thank you!!

tech-n-data

It's so great to see this explained. I worked with an engineering team who did daily partitions but only for a week or two AND month end snapshots ... I had no idea what those were until I had to learn to query their data for reports lol

But we didn't have a macro, we used where dt = current date - 2 (also on hive) cause we knew our latest refresh had a 2 day lag always

And we may use subqueries to create each "latest" table and join those together

Also, curious to see your thoughts on audit tables? I did application support and they had a robust history of every change made to every field and who changed it

sergioramos
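The fixed-lag filter mentioned in the comment above (`dt = current date - 2`) can be sketched in Python as follows. The function names, the ISO `YYYY-MM-DD` partition format, and the fallback to the newest available partition are assumptions for illustration, not details from the comment.

```python
from datetime import date, timedelta


def lagged_partition(today, lag_days=2):
    """Partition key for a table whose latest refresh trails by a fixed lag."""
    return (today - timedelta(days=lag_days)).isoformat()


def latest_available(partitions, today, lag_days=2):
    """Pick the newest existing partition at or before the lagged date.

    ISO date strings sort chronologically, so plain string comparison works.
    """
    cutoff = lagged_partition(today, lag_days)
    eligible = [p for p in partitions if p <= cutoff]
    return max(eligible) if eligible else None


def where_clause(today, lag_days=2):
    """Render the filter that would be pasted into each 'latest' subquery."""
    return f"WHERE dt = '{lagged_partition(today, lag_days)}'"
```

Joining several "latest" subqueries then amounts to applying the same filter to each table before the join, so every side reads the same day's partition.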

Very cool. When I started out, SCDs were king. Every team I've worked in for the last 3 years has gone the "partition every day" route. I'm still not sure I like it.

alexanderpotts

Thank you! Great and very helpful video as always.
I have a question regarding the SCDT2 explanation at 7:30. Don't you have to set the end date of the first record of John to 2023-12-31?
Because in your example, it seems it is overlapping if you query for the active records for 2024-01-01. You are going to have John's records twice in your result set on that day.

zoltantakats
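The overlap raised in the question above can be reproduced with a small Python sketch. The rows below are an assumption about what the example looks like (only the name John and the 2024-01-01 boundary come from the comment; the cities and the first start date are invented). It shows that an inclusive filter returns two rows on the boundary day, and that either closing the first row on 2023-12-31 or treating the end date as exclusive avoids the duplicate.

```python
from datetime import date

FAR_FUTURE = date(9999, 12, 31)  # common placeholder for "still current"

# Assumed layout: the first row's end date equals the second row's start date.
rows_shared_boundary = [
    {"name": "John", "city": "Seattle", "start": date(2023, 1, 1), "end": date(2024, 1, 1)},
    {"name": "John", "city": "Portland", "start": date(2024, 1, 1), "end": FAR_FUTURE},
]

# Alternative suggested in the comment: close the first row one day earlier.
rows_closed_day_before = [
    {"name": "John", "city": "Seattle", "start": date(2023, 1, 1), "end": date(2023, 12, 31)},
    {"name": "John", "city": "Portland", "start": date(2024, 1, 1), "end": FAR_FUTURE},
]


def active_inclusive(rows, day):
    """Equivalent of 'day BETWEEN start AND end' (both bounds inclusive)."""
    return [r for r in rows if r["start"] <= day <= r["end"]]


def active_half_open(rows, day):
    """Equivalent of 'start <= day AND day < end' (end is exclusive)."""
    return [r for r in rows if r["start"] <= day < r["end"]]
```

Both fixes work; what matters is that the table's end-date convention and the query's comparison operator agree.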

Is there a playlist somewhere? Trying to figure out where I'm at in this series lol

sergioramos

Interesting use case about SCD2, but how in practice do we create these tables? I understand the importance and how useful it is to have a new row for each change, but I can't get how to model it to make it work.

glstnlev
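One common way to maintain an SCD2 table, sketched here in Python rather than as the video's method, is to compare each day's source data with the currently open rows, close any row whose value changed, and append a new open row. The function name, field names, and sample data are hypothetical; deletions and late-arriving changes are not handled.

```python
from datetime import date


def apply_snapshot(dimension, snapshot, load_date):
    """Update an SCD2 row list in place from a {customer: city} snapshot.

    A row with valid_to None is the current version; valid_to is exclusive.
    """
    current = {r["customer"]: r for r in dimension if r["valid_to"] is None}
    for customer, city in snapshot.items():
        row = current.get(customer)
        if row is not None and row["city"] == city:
            continue  # nothing changed, keep the open row as is
        if row is not None:
            row["valid_to"] = load_date  # close the old version
        dimension.append({
            "customer": customer,
            "city": city,
            "valid_from": load_date,
            "valid_to": None,  # open-ended: this is now the current version
        })
    return dimension


# Usage: three daily loads, with only the last one carrying a change.
dim = []
apply_snapshot(dim, {"john": "Seattle"}, date(2023, 3, 1))
apply_snapshot(dim, {"john": "Seattle"}, date(2023, 3, 2))
apply_snapshot(dim, {"john": "Portland"}, date(2024, 1, 1))
```

In a warehouse this is usually expressed as a merge statement or handled by a snapshot feature in the transformation tool, but the close-then-append logic is the same.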

Did you ever get access to Palantir Foundry to see how they model data?

reddixiecrat

Wait... if you capture the datacreated, you do have the date when the city changed. You don't really need more information; you just need to query all datacreated where name=john and you will have a diagram of John.

piripitflaustik

Where do I find the theory behind the "types" of dimension tables?

otavioattuy

Sir is your household income 150k dollars a year? Plz reply. Thanks a lot.

gourabsarker