Databases Vs Data Warehouses Vs Data Lakes - What Is The Difference And Why Should You Care?

Recently I was helping a client with a project because their MongoDB instance wasn't able to handle the queries they needed.

I explained that one of the major issues is that MongoDB wasn't built for complex analytical queries, both in terms of its structure and its query language.

So I suggested they look into OLAP. But they weren't sure what that was, so I decided to make a video about databases vs data warehouses vs data lakes.
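
To make the difference in query shape concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical and are not from the client project. SQLite is used only to illustrate the two access patterns; it is not an OLAP engine.

```python
import sqlite3

# Hypothetical in-memory table standing in for an operational "orders" store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "west", 120.0),
        (2, "bob", "east", 80.0),
        (3, "alice", "west", 40.0),
        (4, "carol", "east", 200.0),
    ],
)


def lookup_order(order_id):
    """Transactional-style access: fetch a single row by its key."""
    cur = conn.execute(
        "SELECT customer, amount FROM orders WHERE order_id = ?", (order_id,)
    )
    return cur.fetchone()


def revenue_by_region():
    """Analytical-style access: scan every row and aggregate by a column."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())
```

The first function is the kind of query an operational database is typically tuned for. The second is the kind of workload that usually gets moved to an analytical system once the data grows.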

If you enjoyed this video, check out some of my other top videos.

Top Courses To Become A Data Engineer In 2022

What Is The Modern Data Stack - Intro To Data Infrastructure Part 1

If you would like to learn more about data engineering, then check out Google's GCP certificate

If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.

Or check out my blog

And if you want to support the channel, then you can become a paid member of my newsletter

Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio

_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.

*I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
Comments:

Hey, really appreciate this video. If I could summarize, it sounds like:

- (transactional) databases are generally closer to the data generation source and tend to be closer to operations
- data warehouses are further downstream of the transactional databases and hold data that has usually gone through some pre-processing to make it more accessible for downstream usage (e.g., analytics, machine learning, etc.)
- data lakes are kind of a catch-all storage method for your data that may require a little more technical knowledge and effort to access

wilsonman

Great video comparing the differences among the 3 types and their general use cases; it really helps me identify which type I'm dealing with at my job. Their definitions have always been debatable because their use cases vary a lot depending on how companies define them for their projects.

sngx

Nice video! One thing that I noticed is that none of the content creators (related to data science) have been talking about technologies like Druid or ClickHouse.

I'm a telecom engineer, and radio access network data is massive. We use ClickHouse to store performance counters and Presto + S3 for taking network configuration snapshots. Teams in other countries use Druid. These are really nice tools that don't get mentioned much here on YouTube.

yves.dantas

7:05 Might be time to mention Dr. Ralph Kimball's contributions to dimensional data warehouse design.

MilhouseBS

Awesome video. I am prepping for an interview for my dream job and this helped me so much. Thank you!

endpermia

Data warehouses represent a centralized location for storing data assets from various other sources, where the centralization allows data experts to answer business and analytics questions with a 360-degree view of the data that the company has. Often the underlying format of the data is based on the analytical engine of the warehouse chosen. Whether your warehouse is row-based, columnar, or just files is a decision made by the engine responsible for handling load/insert/query operations. You can have a warehouse that doesn't leverage a star schema or snowflake design and still call it a warehouse, albeit probably not one that is efficient to analyze.

arahso

I always thought 'database' was just an umbrella term for any system that stores data, whether it's a relational, non-relational, object, etc. type database.

BJTangerine

You made it super easy, thanks heaps!

MahmoudAziz

Very interesting guide... Was stuck on a decision earlier on what approach to take but I guess my uncertainty was a result of the evolving use cases and requirements.... Awesome explanation here💯

oyindamolavictor

Boy, I love the way you say Seattle Data Guy

malikmudassarawan

Nice video, might be useful to show examples of each at the end.

muzahmad

9:06 A well-designed star schema (a.k.a. dimensional model) makes it quite easy to add new facts or dimensions. It is the opposite of rigid if designed with shared dimensions in mind. See Kimball.

MilhouseBS

I would appreciate it if you talked at a much slower rate so I can catch this valuable information. I tried setting the video speed to 0.75.

mahmoudfadaly

What is the advantage of snapshots in a data warehouse over just saving a copy of the database each period?
Also, you can use these separate copies for analytics without interfering with the transaction DB version.

kaischmid

Thank you for this great content.
How can I reach out if I have other questions?
I just got certified as a data warehouse engineer, so I'm new to this, but I have a good knowledge of the whole concept.

bantuandproud

What's your opinion on Databricks?

garynico

So if you have a lot of document journals that need to be archived but remain accessible for read access, would you recommend a warehouse instead of a lake?

freddiepalmgren

Hey Ben! When you said "row-oriented data warehouse", it caught my attention. I tried to look it up on Google but did not get any satisfactory results. Could you elaborate on this term? What use cases do these address? Why do they exist in the first place?

AnishBhola

Can you tell us how you switched from data analyst to data engineer during your 2 years as a data analyst? What did you expose yourself to first? Was it mastering Python and SQL, then ETL?
Thank you

jhonnafg

Thanks. Can you become a data warehouse engineer without learning programming? I just want to learn SQL.

poizentv