How To Start A Data Engineering Project - With Data Engineering Project Ideas

When you look to build a data engineering project there are a few key areas you should focus on.

- Multiple Types Of Data Sources (APIs, Webpages, CSVs, JSON, etc.)
- Data Ingestion
- Storing Data
- Data Visualization (So you have something to show for your efforts).
- Cloud provider
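
As a rough illustration of how the first few areas fit together, here is a minimal Python sketch. It is not taken from the video: the sample data, the `listings` table, and every function name are hypothetical. It uses only the standard library, with SQLite standing in for the storage layer. It pulls records from two source types (CSV and JSON), loads them into one table, and runs the kind of aggregate query a dashboard might sit on top of.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample inputs standing in for a CSV export and a JSON API response.
CSV_TEXT = "city,price\nSeattle,500000\nPortland,400000\n"
JSON_TEXT = '[{"city": "Seattle", "price": 700000}, {"city": "Denver", "price": 450000}]'


def extract_csv(text):
    """Parse CSV text into a list of (city, price) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["city"], int(row["price"])) for row in reader]


def extract_json(text):
    """Parse a JSON array of objects into a list of (city, price) tuples."""
    return [(item["city"], int(item["price"])) for item in json.loads(text)]


def load(conn, rows, source):
    """Store rows in a single table, tagging each row with its source."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings (city TEXT, price INTEGER, source TEXT)"
    )
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?, ?)",
        [(city, price, source) for city, price in rows],
    )
    conn.commit()


def average_price_by_city(conn):
    """Aggregate query that a chart or dashboard could be built on."""
    cursor = conn.execute(
        "SELECT city, AVG(price) FROM listings GROUP BY city ORDER BY city"
    )
    return cursor.fetchall()


def run_pipeline():
    """Ingest both sources into an in-memory database and return the summary."""
    conn = sqlite3.connect(":memory:")
    load(conn, extract_csv(CSV_TEXT), "csv")
    load(conn, extract_json(JSON_TEXT), "json")
    return average_price_by_city(conn)


if __name__ == "__main__":
    for city, avg_price in run_pipeline():
        print(f"{city}: {avg_price:.0f}")
```

In a real project the in-memory database would likely give way to a cloud warehouse, and the hard-coded strings to actual API calls and files, but the extract, load, and query stages would keep the same shape.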

Here is another list of project ideas if you need more inspiration

I also reference two projects in this video, here are links to them

If you enjoyed this video, check out some of my other top videos.

Top Courses To Become A Data Engineer In 2022

What Is The Modern Data Stack - Intro To Data Infrastructure Part 1

Also, consider checking out the GCP Data Engineering Certificate

Video Outline
0:00 Intro
0:53 What Data Engineering Tools Should You Use
3:44 What Data Sets Should You Use For Your Data Engineering Project
5:05 What Is Your End Goal
8:10 3 Data Engineering Project Ideas

If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.

Or check out my blog

And if you want to support the channel, then you can become a paid member of my newsletter

Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio

_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have developed algorithms to detect fraud, reduce patient readmission, and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems, both solo and with a company called Acheron Analytics. I have experience both working hands-on with technical problems and helping leadership teams develop strategies to maximize their data.

*I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
Comments

Loved to see that Airflow was part of this recommendation! Thanks Ben!

LukeBarousse

Hey Ben, I love how you keep unleashing yourself with these new, more relaxed videos.

I'm also guilty of starting a project and then leaving it there for lack of time and a little bit of "what to do / where do I start".
These are very good insights that I'll be putting into practice for my next pet project.

Thanks!

nanomartin

As someone who started programming in the last year and got a job (internship) about 3 months ago, I'd like to thank you so much. You are a fundamental part of my life-changing career.
It's so difficult to find content in my native language. I appreciate your videos, man.

ZoioGame

Love this video Ben.

Can't do a data engineering project rn due to school projects being due soon, but I'll def revisit this after my deadlines since you made it so accessible to start!

Rex_

Love the Data Engineering Project Ideas, thanks for sharing!

stifferdoroskevich

Can't express how thankful I am for this. Us burgeoning data engineers need more straightforward, pragmatic information like this, and your channel seems like the only place to find it. Been following you for a year and you never disappoint!

ericssonvancolborn

With so many tools, most people aren't confident and procrastinate.
Same case with me, but start with a simple idea and then keep adding complexity :)

hamsansari

Amazing video, Ben, thank you for preparing this great content!

SalzDaniel

Ben, thanks for the video and project recommendations.

josel

Awesome video - led me to great resources. I really appreciate you for making this type of content!

daviscohen

Excellent, Ben, thanks for the share ❤️. I agree with the planning and execution part. I have planned many more projects and didn't finish them. And the mentioned real-estate project took me many years to complete 🙈. The 20 minutes is more the reading time than any implementation time. I was told I needed a catchier title, so I added the 20 minutes 😉

sspaeti

Another good data source that I use a lot is this very platform! The YouTube Data API provides tons of opportunity.

I've been working on projects to ingest YouTube comments, processing them with NLP and visualizing...

caseypdx

The last project was also recommended in some other video of yours :D
I think it was DE projects for beginners

hamsansari

Amazing video! I want to create a data pipeline with open source tools running locally on my laptop. My intended stack is currently Dagster, MinIO, Clickhouse, dbt and Metabase.
Should I include a database engine like Postgres to load data in/clean the data before transforming it and put it in Clickhouse?

CuongNguyen-gufl

Hey Ben, been a mad fan of this channel for about 6 months now. Looking forward to new content on your channel has been a weekly routine.

Just wanted to get your insights on this: I have been working with tools like Informatica and Denodo in data engineering projects (client's requirements). How do you see these tools in the modern data engineering landscape? How would you suggest I move forward with my career as a data engineering enthusiast?

dataterre

Thanks for the amazing video, Ben. Please suggest how we can extract data from flightradar24; I couldn't seem to find a free API for it. It would be very helpful for preparing the data set.

billodalroy

I'm writing a program that downloads all StackOverflow Developer Surveys. Because the surveys have changed so much throughout the years, I set out to answer four questions: what operating systems are developers using, what languages, what databases, and is the gender gap shrinking. I also wrote another script that takes two completely unrelated credit card datasets and trains a neural network on one and tests it on the other. The accuracy was random because one dataset is randomly generated. However, at one point (and I saved this dataset), I got a blind-test accuracy of 96%. Keep in mind, these datasets were unrelated and I had to run a Python script that performed an ETL process to get them to match. Oh, boy, was I proud.

theinquisitor

🚨 PROJECT VIDEO 🚨 Woop woop, can't wait

GiasoneP

Sir, should I use Airflow with Compute Engine, or Airflow in Cloud Composer?

mauludinrohman

Thank you for your video. I am a beginner. I'm looking for a data engineering toy project while searching for a job and practicing my programming skills.

ygnkyqs