Data Engineering Course for Beginners

Learn the essentials of data engineering in this course for beginners. You'll learn about databases, Docker, and analytics engineering. You'll explore advanced topics like data pipeline building with Airflow, and engage in batch processing with Spark and streaming data with Kafka. The course culminates in a comprehensive project, putting your skills to the test in creating a full end-to-end pipeline.

✏️ Justin Chau created this course.

Thanks to Airbyte for providing a grant to make this course possible.

⭐️ Contents ⭐️
⌨️ (0:00:00) Introduction
⌨️ (0:00:36) Why Data Engineering
⌨️ (0:03:14) Docker
⌨️ (0:30:38) SQL
⌨️ (1:04:32) Building a Data Pipeline from Scratch
⌨️ (1:31:03) dbt
⌨️ (2:04:11) CRON Job
⌨️ (2:07:54) Airflow
⌨️ (2:41:14) Airbyte
⌨️ (3:01:54) Outro

🎉 Thanks to our Champion and Sponsor supporters:
👾 davthecoder
👾 jedi-or-sith
👾 南宮千影
👾 Agustín Kussrow
👾 Nattira Maneerat
👾 Heather Wcislo
👾 Serhiy Kalinets
👾 Justin Hual
👾 Otis Morgan
👾 Oscar Rahnama

--

Comments

We need a full data engineering course: Python + SQL + big data (Hadoop) + Apache Spark + Apache Airflow + Apache Kafka + AWS + a project.

xx-pnit

If you're running into an error with "exit code 1" @1:28:55, you need to update the Docker configuration from @1:08:08. Go to the docker-compose file (the "image:" key is a Compose setting, not a Dockerfile instruction) and set "image: postgres:9.2" for both "source_postgres" and "destination_postgres".
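As a rough sketch of the image pin described above, the relevant part of the Compose file might look like the fragment below. The service names and the 9.2 tag are taken from the comment as written. Everything else a real service needs (environment, ports, volumes) is omitted, and the right tag for your setup may differ.

```yaml
# Fragment only: other keys for each service are omitted.
services:
  source_postgres:
    image: postgres:9.2   # version as quoted in the comment
  destination_postgres:
    image: postgres:9.2   # keep both services on the same pinned version
```

Pinning a tag instead of relying on the default "latest" keeps the database version from changing underneath the pipeline between runs.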


I think this was a good idea, but many details were off. I hope he'll be more specific about the versions of the software he's using next time.

cisdolce

The best DE course that I have ever seen. Most courses only stick to the theory and never show the practical part.

gustavosantos

Amazing!
Looking forward to it.
Also, it would be perfect to have a more comprehensive version of this course, covering all batch and stream processing tools and methods.

Smplebserver

We need a 60-hour course for data engineers.

himanish

Please bring a bigger course that covers all aspects from the basics, from scratch: MySQL, Python/Java/Scala, Hadoop, Spark, PySpark. Something that covers data engineering with one cloud.

JREQuickPods

Good job. You are creating visibility for Airbyte in a great way by providing an evolutionary view of the stack that gets one to eventually need it. I hope they continue to support you in making content with this approach.

nandovanegas

1:59:58
The reason he encountered the error is that he wrote {% generate_ratings() %}.

To avoid the error, you should write {% macro generate_ratings() %} instead.

topgunlee

At 1:53:56 you may face an error due to the fact that the dbt service is launched before the completion of the elt_script service.

To solve the issue, add a condition: entry under the depends_on clause of the dbt service so that it is always launched after the elt_script service has completed.
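A minimal sketch of that depends_on fix, assuming the services are named dbt and elt_script as in the comment. The comment does not say which condition value to use. service_completed_successfully is the Compose condition that waits for a dependency to run to completion and exit cleanly, so it is assumed here.

```yaml
# Fragment only: image/build, volumes and command for each service are omitted.
services:
  dbt:
    depends_on:
      elt_script:
        # Assumed value: start dbt only after elt_script has exited successfully.
        condition: service_completed_successfully
```

The short list form of depends_on only orders container start-up, so without a condition dbt can start while the ELT script is still copying data.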

pepi

Hey, currently there is no set path to becoming a data engineer. So please create a proper certification with a clear roadmap of foundations and most used cloud tech in data engineering so that we can get some structure going for those interested in this career.

truthruster

Thanks for the course. To all those using VMs: make sure the files are located on the VM. I tried running the Docker exercise with Docker on the VM and the files on the host (Windows), and wasted a lot of time resolving errors. I finally moved all the files to the Ubuntu VM, where things ran smoothly.

HussainR

This course would really have benefited from a diagram explaining what we were building. Yes, we're building a data pipeline, but it's good to give a high-level overview of each component, what it does, and what goal we are accomplishing, so we understand what we're building and why from the start.

MuslimBestLife

Please make this a series!
(Edit)
Suggestion 1: Include an overview before each section of 1) the overarching pipeline and 2) where in the pipeline we are for that section. In other words, having a map or diagram of what is happening would help with conceptual understanding.
Suggestion 2: Explain the code line-by-line conceptually. Time writing the code on the screen could be cut and replaced just by explanation. This would save time.

ethanvirtudazo

We would like to see more content for Data Engineering, possibly a full course.

IAmLorenzoF

Please add more data engineering courses like this. I really love it.

gabrieltaka

Heads up for those getting into the 'Building a Data Pipeline from Scratch' section: remember that building an image produces a fixed snapshot of the code. Once you run docker compose up, the image for the elt_script service, and with it the Python script copied into that image, is set in stone. Future changes to the Python script only take effect after the image is rebuilt, for example by deleting the old image so a new one is built. Several of the persistent errors in the video come from changing the code but then starting a container from an image that still holds the old version. Rebuild the image each time you change code, OR learn bind mounts and rebuild once after you've finished developing.
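The bind-mount option mentioned above could look roughly like this in the Compose file. The folder name, script name and /app path are hypothetical. The mount target has to match wherever the image's Dockerfile actually runs the script from.

```yaml
# Fragment only: all paths below are hypothetical placeholders.
services:
  elt_script:
    build:
      context: ./elt            # folder containing the Dockerfile and script
    volumes:
      # Bind mount: the container reads the script from the host at run time,
      # so edits are picked up on the next start without rebuilding the image.
      - ./elt/elt_script.py:/app/elt_script.py
```

Without a bind mount, running docker compose up --build rebuilds the image before starting the containers, which avoids having to delete the old image by hand.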

MrHoboninjapirate

YES THANK YOU SO MUCH keep expanding this!!!!

kandoras.guzman

I just started and love the style. You teach fluently and focus on the important takeaways. I've seen so much bullshit that I fully expected to watch you spend 10 minutes installing Docker, and I had already started skipping, but you didn't show that part, which is nice. Makes perfect sense. Someone who can't RTFM and install Docker on their own shouldn't focus on DE at this point anyway, imho.

fuuman

Since when is Justin a data engineer? Well, I guess the constant learning is real.

alexandrodisla

Here I am thinking about data engineering, then boom! YouTube shows me a data engineering course 😅

fezekile