Smart City End to End Realtime Data Engineering Project | Get Hired as an AWS Data Engineer

In this video, you will build a Smart City end-to-end real-time data streaming pipeline, covering every phase from data ingestion through processing to storage. We'll use tools like IoT devices, Apache ZooKeeper, Apache Kafka, Apache Spark, Docker, Python, AWS Cloud, AWS Glue, AWS Athena, AWS IAM, Amazon Redshift, and finally Power BI to visualize the data in Redshift.
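The ingestion side described above can be sketched as a minimal vehicle-telemetry generator whose JSON payload a Kafka producer would then send to a topic. This is a sketch only: the field names, coordinates, and the `generate_vehicle_data` helper are illustrative assumptions, not the video's exact schema.

```python
import json
import random
import uuid
from datetime import datetime, timezone

def generate_vehicle_data(device_id: str) -> dict:
    """Simulate one vehicle telemetry record (fields are illustrative)."""
    return {
        "id": str(uuid.uuid4()),
        "device_id": device_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "latitude": 51.5074 + random.uniform(-0.01, 0.01),
        "longitude": -0.1278 + random.uniform(-0.01, 0.01),
        "speed": random.uniform(10, 40),   # km/h
        "fuel_type": "Hybrid",
    }

record = generate_vehicle_data("Vehicle-CodeWithYu-1")
# A Kafka producer would send this serialized payload to a "vehicle_data" topic.
payload = json.dumps(record).encode("utf-8")
print(record["device_id"], len(payload) > 0)
```

In the actual project each generator (vehicle, GPS, traffic, weather, emergency) emits records like this on its own topic.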

Like this video?

Timestamps:
0:00 Introduction
1:29 System Architecture
7:22 Project Setup
9:00 Docker containers setup and coding
26:17 IoT Services Producer
38:19 Vehicle Information Generator
48:10 GPS Information Generator
50:13 Traffic Information Generator
53:13 Weather Information Generator
58:35 Emergency Incident Generator
1:03:39 Producing IoT Data to Kafka
1:14:43 AWS S3 setup with policies
1:16:38 AWS IAM Roles and Credentials Management
1:19:14 Apache Spark Realtime Streaming from Kafka
2:01:14 Fixing Schema Issues in Apache Spark Structured Streaming
2:07:31 AWS Glue Crawlers
2:10:23 Working with AWS Athena
2:13:22 Loading Data into Redshift from AWS Glue Data Catalog
2:17:58 Connecting and Querying Redshift DW with DBeaver
2:20:51 Connecting Redshift to AWS Glue Catalog
2:23:34 Fixing IAM Permission issues with Redshift
2:26:05 Outro

🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟

🔗 Useful Links and Resources:

✨ Tags ✨
Data Engineering, Kafka, Apache Spark, Cassandra, PostgreSQL, Zookeeper, Docker, Docker Compose, ETL Pipeline, Data Pipeline, Big Data, Streaming Data, Real-time Analytics, Kafka Connect, Spark Master, Spark Worker, Schema Registry, Control Center, Data Streaming

✨ Hashtags ✨
#DataEngineering #Kafka #ApacheSpark #Cassandra #PostgreSQL #Docker #ETLPipeline #DataPipeline #StreamingData #RealTimeAnalytics
Comments

Please don't forget to LIKE and SUBSCRIBE! 🥺

CodeWithYu

You know you are such a gem, amazing paid-quality work for free. I will buy you a coffee once I get a job, brother. Keep this work up.

easypeasy

This is the best data engineering content I have seen on YouTube so far. Thanks for this.

ibrahimsalaudeen

Now beginning my Data Engineering journey, and this tutorial is an absolute gem! I was able to reproduce everything from A-Z and get it all running! The only glitch is that the broker service, for some unknown reason, always exits at some point, so the vehicle never gets to the destination 😅. However, I do still get the data on S3. Thanks again for this! I hope I can add this project to my portfolio. Looking forward to the visualisation part!

michaelokorie

Thank you so much, Yusuf! After some challenges here and there, I've been able to complete the project. As a newbie in data engineering, I've learned so much in this exercise and gained more confidence. On to the next, which is Spark structured streaming.

lineomatasane

Subbed! Thanks a lot for your kindness to share this amazing wisdom and knowledge!

Jerrel.A

Great Job Yu. Thanks for helping the humanity :)

pankajjaiswal

Thank you so much !! You are a good teacher.

SaiPhaniRam

I am getting an error at about 1:50:00 in the video:

ImportError: Pandas >= 1.0.5 must be installed; however, it was not found.

It turns out my spark-master doesn't have enough packages installed, including pandas and pyarrow. I tried pip-installing all of them, and then the error changed to something else that doesn't make sense.

Can anyone help point out what may have gone wrong?

aseessarkaria
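One common cause of the pandas/pyarrow ImportError above is that pip-installing into a single running container doesn't help: the Spark executors on the worker containers need the packages too, and the install is lost on restart. A sketch of a fix is to bake the dependencies into the Spark image itself; the base image name here is an assumption, so match it to whatever image the project's docker-compose file actually uses.

```dockerfile
# Hypothetical Dockerfile extending the Spark image from docker-compose.
# Build it once and point both spark-master and spark-worker services at it.
FROM bitnami/spark:latest
USER root
RUN pip install --no-cache-dir "pandas>=1.0.5" pyarrow
USER 1001
```

After rebuilding, both driver and executors see the same Python environment, which avoids version-mismatch errors mid-stream.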

Your tutorials are just amazing; they make all of this stuff make sense. I would love to see one of these projects where you also use infrastructure as code, with Terraform for example. I know that's more on the DevOps side, but I had to do that at my first job as well as data engineering and was kinda lost for a while.

orlandobboy

Always inspiring, with helpful content. Keep up the good work.

AnhNguyen-hjpd

Thank you for the great video. Can somebody help me find where to copy the Docker env variables shown at 13:50?

FAyt-ovuo

Such a great project for free, hats off to you man🥰

shujahtali

For setting up Spark with Docker, can you use the env variables SPARK_MODE=worker / SPARK_MODE=master instead of the command line to create the master and worker containers?
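Regarding the question above: yes, if the cluster uses the Bitnami Spark image, which reads a SPARK_MODE environment variable to decide whether a container starts as master or worker. This is a sketch assuming `bitnami/spark`; service names and ports are illustrative and should match the project's own compose file.

```yaml
# docker-compose fragment: master/worker selected via SPARK_MODE (Bitnami image)
spark-master:
  image: bitnami/spark:latest
  environment:
    - SPARK_MODE=master
  ports:
    - "9090:8080"   # Spark web UI
    - "7077:7077"   # master RPC port

spark-worker:
  image: bitnami/spark:latest
  environment:
    - SPARK_MODE=worker
    - SPARK_MASTER_URL=spark://spark-master:7077
  depends_on:
    - spark-master
```

With other base images that lack this variable, overriding `command:` remains the way to pick the role.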


Great, Yusuf! Thanks a lot for another terrific contribution!

This is very helpful for me, as I want to implement a similar architecture for a project for driving schools here in Málaga.

Just wondering, how could we simulate a non-straight route between 2 points? Maybe I could get a route record (lat/long) and pass it to Kafka one record per timestamp?

I will replace the emergency topic with "pain points" where students usually get suspended...

RafaVeraDataEng
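One way to simulate the non-straight route asked about above: define a few waypoints, linearly interpolate positions between them, and emit one timestamped record per point, exactly one Kafka message each. A sketch with made-up coordinates and a hypothetical `interpolate_route` helper:

```python
import json
from datetime import datetime, timedelta, timezone

def interpolate_route(waypoints, steps_per_leg=5):
    """Linearly interpolate (lat, lon) points between consecutive waypoints."""
    points = []
    for (lat1, lon1), (lat2, lon2) in zip(waypoints, waypoints[1:]):
        for i in range(steps_per_leg):
            t = i / steps_per_leg
            points.append((lat1 + (lat2 - lat1) * t,
                           lon1 + (lon2 - lon1) * t))
    points.append(waypoints[-1])  # include the final destination
    return points

# A bent route through three waypoints (coordinates are made up)
route = interpolate_route([(36.72, -4.42), (36.75, -4.40), (36.78, -4.44)])
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"timestamp": (start + timedelta(seconds=30 * i)).isoformat(),
     "latitude": lat, "longitude": lon}
    for i, (lat, lon) in enumerate(route)
]
print(len(records))  # one Kafka message per timestamped point
```

For realistic curved roads, the same idea works with a polyline fetched from a routing service instead of straight-line interpolation.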

This was an amazing tutorial! You are a badass! These end-to-end projects are much needed! And yes, could you show how to connect to Power BI, please? Thanks!

mrcrblr

Nice work... waiting for dbt and Snowflake 🎉🎉😊

wiss

Thank you very much for this video; I learnt a lot from it.

ataimebenson

Hello Mr Yu,
I'm following your tutorial, but I'm running into issues around 1:12:23 when I try to run the whole code.
There's no way to share my code on here, but I followed you completely, so I don't know if the error is from my system. The error is that I can't seem to access Kafka; I always get an error.

richardilemon
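A frequent cause of "can't access Kafka" errors like the one above, when the producer script runs on the host while the broker runs in Docker, is the broker advertising a container-internal hostname that the host cannot resolve. A sketch of a dual-listener setup, assuming the Confluent Kafka image (service name and ports are illustrative):

```yaml
# docker-compose fragment: one listener for other containers (broker:29092)
# and one for scripts on the host machine (localhost:9092).
broker:
  image: confluentinc/cp-kafka:latest
  ports:
    - "9092:9092"
  environment:
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
```

With this layout, host code uses `bootstrap_servers="localhost:9092"` while containers (e.g. Spark) use `broker:29092`; mixing them up is what typically produces the connection errors.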

Thank you very much for all your projects!
Could you please make an end-to-end project with Delta Live Tables in Databricks?

nikitabogatyrev