How to Stream Data using Apache Kafka & Debezium from Postgres | Real Time ETL | ETL | Part 2

In this video we will set up database streaming from a Postgres database to Apache Kafka. In the previous session we installed Apache Kafka, Debezium and the rest of the required components, and configured the Postgres database for data streaming. Today we will configure the database and the Debezium connector, and start streaming data from our Postgres database.
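As a companion to the video, here is a minimal sketch of what registering a Debezium Postgres connector through the Kafka Connect REST API can look like in Python. Host names, credentials, the database name `demo`, and the connector name are assumptions to adapt to your own setup from Part 1; note that `topic.prefix` applies to Debezium 2.x (older 1.x releases used `database.server.name` instead).

```python
import json
import urllib.request

# Hypothetical connection details -- adjust hosts, ports and credentials
# to match the environment configured in Part 1.
connector_config = {
    "name": "postgres-connector",  # connector name (assumed)
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",  # Postgres host as seen by Kafka Connect
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.dbname": "demo",        # database name (assumed)
        "topic.prefix": "demo",           # topics become <prefix>.<schema>.<table>
        "plugin.name": "pgoutput",        # logical decoding plugin built into Postgres 10+
    },
}

def register_connector(config, connect_url="http://localhost:8083/connectors"):
    """POST the connector config to the Kafka Connect REST API."""
    request = urllib.request.Request(
        connect_url,
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

# Uncomment once Kafka Connect is up on port 8083:
# print(register_connector(connector_config))
```

The same request can of course be sent from any REST client, which is what the VS Code API client section of the video covers.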


#apachekafka #DataStreaming #etl

💥Subscribe to our channel:

📌 Links
-----------------------------------------
#️⃣ Follow me on social media! #️⃣

-----------------------------------------

Topics covered in this video:
0:00 - Introduction: Apache Kafka, Debezium and requirements
0:29 - Postgres Table Creation
2:22 - Python Data Insert Script
3:13 - Kafka Connect API Client
3:41 - VS Code API Client Install
5:03 - Postgres Kafka Connector
7:38 - Kafka Topics & Insert Row for Topic Creation
8:53 - Stream Data to Kafka
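The Python data insert script from the timestamps above might look roughly like the sketch below. The `orders` table, its columns, and the connection string are hypothetical stand-ins for whatever you created in the table-creation step; `psycopg2` is imported lazily so the row generator works even without the driver installed.

```python
import datetime
import random

def make_row():
    """Generate one synthetic order row (hypothetical schema)."""
    return (
        random.randint(1, 100),                        # customer_id
        round(random.uniform(5.0, 500.0), 2),          # amount
        datetime.datetime.now(datetime.timezone.utc),  # created_at
    )

def insert_rows(n, dsn="dbname=demo user=postgres password=postgres host=localhost"):
    """Insert n random rows; requires psycopg2 and a running Postgres."""
    import psycopg2  # lazy import: make_row() stays usable without the driver
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for _ in range(n):
            cur.execute(
                "INSERT INTO orders (customer_id, amount, created_at)"
                " VALUES (%s, %s, %s)",
                make_row(),
            )
        conn.commit()

# insert_rows(10)  # run once the table from the video exists
```

Each committed insert is picked up from the write-ahead log by Debezium and published to the corresponding Kafka topic.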
Comments
Author

Hello! As a newbie data engineer, I've found your videos to be incredibly helpful. The way you explain concepts makes it easy for me to grasp and apply them in my work. Thank you for sharing your knowledge and helping me on my learning journey!

Looking forward to your next videos!

sunsas
Author

Hello! Thank you for the amazing content, which clearly explains data streaming with CDC. I have a quick question about where in the container Debezium stores the configuration made when setting up a connector. I'm asking so that I can persist a connector for later use even after the container stops. Thanks

edisonngizwenayo
Author

Hello! Just found your amazing channel and I'm enjoying it a lot. I have a question on the subject. I reproduced your setup and it works just fine for inserts and updates. But I noticed that on delete no message is produced to the Kafka topic. Any tips on how to fix this? In any case, thank you for your content!

andriifadieiev
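A note on the delete question above: in Debezium's change-event envelope each message carries an `op` field ("c" create, "u" update, "d" delete, "r" snapshot read), and by default a delete event is followed by a tombstone record with a null value so that compacted topics can drop the key. If deletes never appear, the table's REPLICA IDENTITY setting and the consumer's handling of null-valued records are common things to check. A small sketch of classifying decoded events (the event shapes below are assumptions based on the standard Debezium envelope):

```python
def classify_event(value):
    """Map a JSON-decoded Debezium event value to a change type.

    None is the tombstone record Debezium emits after a delete; real
    events carry "op": "c" = create, "u" = update, "d" = delete,
    "r" = snapshot read.
    """
    if value is None:
        return "tombstone"
    op = value.get("payload", {}).get("op")
    return {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}.get(op, "unknown")

# Example events (shapes assumed from the Debezium envelope format):
insert_event = {"payload": {"op": "c", "before": None, "after": {"id": 1}}}
delete_event = {"payload": {"op": "d", "before": {"id": 1}, "after": None}}
```

A consumer that silently drops null-valued records would make deletes look like they were never captured, even though Debezium produced both the delete event and the tombstone.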
Author

How do you handle pipeline disruptions? Can you provide some insights on the points below?
1. There seems to be a known limitation with the PostgreSQL database: transactions that have already been read by a CDC replication task can't be reprocessed, even when the task is restarted from an old LSN value.
2. It also appears that the task can't be moved between replication servers without coordinating with the PostgreSQL DBA on updating the pg_hba.conf file. Can we create a script to overcome this, or is there a better alternative?

chald
Author

Hi,
Are the connector name and topic name always the same? Can you name your topic something else? It would be helpful to have multiple topics for one connector. Thanks in advance.

aniketrele
Author

Can you make a video showing an ETL pipeline that uses Kafka to extract, PySpark to process, and an upload to S3 to load? Can you use Airflow to manage it?

hungnguyenthanh
Author

Hi. I love your videos. I have been trying this project for months now, but I still get a "connection to my IP address refused" error. How can I solve this? I've been stuck here for months.

timiayoade
Author

Hello sir, how do I capture the entire CDC stream,
i.e. inserts, deletes and updates?

ayocs
Author

Hi. Newbie here. I am encountering the error ModuleNotFoundError: No module named 'kafka.vendor.six.moves' when I try to run something via Jupyter. Any suggestions on how to fix this?

rmntr
Author

If possible, please make videos on Flink using Scala.

thejasreddy
Author

How can I do the same with Amazon DynamoDB? Can you please make a video on this?

technicalking
Author

How about deletes with this technique and setup?

jootuubanen
Author

Why didn't you create the table from a SELECT command?

macetesdev
Author

Hello, I'm currently encountering the error KeyError: 'PGPASS'. I would love to know how to resolve this.

paulaganbi
Author

Hi, can you share the file for importing the data tables as shown in the video?

hungnguyenthanh