Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline!

Have you ever thought that you needed to be a programmer to do stream processing and build streaming data pipelines? Think again! Apache Kafka is a distributed, scalable, and fault-tolerant streaming platform, providing low-latency pub-sub messaging coupled with native storage and stream processing capabilities. Integrating Kafka with RDBMS, NoSQL, and object stores is simple with Kafka Connect, which is part of Apache Kafka. ksqlDB is the event streaming database for Apache Kafka, and makes it possible to build stream processing applications at scale, written using a familiar SQL interface.
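
For instance, a connector that streams a database table into Kafka can be declared from the ksqlDB CLI itself. Here is a minimal sketch, assuming a MySQL instance and the Confluent JDBC source connector are available; the connector name, credentials, and table are made-up placeholders:

```sql
-- Hypothetical sketch: stream a MySQL table into a Kafka topic via
-- Kafka Connect, managed from ksqlDB. All names/credentials are placeholders.
CREATE SOURCE CONNECTOR customers_source WITH (
  'connector.class'          = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'           = 'jdbc:mysql://mysql:3306/demo',
  'connection.user'          = 'connect_user',
  'connection.password'      = 'secret',
  'table.whitelist'          = 'customers',
  'mode'                     = 'incrementing',
  'incrementing.column.name' = 'id',
  'topic.prefix'             = 'mysql-'
);
```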

In this talk, we’ll explain the architectural reasoning for Apache Kafka and the benefits of real-time integration, and we’ll build a streaming data pipeline using nothing but our bare hands, Kafka Connect, and ksqlDB.

Gasp as we filter events in real-time! Be amazed at how we can enrich streams of data with data from RDBMS! Be astonished at the power of streaming aggregates for anomaly detection!
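
In ksqlDB terms, each of those tricks is a one-statement query. A rough sketch, assuming an `orders` stream and a MySQL-sourced `customers` table have already been declared (all names are hypothetical):

```sql
-- Filter events in real time: keep only high-value orders.
CREATE STREAM big_orders AS
  SELECT * FROM orders
  WHERE order_total_usd > 1000;

-- Enrich the stream with reference data from the RDBMS-sourced table.
CREATE STREAM orders_enriched AS
  SELECT o.order_id, o.order_total_usd, c.name, c.email
  FROM orders o
  LEFT JOIN customers c ON o.customer_id = c.id;

-- Streaming aggregate: order counts per customer in five-minute windows,
-- a building block for spotting anomalous bursts of activity.
CREATE TABLE orders_per_customer AS
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  WINDOW TUMBLING (SIZE 5 MINUTES)
  GROUP BY customer_id;
```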

Comments

Thank you so much.
I have seen a lot of videos and books already, and this is the first time I understand and see all the strength and ease of Kafka.
Great work!!!

maxtudor

This is really an awesome walkthrough. Thank you!

abdulelahaljeffery

Why is everyone using Confluent Kafka this and that? I want to run this in production, and Confluent Kafka is not open source.
Can anyone suggest an article or video to refer to? I want to load a CSV or JSON file into Kafka as a table.

shibilpm
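
One way to sketch the last part of that question in ksqlDB, assuming the file's records have already been produced onto a topic (for example with kafka-console-producer). The topic and column names are hypothetical, and for CSV data you would use VALUE_FORMAT = 'DELIMITED' instead. (ksqlDB itself is free to run under the Confluent Community License, though that is not an OSI-approved open source license.)

```sql
-- Declare a table over an existing topic of JSON records.
-- Topic and columns are hypothetical; the message key must carry 'id'.
CREATE TABLE users (
    id   INT PRIMARY KEY,
    name VARCHAR,
    city VARCHAR
  ) WITH (
    KAFKA_TOPIC  = 'users_raw',
    VALUE_FORMAT = 'JSON'
  );
```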

Great talk and walkthrough. I am very new to the Confluent platform, and ksqlDB seems to be a great thing. I have one question about Kafka Connect (following your example of looking up user details from a MySQL store): how big can that remote MySQL table be, and does it matter at all whether the join happens on the PK? My sense is that it does not matter that much; am I correct?

ashchedrin
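
For context on why the PK matters here: ksqlDB does not query MySQL at join time. The table is ingested into a Kafka topic and materialized in a local RocksDB state store, keyed by its PRIMARY KEY, so each stream-table lookup is a local by-key fetch; the practical size limit is local state, not the remote database. A sketch with hypothetical names:

```sql
-- The MySQL-sourced table must be keyed by the join column.
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    name    VARCHAR
  ) WITH (KAFKA_TOPIC = 'mysql-users', VALUE_FORMAT = 'JSON');

-- Each incoming rating is enriched by a local, by-key lookup.
CREATE STREAM ratings_enriched AS
  SELECT r.rating_id, r.stars, u.name
  FROM ratings r
  LEFT JOIN users u ON r.user_id = u.user_id;
```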

What happens to the streams or tables if the ksqlDB or Kafka Connect cluster crashes?
If I restart the Docker container where I'm running the ksqlDB streams or Kafka Connect, will the streams start from where they left off?
Have there been any instances where you had too many streams and half of them crashed? How do you recover?

rbb

I am using the JDBC connectors and receive `Key format: ¯\_(ツ)_/¯ - no data processed`, although I have set `"key.converter":` in my connector. I do see the full stream, with a null key: `rowtime: 2021/05/31 08:33:38.411 Z, key: <null>, value: {"id": 10, ...`. Do you have an idea what could have gone wrong, or what typical issues come up at this point?

janga
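
A null key is the JDBC source connector's default behaviour: it does not set a message key at all, so the key converter has nothing to convert. A common pattern, sketched here with hypothetical names rather than as a diagnosis of this specific setup, is to copy a value column into the key with single message transforms:

```sql
-- Copy the 'id' column into the record key, then unwrap it to a primitive.
-- Connector, table, and column names are made-up placeholders.
CREATE SOURCE CONNECTOR jdbc_keyed WITH (
  'connector.class'               = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'                = 'jdbc:mysql://mysql:3306/demo',
  'table.whitelist'               = 'users',
  'mode'                          = 'incrementing',
  'incrementing.column.name'      = 'id',
  'topic.prefix'                  = 'mysql-',
  'transforms'                    = 'copyIdToKey,extractId',
  'transforms.copyIdToKey.type'   = 'org.apache.kafka.connect.transforms.ValueToKey',
  'transforms.copyIdToKey.fields' = 'id',
  'transforms.extractId.type'     = 'org.apache.kafka.connect.transforms.ExtractField$Key',
  'transforms.extractId.field'    = 'id',
  'key.converter'                 = 'org.apache.kafka.connect.converters.IntegerConverter'
);
```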

I do have some questions:
Kafka uses key-value storage for messages and has some great features for data persistence. But the usual data streams should not be stored forever, right? I guess Kafka has an internal cleanup policy for removing old data, especially when topics reach their maximum physical size.

How does ksqlDB handle that kind of cleanup policy (if one exists)? Since we are using it for database purposes, the data should be available for a lifetime.
So my question is: what is ksqlDB? Is it a Kafka topic always consumed from offset earliest? Is it comparable to a Redis key-value store, to a MongoDB document store, to SQL databases, ...?

janga
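
A sketch that illustrates the usual answer: a ksqlDB table is closer to a continuously maintained materialized view than to a standalone database. Streams keep the retention (or compaction) settings of their underlying Kafka topics, while tables are backed by compacted changelog topics, so the latest value per key survives cleanup. All names below are hypothetical:

```sql
-- A table maintained continuously from a topic; its state is queryable.
CREATE TABLE pageviews_per_user AS
  SELECT user_id, COUNT(*) AS views
  FROM pageviews
  GROUP BY user_id;

-- A pull query: a point-in-time, by-key read of the materialized state,
-- rather than a re-scan of the topic from offset earliest.
SELECT views FROM pageviews_per_user WHERE user_id = 42;
```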