Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools

The stream processing tools compared in this episode are Flink and the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant, ksqlDB. Flink and ksqlDB tend to be used by different types of teams, since they differ in both design and philosophy.

Why Use Apache Flink?

The teams using Flink are often highly specialized, with deep expertise, and with an absolute focus on stream processing. They tend to be responsible for unusually large, industry-outlying amounts of both state and scale, and they usually require complex aggregations. Flink can excel in these use cases, which potentially makes the difficulty of its learning curve and implementation worthwhile.

Why Use ksqlDB/Kafka Streams?

Conversely, teams employing ksqlDB/Kafka Streams need less expertise to get started, and less expertise and time to manage their solutions. Jeff notes that the skills of a developer may not even be needed in some cases; those of a data analyst may suffice. ksqlDB and Kafka Streams integrate seamlessly with Kafka itself, as well as with external systems through Kafka Connect. In addition to being easy to adopt, ksqlDB is also deployed in production stream processing applications requiring large scale and state.

There are also other considerations beyond the strictly architectural. Local support availability, the administrative overhead of using a library versus a separate framework, and the availability of stream processing as a fully managed service all matter.

Choosing a stream processing tool is a fraught decision, partly because switching between them isn't trivial: the frameworks are different, the APIs are different, and the interfaces are different. In addition to the high-level discussion, Jeff and Matthias also share lots of details you can use to understand the options, covering deployment models, transactions, batching, and parallelism, as well as a few interesting tangential topics along the way, such as the tyranny of state and the Turing completeness of SQL.

TIMESTAMPS
0:00 - Intro
2:06 - The world of stream processing
6:26 - Flink vs ksqlDB
18:34 - Example use case
20:03 - SQL was built for static data
25:51 - Concept of event time
29:30 - Session-based window joins
35:47 - Processing streaming data with SQL
39:47 - Scaling Kafka Streams/ksqlDB
45:39 - Exactly-once semantics
48:15 - Choosing stream processing tools
53:52 - It's a wrap
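
Two of the topics listed above, event time and session-based windows, can be illustrated with a small sketch. This is not how Flink or Kafka Streams implement session windows (both do it incrementally, with their own handling of late data); it is a batch-style Python illustration, and the function name `sessionize`, the `gap` parameter, and the sample events are all assumptions made up for this example.

```python
from collections import defaultdict

def sessionize(events, gap):
    """Group (key, event_time) pairs into per-key session windows.

    A new session starts whenever the distance between consecutive
    event times for the same key exceeds `gap`. Hypothetical helper,
    not part of any Flink or Kafka Streams API.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        # Order by event time (when it happened), not arrival order.
        stamps.sort()
        windows = [[stamps[0], stamps[0], 1]]  # [start, end, count]
        for ts in stamps[1:]:
            last = windows[-1]
            if ts - last[1] <= gap:
                # Close enough to the previous event: extend the session.
                last[1] = ts
                last[2] += 1
            else:
                # Inactivity gap exceeded: open a new session.
                windows.append([ts, ts, 1])
        sessions[key] = [tuple(w) for w in windows]
    return sessions

# Events arrive out of order: alice's event at time 15 shows up after 40.
events = [("alice", 10), ("bob", 12), ("alice", 40), ("alice", 15), ("bob", 100)]
result = sessionize(events, gap=20)
print(result)
```

Because the grouping uses the timestamp carried by each event rather than its arrival position, the late-arriving event at time 15 still lands in alice's first session.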

#streamprocessing #ksqldb #apachekafka #kafka #confluent

Comments

The dude in the middle is brilliant! Asks the correct questions for the uninitiated!

benjinguyen

This was an excellent episode. KrisJ - I really like your host/interviewing style. This was an interesting topic and very well presented.

FnordFandango

Oh, liked this one! For Kafka Streams/ksqlDB *everything* is about Kafka, all input and all output moves through 1 single Kafka cluster. That has bit me a few times, and Flink is more flexible there: You can read from one cluster and write to another. Or join data from different clusters. Or read data from a cluster you only have read access from.

flyaruu

Kstreams is my favourite simply because of the deployment model, as long as I already have a Kafka cluster. If the ecosystem does not use Kafka and uses AWS Kinesis, I would choose Flink.

rogers

OMG, it really worked. Thank you so much!!

affaffofa

Where do you persist that state? How easy is it to share that state when you move from one Kubernetes cluster to a new one? Currently, I persist state in Redis.

jdang

What about latency? Does it perform as well as the other options on the market, if not better, at a fraction of the cost? Could you provide some benchmark numbers relative to other candidate streaming languages/frameworks?
Targeted use case: streaming large financial datasets in many formats (text, integer, float, etc.). Your input is highly appreciated.

mikiallen

13:41 the big takeaway as to why/when Flink vs Kstreams

AP-ehgr

Flink is not as advanced a product as you present it. It is more like libraries and scripts for creating software than software itself. In Flink you cannot do many trivial things that you normally do with data. Flink also changes drastically from version to version and is not compatible with previous versions. The documentation is unclear. Flink disappointed me a lot.

podunkman

Using SQL syntax in streaming applications makes things even worse. How do you test KSQL together with Kafka Streams? They belong to two different worlds.
The idea of enabling non-Java developers to work with Kafka will fail in the end. If someone can't even write Java code, they are definitely not qualified to develop or handle the complexity of such streaming applications.

well, tNice tutorials is going to take forever...

pawar
