Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools


The stream processing tools that Jeff and Matthias consider best are Flink and the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant, ksqlDB. Flink and ksqlDB tend to be used by different types of teams, since they differ in both design and philosophy.

Why Use Apache Flink?

The teams using Flink are often highly specialized, with deep expertise, and with an absolute focus on stream processing. They tend to be responsible for unusually large, industry-outlying amounts of both state and scale, and they usually require complex aggregations. Flink can excel in these use cases, which potentially makes the difficulty of its learning curve and implementation worthwhile.

Why Use ksqlDB/Kafka Streams?

Conversely, teams employing ksqlDB/Kafka Streams need less expertise to get started, and less expertise and time to manage their solutions. Jeff notes that the skills of a developer may not even be needed in some cases; those of a data analyst may suffice. ksqlDB and Kafka Streams integrate seamlessly with Kafka itself, and with external systems through Kafka Connect. In addition to being easy to adopt, ksqlDB is also deployed in production stream processing applications that require large scale and state.

There are also other considerations beyond the strictly architectural. Local support availability, the administrative overhead of using a library versus a separate framework, and the availability of stream processing as a fully managed service all matter.

Choosing a stream processing tool is a fraught decision, partly because switching between them isn't trivial: the frameworks are different, the APIs are different, and the interfaces are different. In addition to the high-level discussion, Jeff and Matthias also share lots of details you can use to understand the options, covering deployment models, transactions, batching, and parallelism, as well as a few interesting tangential topics along the way, such as the tyranny of state and the Turing completeness of SQL.
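
Two of the topics covered in the episode, event time and session windows, can be made concrete with a small framework-free sketch. This is not Flink, Kafka Streams, or ksqlDB code: the `sessionize` function, the `(key, event_time)` event shape, and the 60-second gap are all assumptions chosen for illustration. It also processes a finished list in one pass, whereas a real stream processor would do this incrementally and would have to decide how long to wait for late events.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event shape for this sketch: (key, event_time_in_seconds).
Event = Tuple[str, int]
# A session is summarized as (start_time, end_time, event_count).
Session = Tuple[int, int, int]


def sessionize(events: List[Event], gap: int) -> Dict[str, List[Session]]:
    """Group each key's events into sessions based on event time.

    A new session starts when the event-time distance to the previous
    event of the same key is larger than `gap` seconds.
    """
    # Per-key state: the timestamps seen for that key.
    by_key: Dict[str, List[int]] = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions: Dict[str, List[Session]] = {}
    for key, stamps in by_key.items():
        stamps.sort()  # order by event time, not by arrival order
        result: List[Session] = []
        start = end = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts - end > gap:
                # Gap exceeded: close the current session, open a new one.
                result.append((start, end, count))
                start, count = ts, 0
            end = ts
            count += 1
        result.append((start, end, count))
        sessions[key] = result
    return sessions


# Events arrive out of order; the timestamp carries the event time.
events = [("alice", 100), ("bob", 105), ("alice", 130), ("alice", 110), ("alice", 500)]
sessions = sessionize(events, gap=60)
print(sessions)
```

Even in this toy version the per-key timestamp lists are state that has to live somewhere, which hints at why state handling features so heavily in the comparison between the tools.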

TIMESTAMPS
0:00 - Intro
2:06 - The world of stream processing
6:26 - Flink vs ksqlDB
18:34 - Example use case
20:03 - SQL was built for static data
25:51 - Concept of event time
29:30 - Session-based window joins
35:47 - Processing streaming data with SQL
39:47 - Scaling Kafka Streams/ksqlDB
45:39 - Exactly-once semantics
48:15 - Choosing stream processing tools
53:52 - It's a wrap

#streamprocessing #ksqldb #apachekafka #kafka #confluent
Comments

The dude in the middle is brilliant! Asks the correct questions for the uninitiated!

benjinguyen

This was an excellent episode. KrisJ - I really like your host/interviewing style. This was an interesting topic and very well presented.

FnordFandango

Oh, liked this one! For Kafka Streams/ksqlDB *everything* is about Kafka: all input and all output moves through a single Kafka cluster. That has bitten me a few times, and Flink is more flexible there: you can read from one cluster and write to another, join data from different clusters, or read data from a cluster you only have read access to.

flyaruu

Kstreams is my favourite simply because of the deployment model, as long as I already have a Kafka cluster. If the ecosystem does not use Kafka and uses AWS Kinesis instead, I would choose Flink.

rogers

OMG, it really worked. Thank you so much!!

affaffofa

Where do you persist that state? How easy is it to share that state when you move from one Kubernetes cluster to a new one? Currently, I persist state in Redis.

jdang

What about latency? Does it perform as well as, if not better than, the other options on the market, at a fraction of the cost? Could you provide some benchmark numbers relative to other candidate streaming languages/frameworks?
Targeted use case: streaming large financial datasets in many formats (text, integer, float, etc.). Your input is highly appreciated.

mikiallen

13:41 the big takeaway as to why/when Flink vs Kstreams

AP-ehgr

Flink is not as advanced a product as you present it. It is more like a set of libraries and scripts for creating software than software itself. In Flink you cannot do many trivial things that you would normally do with data. Flink also changes drastically from version to version and is not compatible with previous versions. The documentation is unclear. Flink disappointed me a lot.

podunkman

Using SQL syntax in a streaming application makes things even worse. How do you test ksqlDB together with Kafka Streams? They belong to two different worlds.
The idea of enabling non-Java developers to work with Kafka will fail in the end. If someone can't even write Java code, they are definitely not qualified to develop or handle the complexity of such streaming applications.


well, tNice tutorials is going to take forever...

pawar