Comparing Kafka Streams, Akka Streams and Spark Streaming: what to use when | Rock the JVM


This article is for the Java/Scala programmer who wants to decide which framework to use for the streaming part of a massive application, or simply wants to know the fundamental differences between them, just in case. I'm going to write Scala, but all the frameworks I'm going to describe also have Java APIs.

I'm going to discuss the main strengths and weaknesses of Akka Streams, Kafka Streams and Spark Streaming, and I'm going to give you a feel for how you would use them in a very simple word-counting application, which is one of the basic things to start with when learning any distributed programming tool.
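
For reference, the word-counting logic itself can be sketched with plain Scala collections, independent of any streaming framework. This is only a baseline under stated assumptions: the object and method names (`WordCountBaseline`, `wordCount`) are hypothetical and do not come from the video, and the input is a finite sequence rather than an unbounded stream.

```scala
// Plain Scala collections only; no streaming framework is involved.
// WordCountBaseline and wordCount are illustrative names, not from the video.
object WordCountBaseline {

  /** Counts occurrences of each lower-cased word across all lines. */
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.toLowerCase.split("\\W+")) // split each line into words
      .filter(_.nonEmpty)                              // drop empty tokens caused by leading punctuation
      .groupBy(identity)                               // group identical words together
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

Each of the three frameworks applies the same split, group and count shape, but to data that keeps arriving, which is where their differences in state handling and deployment start to matter.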

Contents:
0:00 intro
1:13 Kafka Streams
6:08 Akka Streams
11:31 Spark Streaming

Comments

Just wanted to leave a note on how the Reactive Manifesto and Reactive Streams are (not) related to each other. The first one describes "reactive systems", meaning the whole system, where all of its components cooperate in a resilient, elastic, fault-tolerant and message-driven manner. So it is a specification of how a system should behave as a whole. Reactive Streams, on the other hand, are just one piece of the puzzle in a reactive system. They can also be used separately, outside of a reactive system. The thing is, you can actually write an application that doesn't comply with the requirements of the Reactive Manifesto but still uses and leverages Reactive Streams. "Reactive" in systems refers to how the whole system reacts to volume, load, errors etc.; "reactive" in streams means that you have a flow of data and you react asynchronously to the events in this flow. In the world of Akka those two terms might get blurred, because the Akka actor system actually enables you to build a reactive system. Nonetheless, I would say that Akka Streams might help you build a reactive system, but they won't make your system resilient, elastic, etc. straight away.
Anyway, you have really good content on this channel, thanks a ton for that!

marekiwaniuk

I really love how you lay out the pros and cons of each streaming API, and in what situation we have to use what. Really great stuff; and I'm glad that I found your channel.
I'd happily buy a membership to learn from your awesome courses.

Cheers pro :)

abdulelahaljeffery

One thing I missed was state: how the frameworks compare in terms of managing aggregations. Great video, thank you.

carlosvaztec

A very nice high-level overview of the differences between the streaming libraries. I was especially looking for a description of when to use Kafka Streams instead of Akka Streams, and this was very helpful.

There was one severe error in your description of Akka Streams, though: they are not "asynchronous by default". Most operators are actually synchronous, and you can introduce asynchronous boundaries into a stream or invoke asynchronous operations with a given degree of parallelism. Consecutive synchronous operations are transparently fused ("baked") into a single actor on materialization to minimize message-passing overhead, so you have precise control over the concurrency of calculations.

I also cannot fully agree with your position that Akka Streams is especially hard for beginners. Programmers with some Scala experience will quickly relate to the collections-like API and be up and running in no time, particularly compared to setting up Kafka or Spark. I think that before anyone approaches streaming libraries at all, they are probably already knee-deep in hard-to-solve concurrency, dependency and performance problems, and may have sunk weeks into cracking each problem the hard way. Once you find Akka Streams, you can finally concentrate on your logic, get all the boilerplate out of the way and write self-descriptive, concise code that does incredibly complex things, nicely modularized in readable chunks that fit on a single screen. Discovering it felt like finally coming home. I think the hardest parts are wrapping your head around the concept of materialized values, designing stateful stream stages correctly and knowing when you need the Graph API at all. My next task is getting my hands dirty with Kafka.

ElectricWound
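
The distinction drawn in the comment above can be sketched in a few lines. This is a minimal illustration, assuming Akka 2.6 or later with the `akka-stream` module on the classpath (so that the implicit `ActorSystem` also supplies the materializer); the object name `AsyncBoundaryDemo` and the sample numbers are made up for the example.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical demo object; assumes Akka 2.6+ (akka-stream) is available.
object AsyncBoundaryDemo {

  def run(): (Seq[Int], Seq[Int], Seq[Int]) = {
    implicit val system: ActorSystem = ActorSystem("async-boundary-demo")
    import system.dispatcher // execution context for the Future in mapAsync

    // No boundary: consecutive operators are fused and run inside one actor.
    val fused = Source(1 to 5).map(_ + 1).map(_ * 2).runWith(Sink.seq)

    // Explicit boundary: the part before .async runs in its own actor,
    // the rest of the stream runs in another one.
    val withBoundary = Source(1 to 5).map(_ + 1).async.map(_ * 2).runWith(Sink.seq)

    // Asynchronous operation with bounded parallelism: at most 4 futures
    // in flight, results emitted in upstream order.
    val parallel = Source(1 to 5).mapAsync(parallelism = 4)(n => Future(n * 10)).runWith(Sink.seq)

    try (
      Await.result(fused, 5.seconds),
      Await.result(withBoundary, 5.seconds),
      Await.result(parallel, 5.seconds)
    )
    finally system.terminate()
  }
}
```

All three variants produce their elements in the same order as the input; what changes is where the work is scheduled, which is the control over concurrency the comment refers to.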

Nice, finally I know the difference and when to use what! Well-done video, as always.

Dr_Dude

Good video. However, it would have been nice if you had also included Flink (since you are comparing streaming frameworks); it's generally 20% faster than Kafka Streams and Spark Streaming. Kafka Streams is probably the future, as Kafka's ecosystem is evolving, but syntax-wise Spark and Flink are much more intuitive in Scala.

iQwert

You are the best Scala instructor in the world :D
It's great that you're on Udemy too and that you also have courses on the site.
Keep it up, Daniel!

alexandrutoma

Nice explanation. Could you also include a part on Apache Flink? I think Apache Flink also uses Akka under the hood (?), and it also provides good control over streams through low-level APIs, along with other benefits like the ones shown for Akka.

ziauddin

Thanks for this detailed video. Could you please make a similar video that compares Spark Streaming with Apache Flink and Apache Pulsar?

chandrashekharkotekar

Cool. But now (from 2.3) Spark has .trigger(processingTime = "0 seconds") to minimize the latency. We may use a 0 second processing time trigger indicating that Spark should start each micro-batch as fast as it can with no delays.

stanislavg.
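
The trigger mentioned in the comment above is written in the keyword-argument style of the Python API; in Scala the same setting is usually expressed with `Trigger.ProcessingTime`. The sketch below is an assumption-laden illustration rather than code from the video: it assumes the `spark-sql` module on the classpath, runs locally, and uses Spark's built-in `rate` test source and `console` sink. The object name `ZeroDelayTriggerDemo` is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// Hypothetical demo object; assumes spark-sql is on the classpath.
object ZeroDelayTriggerDemo {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("zero-delay-trigger-demo")
      .master("local[2]") // local mode, for illustration only
      .getOrCreate()

    // Built-in test source that emits (timestamp, value) rows at a fixed rate.
    val numbers = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    // A zero-length processing-time interval asks Spark to start the next
    // micro-batch as soon as the previous one has finished.
    val query = numbers.writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("0 seconds"))
      .start()

    query.awaitTermination(10000) // let it run for roughly ten seconds
    query.stop()
    spark.stop()
  }
}
```

Even with a zero interval the query is still executed as a sequence of micro-batches, so latency is bounded below by the time each batch takes to plan and run.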

Awesome video, as always. I'd love a course (on Udemy, not free!) on Kafka/Kafka Streams. The other ones on Udemy are not as good as yours.

LucaSavoja

I am guessing ZIO Streams is analogous to Akka Streams w.r.t. usage, right?

danishamjad

Hi Daniel,
How would you normally host a Scala application so that it runs as a long-running process if you use Kafka Streams?

I know that if I use Spark Streaming, the dedicated cluster will keep it running and listening/reacting to the stream/data. I don't have a large amount of data.

Kind regards

minshi
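
Because Kafka Streams is a library rather than a cluster framework, one common arrangement for the hosting question above is to run the topology inside an ordinary JVM process and let whatever process supervisor is already in use (a container, a systemd unit and so on) keep that process alive. The sketch below assumes the `kafka-streams-scala` module (2.7 or later, for the `serialization.Serdes` import); the topic names, application id and broker address are placeholders, and `WordCountService` is a hypothetical name.

```scala
import java.time.Duration
import java.util.Properties

import org.apache.kafka.streams.{KafkaStreams, StreamsConfig, Topology}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._

// Hypothetical service object; topic names and addresses are placeholders.
object WordCountService {
  val InputTopic = "text-lines"
  val OutputTopic = "word-counts"

  /** Builds the word-count topology; this needs no running broker. */
  def buildTopology(): Topology = {
    val builder = new StreamsBuilder()
    builder
      .stream[String, String](InputTopic)
      .flatMapValues(line => line.toLowerCase.split("\\W+")) // one record per word
      .groupBy((_, word) => word)                            // re-key by the word itself
      .count()                                               // stateful aggregation
      .toStream
      .to(OutputTopic)
    builder.build()
  }

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-count-service")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // assumed broker address

    val streams = new KafkaStreams(buildTopology(), props)

    // Close the streams client cleanly when the process is asked to stop.
    sys.addShutdownHook(streams.close(Duration.ofSeconds(10)))

    // start() launches the stream threads, which keep the JVM running.
    streams.start()
  }
}
```

Starting a second copy of the same process with the same application id lets Kafka rebalance the input partitions between the two instances, so scaling out does not require a dedicated processing cluster.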

Could you please clarify what you mean by fault tolerance in Akka Streams? I am used to working with big data frameworks (Kafka Streams, Spark Streaming and Flink), which usually execute code on a cluster of machines with exceptional horizontal scalability and fault tolerance. I lack the information on the Akka Streams side: from your description (best for high-performance streams that are part of the business logic), I would assume that we embed Akka Streams into existing applications. That could give us superior vertical scalability (with concurrency backed by actors), but if that's just a single machine, then how on earth can we talk about fault tolerance? I must be missing something obvious :)

tai-hao-le
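
One in-process reading of "fault tolerance" for Akka Streams is stream supervision: deciding what happens when an operator throws, instead of letting the whole stream fail. The sketch below illustrates only that reading and does not address the loss of a machine, which is the concern raised in the comment above. It assumes Akka 2.6 or later with `akka-stream` on the classpath; `SupervisionDemo` and the sample data are made up.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorAttributes, Supervision}
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical demo object; assumes Akka 2.6+ (akka-stream) is available.
object SupervisionDemo {

  def run(): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("supervision-demo")

    val result = Source(List(1, 0, 2))
      .map(n => 10 / n) // throws ArithmeticException for the element 0
      // Resume: drop the failing element and keep the stream running.
      .withAttributes(ActorAttributes.supervisionStrategy(Supervision.resumingDecider))
      .runWith(Sink.seq)

    try Await.result(result, 5.seconds)
    finally system.terminate()
  }
}
```

Without the supervision attribute the default decision is to stop, so the division by zero would fail the materialized future instead of being skipped.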

Hey Daniel, I'm an absolute beginner and I have a question about the fs2 library, which is also used for some kind of streaming. Could it be an alternative to some of the streaming libraries that you mentioned in this video?

dimfatal

Is there any discount associated with your yearly full-access membership? Here in Brazil things are complicated: the dollar is worth almost 6 times our currency.

Plsmferesi

But I hate JVM-related technology... so do I have any other choices, or should I just suck it up?

_slier