'The Magical Rebalance Protocol of Apache Kafka' by Gwen Shapira

Very few people know that inside Apache Kafka's binary protocol for publishing and retrieving messages hides another protocol - a generic, extensible protocol for managing work assignments between multiple instances of a client application.

When multiple Kafka consumers in the same consumer group subscribe to a set of topic partitions, Kafka knows how to assign a subset of topic partitions to each consumer and how to handle failover automatically. What is less known is that this assignment is determined by the consumer client itself and that the same protocol can be used by any application for both leader election and task assignment.
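To make the idea of client-side assignment concrete, here is a minimal sketch in Python of a round-robin assignment that one group member could compute on behalf of the whole group, and recompute when a member drops out. The function name `assign_round_robin`, the member ids and the topic name `orders` are hypothetical illustrations; this is not Kafka's actual assignor code.

```python
def assign_round_robin(members, partitions):
    """Spread partitions over members in round-robin order.

    Hypothetical helper for illustration; not part of any Kafka client API.
    """
    if not members:
        return {}
    # Sort both sides so the same inputs always produce the same assignment.
    ordered_members = sorted(members)
    assignment = {m: [] for m in ordered_members}
    for i, partition in enumerate(sorted(partitions)):
        owner = ordered_members[i % len(ordered_members)]
        assignment[owner].append(partition)
    return assignment


# Six partitions of an assumed topic named "orders".
partitions = [("orders", p) for p in range(6)]

# Three consumers in the group: each ends up with two partitions.
before = assign_round_robin(["c1", "c2", "c3"], partitions)

# Failover: c3 leaves the group, so the assignment is recomputed without it.
after = assign_round_robin(["c1", "c2"], partitions)
```

The point of the sketch is that nothing here needs to run on a broker: any member that knows the member list and the partition list can produce the assignment.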

In this session we'll dive into the internals of this little-known assignment protocol -- the binary network protocol and the Java APIs. We'll look in detail at how Kafka Consumers, Connect and Streams API use this protocol for task management. And finally we'll show how you too can extend this protocol to implement task assignment in your application with an algorithm of your choice - even if it doesn't use Kafka for anything else.
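As a rough illustration of using such a protocol for generic task assignment, the sketch below simulates a join phase and a sync phase in memory, loosely modeled on the join and sync steps of Kafka's group protocol. Everything in it is an assumption made for illustration: the `Coordinator` class, the `fewest_tasks_first` algorithm, the worker and task names, and the rule that the first member to join becomes leader. It does not talk to a real broker.

```python
class Coordinator:
    """Toy in-memory stand-in for a group coordinator (hypothetical)."""

    def __init__(self):
        self.members = {}      # member id -> opaque metadata sent by the member
        self.assignments = {}  # member id -> tasks chosen by the leader

    def join(self, member_id, metadata):
        # Join phase: each member registers and sends its metadata.
        self.members[member_id] = metadata

    def leader(self):
        # Assumption of this sketch: the first member to join is the leader.
        return next(iter(self.members))

    def sync(self, member_id, leader_assignments=None):
        # Sync phase: only the leader supplies assignments;
        # every member gets back just its own share.
        if leader_assignments is not None and member_id == self.leader():
            self.assignments = dict(leader_assignments)
        return self.assignments.get(member_id, [])


def fewest_tasks_first(members, tasks):
    """Pluggable algorithm: give each task to the member with the fewest tasks."""
    plan = {m: [] for m in members}
    for task in tasks:
        # Ties are broken by member id so the result is deterministic.
        target = min(plan, key=lambda m: (len(plan[m]), m))
        plan[target].append(task)
    return plan


coordinator = Coordinator()
for worker in ("w1", "w2"):
    coordinator.join(worker, {"version": 1})

leader = coordinator.leader()
# The leader, not the coordinator, runs the assignment algorithm.
plan = fewest_tasks_first(coordinator.members, ["t0", "t1", "t2", "t3", "t4"])
leader_tasks = coordinator.sync(leader, plan)
follower_tasks = coordinator.sync("w2")
```

In this toy version the leader must sync before the followers do; the coordinator only stores and forwards what the leader decided, which is why the assignment algorithm can be swapped freely.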

Speaker: Gwen Shapira
Comments

At 12:28 she describes a use case, and at 12:35 she says it requires consumer groups with hundreds of consumers.
The justification goes: "If there is a failover Kafka cluster, and it's really big, and you have to copy everything from one Kafka cluster to another in real time, you need a large consumer group to do it, because you're consuming every single partition in your Kafka cluster in a single group."

- Because you don't want to waste time. And the Kafka system must be strong enough to support the copy quickly, so that it completes before the current Kafka cluster fails and all your data is gone.

premktiw