Kafka Consumer Group is a Brilliant Design Choice and We should Discuss it

Apache Kafka is an interesting piece of software; every design decision the team made makes perfect sense. I decided to dive deep into the consumer group concept, which I think is underrated, and talk more about it.

0:00 Intro
1:24 Messaging Systems Explained
3:30 Partitioning
4:30 Pub/Sub vs Queue
6:55 Consumer Group
10:00 Parallelism in Consumer Group
10:30 Partition awareness in Consumer Group
11:30 Achieving Pub/Sub with Consumer Group
14:00 Head of Line blocking in Kafka


Stay Awesome,
Hussein

Comments

Awesome insight... I remember the Apache docs mentioning that there can't be more active consumers in a group than the number of partitions.

sariksiddiqui

I'm arriving a bit late to the party, but here's my take: very interesting topic. I agree with what you said at the end: if you don't know what technologies are being used in your program, you'll probably miss some optimization opportunities, and not just that, but also the beauty of it.

Santiago-Torres

Hi Hussein, in your future videos I would really like to see you describe how you used these technologies in your day-to-day job, like the YouTube video processing example in this case... Keep up the good work.

sariksiddiqui

I think one thing that a lot of people explaining Kafka and consumers miss is the concept of rebalancing. What is that?
Let's say we have 5 partitions and 2 consumers. Kafka will balance the load by giving 3 partitions to the 1st consumer and 2 partitions to the 2nd consumer. If another consumer joins the consumer group, Kafka will rebalance: 2 partitions go to the 1st consumer, 2 partitions to the 2nd consumer, and 1 partition to the 3rd consumer.
This pretty much means that on the consumer side, most of the time you should not care about the number of partitions and their assignment, because Kafka handles it automatically. If you want to do autoscaling (ECS autoscaling in AWS or HPA in Kubernetes), then you should care mostly about the topic lag (how long a message sits in the Kafka topic before being consumed). Make sure the autoscaler never scales the number of consumers beyond the number of partitions: within a group each partition is consumed by only one consumer, so any extra consumers would sit idle/starved.

AleksandarT
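
The 5-partition example in the comment above can be reproduced with a small toy model. This is only a sketch of an even, range-style split; it is not Kafka's actual assignor code, and the names `assign_partitions` and `max_useful_consumers` are hypothetical helpers made up for illustration.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partition ids over consumers as evenly as possible.

    Toy model only: earlier consumers receive the leftover partitions.
    """
    assignment = {c: [] for c in consumers}
    if not consumers:
        return assignment
    base, extra = divmod(num_partitions, len(consumers))
    next_partition = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers each take one additional partition.
        count = base + (1 if i < extra else 0)
        assignment[consumer] = list(range(next_partition, next_partition + count))
        next_partition += count
    return assignment


def max_useful_consumers(desired, num_partitions):
    """Cap an autoscaler's target: consumers beyond the partition count sit idle."""
    return min(desired, num_partitions)


if __name__ == "__main__":
    print(assign_partitions(5, ["c1", "c2"]))        # 3 + 2 split
    print(assign_partitions(5, ["c1", "c2", "c3"]))  # 2 + 2 + 1 after a third consumer joins
    print(max_useful_consumers(8, 5))                # the autoscaler should stop at 5
```

Real clients choose among several assignment strategies, so the exact partition ids each consumer receives may differ from this sketch even when the counts match.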

13:18 The position in the partition from which it is read is tracked by the group itself.

siddheshswami
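
A minimal sketch of that idea, assuming a single in-memory partition: the read position is kept per consumer group, so two groups can read the same records independently (pub/sub), while members of one group share a single position (queue). `ToyPartition` and the group names are hypothetical and exist only for this illustration; this is not how a broker stores offsets.

```python
from collections import defaultdict


class ToyPartition:
    """One append-only partition plus one committed offset per consumer group."""

    def __init__(self):
        self.records = []
        # group id -> offset of the next record that group will read
        self.committed = defaultdict(int)

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        """Return records starting at the group's committed offset."""
        start = self.committed[group]
        return self.records[start:start + max_records]

    def commit(self, group, count):
        """Advance only this group's offset; other groups are unaffected."""
        self.committed[group] += count


if __name__ == "__main__":
    partition = ToyPartition()
    for record in ["video-1", "video-2", "video-3"]:
        partition.append(record)

    partition.commit("thumbnails", 2)    # this group has processed two records
    print(partition.poll("thumbnails"))  # only the remaining record
    print(partition.poll("captions"))    # a different group still sees all three
```

Because each group advances its own offset, a group that fails on a record and does not commit simply rereads it later, and the other group's position does not move.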

Awesome talk! I love Kafka's architecture.

DevWonYoung

Hey! 👋🏻 I saw your comment on Sunny Lenarduzzi's video and decided to pop by and check your channel out! Great content! Keep it up! It inspires me to see others crushing it! ❤️

thatone_daniel

Amazing content. I thought people on YouTube didn't know what software architecture is, hahaha. I'm subbing to receive all your messages.

nandomax

You are doing great work, thanks for the quality content.

amitk.chaudhary

How would a commit work with partial failures? For example, if one consumer group processes a data point but another group fails on the same data point, will the retry mechanism trigger both consumer groups to reprocess that data point?

sasankv

Whenever Kafka rebalances partitions over brokers, it becomes unavailable (CAP theorem). That is one of the cons of using Kafka.

govindaraj

If you have more consumers than partitions then the remaining ones just sit idle. More formally, consumers are "assigned" to partitions; a partition cannot be assigned to more than one consumer; unassigned consumers sit idle. This is in contrast to a traditional message queue where a consumer can 'fetch' from any partition it wants.

There are great benefits to the approach that Kafka has, but I wish there was an error message that was returned by the broker, or by other consumers during a rebalance, when a consumer is unassigned. I've been bitten loads of times by idle consumers and poor throughput.

decoyslois
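
One possible client-side workaround for the silent idle consumer described above is to log a warning whenever a rebalance leaves a consumer with no partitions. The sketch below is duck-typed and imports no Kafka client; the callback names `on_partitions_revoked` and `on_partitions_assigned` are assumed to match the rebalance-listener shape of the client in use (kafka-python's `ConsumerRebalanceListener` is the model here), so check them against your client before wiring it in. `IdleAssignmentWarner` is a hypothetical name.

```python
import logging

logger = logging.getLogger("idle-consumer-check")


class IdleAssignmentWarner:
    """Rebalance callbacks that flag a consumer left without partitions."""

    def __init__(self, consumer_name):
        self.consumer_name = consumer_name
        self.idle = False

    def on_partitions_revoked(self, revoked):
        # Nothing to do before the rebalance in this sketch.
        pass

    def on_partitions_assigned(self, assigned):
        # An empty assignment means this consumer will receive no records.
        self.idle = len(assigned) == 0
        if self.idle:
            logger.warning(
                "%s was assigned no partitions and will sit idle; "
                "the group may have more consumers than partitions",
                self.consumer_name,
            )
        return self.idle


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    warner = IdleAssignmentWarner("consumer-6")
    warner.on_partitions_assigned([])               # logs a warning
    warner.on_partitions_assigned([("videos", 0)])  # no warning
```

With a real client, an instance of a class like this would typically be passed as the listener when subscribing, and the `idle` flag could also feed a metric or alert.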

I asked another developer once, which equation can I use to convert EPOCH time to ISO8601 format? He replied: "Dude, just run npm install moment and forget the rest"...I'm glad I didn't ;)

pupdoggify

Unless the use case is massive event streaming like the original LinkedIn team had, Kafka is overkill. I have yet to work on a project needing more than one consumer group or more than 4 partitions.

mefirst

Social distancing in computer science 😂

sharthakghosh

This YouTuber is very good at turning 2 minutes of content into a 20-minute video. 👎

alexanderlin