'Building a Distributed Task Scheduler With Akka, Kafka, and Cassandra' by David van Geest

Dynamically scheduled tasks are at the heart of PagerDuty's microservices. They deliver incident alerts and on-call notifications, and handle myriad administrative chores. Historically, these tasks were scheduled and run using an in-house library built on Cassandra, but that solution had begun to show its age.

Early in 2016, the Core team at PagerDuty built a new Task Scheduler using Akka, Kafka, and Cassandra. After six weeks in development, the Scheduler is now running in production. This talk discusses how the strengths of the three technologies were leveraged to solve the challenges of resilient, distributed task scheduling.

This talk will present a number of distributed system concepts in the real-world context of the Scheduler project. How can you dynamically adjust for increased task load with zero downtime? Can you guarantee task ordering across many servers? Do your tasks still run when an entire datacenter goes down? What happens if your tasks are scheduled twice? Attendees can expect to see how all of these challenges were addressed.

Some familiarity with distributed queueing and actor systems will be helpful for attendees of this talk.
Comments

Well explained, but one thing that seems to be missing here is how scheduling is implemented. It would be nice to have that explained as well.

LovyGupta

Very good talk. Well-prepared and well-presented.

enjoyalife

A few questions regarding the old solution:
1) What was the reason for choosing Cassandra as the database?
2) Why was WorkQueue slow? Was it because of too many I/O operations?

A few questions regarding the new solution:
1) You said we cannot scale once we have one partition per broker, and that increasing the number of partitions is costly. So what is the alternative? What if I want to scale further?
2) When you say logical queues will be stuck, do you mean that only the logical queues in the stuck service instance will be stuck, and not all the logical queues across the service instances?

suhasnayak

I need to implement the same thing in Java. Is there anywhere I can get the architecture documents for this?

santhoshkumarp