'Workflows, a new abstraction for distributed systems' by Dominik Tornow (Strange Loop 2022)

For the past 45 years, the database systems community has enjoyed an unparalleled developer experience: database transactions mitigate challenges such as failure at the platform level, entirely eliminating these challenges at the application level.

Unfortunately, the distributed systems community has not enjoyed a similar developer experience: there has been no equivalent abstraction that mitigates challenges like failure at the platform level.

However, many companies, including Snap, Uber, and Netflix, are adopting a new paradigm: Workflows. Workflows are to distributed systems what transactions are to databases.

This talk explores how Workflow Systems mitigate challenges at the platform level and provide a developer experience for distributed systems that rivals the developer experience for databases, allowing you to literally code as if failure does not even exist!
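Workflow engines such as Temporal make "code as if failure does not exist" possible through durable execution. The platform records the result of each completed step. After a crash, the workflow function is re-run, and completed steps are replayed from that record instead of being executed again. The Python sketch below is a minimal, in-memory illustration of the idea under stated assumptions: `WorkflowContext`, `execute_activity`, and `run_workflow` are hypothetical names, not Temporal's API, and a plain list stands in for durable storage.

```python
from typing import Any, Callable, List


class WorkflowContext:
    """Hypothetical context that records each activity result in a history."""

    def __init__(self, history: List[Any]) -> None:
        # A plain list stands in for the durable event history a real engine would persist.
        self.history = history
        self.position = 0

    def execute_activity(self, fn: Callable[..., Any], *args: Any) -> Any:
        if self.position < len(self.history):
            # Replay: this step completed in an earlier run, so reuse the recorded result.
            result = self.history[self.position]
        else:
            # First execution of this step: run it, then record the result.
            result = fn(*args)
            self.history.append(result)
        self.position += 1
        return result


def run_workflow(workflow: Callable[[WorkflowContext], Any], history: List[Any]) -> Any:
    """Run, or re-run after a crash, a workflow against an existing history."""
    return workflow(WorkflowContext(history))


# --- Usage: simulate a crash between two steps ---
calls: List[tuple] = []  # side effects actually performed


def charge(amount: int) -> str:
    calls.append(("charge", amount))
    return f"charged {amount}"


def ship(item: str) -> str:
    calls.append(("ship", item))
    return f"shipped {item}"


crash = {"armed": True}


def order_workflow(ctx: WorkflowContext) -> tuple:
    receipt = ctx.execute_activity(charge, 42)
    if crash["armed"]:
        crash["armed"] = False
        raise RuntimeError("simulated process crash")
    shipment = ctx.execute_activity(ship, "book")
    return receipt, shipment


history: List[Any] = []
try:
    run_workflow(order_workflow, history)
except RuntimeError:
    pass  # the process "restarts" with the same history

result = run_workflow(order_workflow, history)
print(result)  # ('charged 42', 'shipped book')
print(calls)   # [('charge', 42), ('ship', 'book')]: charge ran only once
```

Replay only works if the workflow code is deterministic and calls the same activities in the same order on every run. Also note a gap in this sketch. If the process crashed after `fn` ran but before its result was recorded, the step would run again on replay. This is why activities are generally expected to be idempotent.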

Dominik Tornow
Temporal, Principal Engineer
@DominikTornow

Dominik Tornow is a Principal Engineer at Temporal. He focuses on systems modeling, specifically conceptual and formal modeling, to support the design and documentation of complex software systems.

Comments

A very insightful and useful talk! Thanks for it, Dominik!

GrigorySapunov

Great talk and a compelling approach for distributed systems. Also, whenever you slipped into Arnold Schwarzenegger voice, I had no choice but to agree with anything you were saying.

joshgraham

This is not a complete solution. There is no explanation for how the runtime can be certain that a remote job has actually completed. This was the hard problem, and it's still not resolved; it's just moved to a different layer of abstraction within the orchestrator. Coroutines within the orchestration language give no such guarantee of exactly-once execution on a remote system. The runtime is just a layer within the orchestrator, and it still doesn't know how to distinguish between a request never being received and the job completing but the response never being received. Plus, there is no way to know whether the remote system is idempotent. Somehow the remote system needs to be made aware of the coroutine scope to interoperate with it. This is the interesting problem to solve, and I don't see any explanation for it.

megamaser
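The ambiguity raised in the comment above (request lost vs. response lost) is usually handled the way the commenter implies: the remote system has to take part. A common technique is an idempotency key. The caller generates one key per logical operation and reuses it on every retry. The remote service stores the result under that key, so a duplicate request returns the stored result instead of running the job again. The Python sketch below illustrates this with a hypothetical in-memory `RemoteService` and a simulated lost reply. It is not drawn from the talk, and a real service would need durable, concurrency-safe storage for the keys.

```python
import uuid
from typing import Dict


class RemoteService:
    """Hypothetical remote service that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self.completed: Dict[str, str] = {}  # idempotency key -> stored response
        self.executions = 0                  # how many times the job actually ran

    def handle(self, key: str, payload: str) -> str:
        if key in self.completed:
            # Duplicate delivery: return the stored result instead of re-running the job.
            return self.completed[key]
        self.executions += 1
        response = f"processed {payload}"
        self.completed[key] = response
        return response


class LostResponse(Exception):
    """Simulates the job completing on the server but the reply never arriving."""


def call_with_retries(service: RemoteService, payload: str,
                      drop_first_reply: bool, attempts: int = 3) -> str:
    key = str(uuid.uuid4())  # one key per logical operation, reused on every retry
    for attempt in range(attempts):
        try:
            response = service.handle(key, payload)
            if drop_first_reply and attempt == 0:
                raise LostResponse()  # the caller never sees this response
            return response
        except LostResponse:
            # The caller cannot tell what happened, so it retries with the same key.
            continue
    raise RuntimeError("gave up after retries")


service = RemoteService()
print(call_with_retries(service, "invoice-7", drop_first_reply=True))  # processed invoice-7
print(service.executions)  # 1: the retry was deduplicated
```

The same retry is also safe when the request is lost before reaching the service: the job then runs once, on the retry. The catch, as the commenter notes, is that this only works if the remote system implements the deduplication.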

Fascinating talk. I would love to have a list of the reading references he gave and/or a text version of the presentation. I felt lost in some of the technical details when following along with the presentation.

linerider

Interesting talk, but it seemed to imply that any failure can be resolved by retrying, which is not always the case. You can retry as many times as you like, but if some idiot has dropped the table (yes, I've had that happen), it's never going to work.

karlfimm
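The comment above points out that retrying only helps with transient failures. A common mitigation is for the retry policy to classify errors. Transient errors are retried with backoff, while permanent errors are surfaced immediately. The Python sketch below shows such a loop. `TransientError`, `PermanentError`, and `retry` are hypothetical names used for illustration, not any workflow engine's API.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Failure that may succeed on a later attempt (e.g. a timeout)."""


class PermanentError(Exception):
    """Failure that no amount of retrying will fix (e.g. a dropped table)."""


def retry(fn: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.01,
          non_retryable: Tuple[Type[BaseException], ...] = (PermanentError,)) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except non_retryable:
            raise  # surface immediately: retrying cannot help
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        # Any other exception type is not caught and propagates on the first attempt.
    raise AssertionError("unreachable")


# --- Usage ---
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("timeout")
    return "ok"


outcome = retry(flaky)
print(outcome, attempts["n"])  # ok 3

permanent_calls = {"n": 0}


def query_dropped_table() -> str:
    permanent_calls["n"] += 1
    raise PermanentError("no such table: orders")


try:
    retry(query_dropped_table)
except PermanentError as exc:
    print("gave up after", permanent_calls["n"], "attempt:", exc)
```

Treating unclassified errors as non-retryable is a conservative choice made for this sketch. A real policy might default the other way and retry unknown errors.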

Yeah, nice ideas, but it's too complex to be practiced in 99.5% of services and systems. Another part I didn't enjoy is how theoretical the talk was. Any such system would require an extremely disciplined way of implementing business logic, with so many boundaries and conventions that we would effectively spend most of our effort keeping the "mechanism working" rather than implementing the actual business logic. It is hard enough to establish and consistently maintain this kind of convention within a small team, let alone a medium to large organization.

MisFakapek

Isn't it better to implement such things on top of the actor model rather than on language-specific coroutines / logs?

warever