'Systems that run forever self-heal and scale' by Joe Armstrong (2013)

How can we build large self-healing scalable systems?

In this talk I will outline the architectural principles needed for building scalable fault-tolerant systems. I'll talk about building systems from small isolated parallel components which communicate through well-defined protocols.

Programs will have errors in them and will fail, so I'll talk about detecting and correcting errors at run-time. Programs will evolve with time, so I'll talk about how they can be changed while they are running. I'll talk about Erlang and how it relates to these architectural principles.
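As a rough illustration of the "detect and correct errors at run-time" idea, here is a minimal Python sketch, not Erlang and not code from the talk. The names `run_worker` and `supervise` and the `max_restarts` parameter are made up for this example. The worker only sees its mailbox and its own state, and a separate supervisor restarts it with clean state when it crashes.

```python
from collections import deque


def run_worker(mailbox, state):
    """Hypothetical worker: sum integer messages, crash on anything else."""
    while mailbox:
        msg = mailbox.popleft()
        if not isinstance(msg, int):
            # The worker does not try to recover; it simply fails.
            raise ValueError(f"bad message: {msg!r}")
        state["total"] += msg
    return state["total"]


def supervise(mailbox, max_restarts=3):
    """Hypothetical supervisor: restart the worker with fresh state on a crash."""
    restarts = 0
    while True:
        state = {"total": 0}  # each restart begins from a known-good state
        try:
            return run_worker(mailbox, state), restarts
        except ValueError:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up and let the failure propagate upward


# The bad message is consumed by the crash; the restarted worker sees only 4.
print(supervise(deque([1, 2, "oops", 4])))  # (4, 1)
```

Note that the partial total from before the crash is deliberately thrown away. The sketch assumes that restarting from a clean state is safer than trying to repair a state that may already be corrupted.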

Joe Armstrong is one of the inventors of Erlang. When at the Ericsson computer science lab in 1986, he was part of the team who designed and implemented the first version of Erlang. He has written several Erlang books including Programming Erlang: Software for a Concurrent World. Joe held the first ever Erlang course and has taught Erlang to hundreds of programmers and held many lectures and keynotes describing the technology.

Joe has a PhD in computer science from the Royal Institute of Technology in Stockholm, Sweden and is an expert in the construction of fault-tolerant systems. Joe was the chief software architect of the project which produced the Erlang OTP system. He has worked as an entrepreneur in one of the first Erlang startups (Bluetail) and has worked for 30 years in industry and research.
Comments

I upvoted this before seeing it because there is no such thing as a bad Joe Armstrong talk.

wuschelthepuschel

I never get tired of rewatching Joe's past talks. I didn't know him personally and I didn't always agree with him on some non-technical opinions, but his passing was something I felt personally.

AFerreiraV

I loved Joe's The Mess We're In talk and have rewatched it multiple times over the years and am looking forward to watching this. Thanks for uploading all the older content recently!

uqqwfkd

My first real enlightenment in understanding CS came the moment I finished reading Joe Armstrong's thesis.

linkernick

I love the "all sequential programming languages get error handling wrong" point. It explains the communication barrier admins face when reporting structural errors to devs better than any other explanation I've come across.

udirt

"Scaling down" sounds good, if you know what system you're building before you start. If you don't, which is always the case, you're basically building your best guess of what the system should be and then iterating and changing it. Doing this with an already scaled up system seems unnecessarily complex to me. You first need to find your problems, fix them, and then worry about scaling those fixes up.


Makes me want to learn Elixir... Any starter project ideas?

AlexRodriguez-gbez

Whoa. He called the WhatsApp acquisition perfectly! 1:02:00

owenimholte

@39:37 - You still need backups. It doesn't matter how reliable your storage is if the data gets corrupted (intentionally or unintentionally) and then gets propagated throughout your network. It's too late to realize this after your long-running system has accumulated months, years, or decades of data, which is now useless because it has been invalidated. Imagine in today's world having someone hack into such a banking system and change everyone's transaction/balance history. Even if you could detect the data was altered, you still wouldn't know what it is supposed to be.

He does mention using snapshots, but seemingly in the context of short-term restarting, not "I can go back 2 years and do a full data audit", which long-term [protected] backups can provide.

triularity

He glossed over runtime upgrades… do they really work smoothly?
Any pointers for further study would be greatly appreciated.

aum

Distributed computing is easy… until you stop assuming every actor in the system is honest.

SneedsFeeduckAndSeeduck

There seems to be an error at 33:30 where it's written "World is concurrent". Shouldn't that be "parallel"?
Concurrency is when processes are interleaved on a single executor; parallelism is when they run on independent executors, right?

aum

Machine B does not know that it is a replica. Machine A does not know it is not a replica.

kurbads

I wonder what he'd do with Rust...

Luredreier