Fault tolerance in Distributed Systems: A Case Study in Apache Spark by Imran Rashid

This video was recorded at Scala Days Chicago 2017

Abstract:
Apache Spark is a distributed computing platform built in Scala with fault tolerance in mind. It has been tested rigorously and deployed in production at many companies for years. And yet, fault-tolerance issues still surface. How did these faults slip through?

The purpose of this talk is not to examine Spark in detail, but rather to see what lessons can be learned for building your own systems, and what you should know as a user of one. We'll explore what a platform can reasonably guarantee, and what types of questions a user should be asking to understand their system. We'll see how Scala and FP principles are used in Spark, and also why Spark abandons some of those concepts, like immutability. All of these topics will be explored by looking at cases in Spark -- both how the code was designed initially, and issues that were discovered later and fixed.
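For context on the fault-tolerance model the talk examines: Spark's core recovery mechanism is lineage, where each dataset records how it was derived from its parents, so a lost partition can be recomputed rather than replicated. Below is a minimal, hypothetical sketch of that idea in plain Scala (the names `Lineage`, `Source`, and `Mapped` are illustrative, not Spark's actual API):

```scala
// Toy sketch of lineage-based fault tolerance, loosely modeled on
// Spark's RDD idea. Names here are hypothetical, not Spark's API.
// Each dataset remembers how it was derived from its parent, so lost
// results can be deterministically recomputed from the lineage.
sealed trait Lineage[A] {
  def compute(): Seq[A] // rebuild this dataset from its recipe
}

// A source dataset: its "lineage" is just the original input data.
final case class Source[A](data: Seq[A]) extends Lineage[A] {
  def compute(): Seq[A] = data
}

// A derived dataset: records the parent and the transformation,
// so it can be recomputed on demand after a failure.
final case class Mapped[A, B](parent: Lineage[A], f: A => B) extends Lineage[B] {
  def compute(): Seq[B] = parent.compute().map(f)
}

object LineageDemo {
  def main(args: Array[String]): Unit = {
    val base    = Source(Seq(1, 2, 3))
    val doubled = Mapped(base, (x: Int) => x * 2)
    // Simulate losing a cached result: recomputing from lineage
    // rebuilds exactly the same data, with no replicated copies.
    println(doubled.compute()) // List(2, 4, 6)
  }
}
```

Note the design assumption this sketch encodes: recomputation is only safe if the transformations are deterministic, which is one reason questions about what a platform can "reasonably guarantee" matter to users.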