Matei Zaharia, Stanford University: Composable Parallel Processing in Apache Spark and Weld

Giving every developer easy access to modern, massively parallel hardware, whether at the scale of a datacenter or a single modern server, remains a daunting challenge. In this talk, I’ll cover one powerful weapon we can use to meet this challenge: enabling efficient composition of parallel programs. Composition is arguably the main way developers are productive writing software, but unfortunately, it has taken a back seat in the design of many parallel processing APIs. For example, composing MapReduce jobs required writing data to files between jobs, which was slow and error-prone, and many single-machine parallel libraries face similar problems.

I’ll show how composability enabled much higher productivity in the Apache Spark API, and how this idea has been taken much further in recent versions of Spark with “structured” APIs such as DataFrames and Spark SQL. In addition, I’ll discuss Weld, a research project at Stanford that aims to enable much more efficient composition between parallel libraries on a single server (for either the CPU or the GPU). We show that the traditional way of composing libraries in this setting, through function calls that exchange data through memory, can create order-of-magnitude slowdowns. In contrast, Weld can transparently speed up applications using libraries such as NumPy, Pandas and TensorFlow by up to 30x through a novel API that lets it optimize across the library calls used in each program.
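
To make the cost of composing through memory concrete, here is a minimal pure-Python sketch. It is not Weld's API or Spark's API: the function names and the `LazyVector` class are hypothetical, invented only for illustration. The eager version writes a full intermediate list after every call, while the lazy version records each element-wise operation and runs them all in a single pass when a result is finally requested, which is roughly the kind of cross-call optimization the abstract describes.

```python
# Eager "library" functions: each call walks its whole input and
# materializes a new list, so data is exchanged through memory
# between every pair of calls.
def scale(xs, k):
    return [x * k for x in xs]

def add_const(xs, c):
    return [x + c for x in xs]

def eager_pipeline(xs):
    a = scale(xs, 2.0)       # first full intermediate list
    b = add_const(a, 1.0)    # second full intermediate list
    return sum(b)

class LazyVector:
    """Hypothetical lazy wrapper: records element-wise operations
    instead of running them, then applies them in one pass."""

    def __init__(self, data, ops=()):
        self._data = data
        self._ops = tuple(ops)

    def scale(self, k):
        # Nothing is computed here; the operation is only recorded.
        return LazyVector(self._data, self._ops + (lambda x: x * k,))

    def add_const(self, c):
        return LazyVector(self._data, self._ops + (lambda x: x + c,))

    def total(self):
        # Single pass over the input: every recorded operation is
        # applied to each element before moving to the next one,
        # so no intermediate list is ever built.
        acc = 0.0
        for x in self._data:
            for op in self._ops:
                x = op(x)
            acc += x
        return acc

data = [1.0, 2.0, 3.0]
print(eager_pipeline(data))
print(LazyVector(data).scale(2.0).add_const(1.0).total())
```

Both calls compute the same sum; the difference is only in how many times the data is written out and read back. In interpreted Python the lazy version will not necessarily run faster, so treat this as an illustration of the structure rather than a benchmark.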