Matei Zaharia, Stanford University Composable Parallel Processing in Apache Spark and Weld

Giving every developer easy access to modern, massively parallel hardware, whether at the scale of a datacenter or a single modern server, remains a daunting challenge. In this talk, I’ll cover one powerful weapon we can use to meet this challenge: enabling efficient composition of parallel programs. Composition is arguably the main way developers are productive writing software, but unfortunately, it has taken a back seat in the design of many parallel processing APIs. For example, composing MapReduce jobs required writing data to files between each job, which was slow and error-prone, and many single-machine parallel libraries face similar problems.

I’ll show how composability enabled much higher productivity in the Apache Spark API, and how recent versions of Spark have taken this idea much further with “structured” APIs such as DataFrames and Spark SQL. In addition, I’ll discuss Weld, a research project at Stanford that aims to enable much more efficient composition between parallel libraries on a single server (for either the CPU or the GPU). We show that the traditional way of composing libraries in this setting, through function calls that exchange data through memory, can cause order-of-magnitude slowdowns. In contrast, Weld can transparently speed up applications that use libraries such as NumPy, Pandas and TensorFlow by up to 30x, through a novel API that lets it optimize across the library calls used in each program.
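To make the cost of "function calls that exchange data through memory" concrete, here is a toy sketch in plain Python. It is not Weld's API, and the names (`eager_pipeline`, `LazyVec`, `fused_pipeline`) are hypothetical. In the eager version, each "library call" runs to completion and materializes a full intermediate list, which the next call must read back. In the lazy version, the calls only record element-wise operations, and the final reduction applies all of them in a single pass with no intermediates. This is the general idea of optimizing across library calls, shown here as simple loop fusion.

```python
from typing import Callable, List, Sequence, Tuple


# --- Eager composition: every call materializes its full result. ---
def eager_scale(xs: Sequence[float], k: float) -> List[float]:
    return [x * k for x in xs]


def eager_add(xs: Sequence[float], c: float) -> List[float]:
    return [x + c for x in xs]


def eager_pipeline(xs: Sequence[float], k: float, c: float) -> float:
    tmp1 = eager_scale(xs, k)  # intermediate list #1 written to memory
    tmp2 = eager_add(tmp1, c)  # intermediate list #2 written to memory
    return sum(tmp2)           # third full pass over the data


# --- Lazy composition (hypothetical): calls record work, one fused pass runs it. ---
class LazyVec:
    def __init__(self, data: Sequence[float],
                 ops: Tuple[Callable[[float], float], ...] = ()) -> None:
        self.data = data
        self.ops = ops

    def map(self, f: Callable[[float], float]) -> "LazyVec":
        # No computation happens here; we only extend the recorded pipeline.
        return LazyVec(self.data, self.ops + (f,))

    def sum(self) -> float:
        # Single fused loop: apply every recorded op per element, then accumulate.
        total = 0.0
        for x in self.data:
            for f in self.ops:
                x = f(x)
            total += x
        return total


def fused_pipeline(xs: Sequence[float], k: float, c: float) -> float:
    return LazyVec(xs).map(lambda x: x * k).map(lambda x: x + c).sum()


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(eager_pipeline(data, 2.0, 1.0))  # 15.0
    print(fused_pipeline(data, 2.0, 1.0))  # 15.0
```

Both versions compute the same answer. The fused one never allocates the two intermediate lists. A real system such as Weld applies this kind of cross-call optimization automatically and generates parallel native code, which a pure-Python toy cannot show.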

uwaterloo
