From Idea to Product: The Development of Apache Spark

Matei Zaharia discusses the development of Spark from an idea to a full-fledged product. He began working on Spark as a PhD student, inspired by the data center-scale computing happening at web companies such as Google and Yahoo. Spark initially focused on two use cases: machine learning with iterative algorithms, and interactive ad hoc queries. Over time, Spark expanded to cover other large-scale batch processing workloads. He emphasizes that, when building the early user base, it was important to find people who actually wanted to do large-scale computing and were interested in machine learning. Spark's growth and development offer valuable insights into the evolution of data center-scale computing and the power of open-source tools.

MLOps Coffee Sessions #155 with Matei Zaharia, The Birth and Growth of Spark: An Open Source Success Story, co-hosted by Vishnu Rachakonda.

// Abstract
We dive deep into the creation of Spark with its creator, Matei Zaharia, Chief Technologist at Databricks. This episode also explores the development of MLflow, another Databricks open-source home run, and the concept of "lakehouse ML". As a special treat, Matei talked to us about the details of the DSP (Demonstrate-Search-Predict) project, which aims to enable building applications by combining LLMs and other text-returning systems.

// About the guest:
Matei has the unique advantage of seeing different perspectives, having worked in both academia and industry. He listens carefully to people's challenges and excitement about ML and uses them to come up with new ideas. At Databricks, Matei can also apply ML to the company's own internal practices. He is constantly asking the question "What's a better way to do this?"

// Bio
Matei Zaharia is an Associate Professor of Computer Science at Stanford and Chief Technologist at Databricks. He started the Apache Spark project during his Ph.D. at UC Berkeley, and co-developed other widely used open-source projects, including MLflow and Delta Lake, at Databricks. At Stanford, he works on distributed systems, NLP, and information retrieval, building programming models that can combine language models and external services to perform complex tasks. Matei’s research work was recognized through the 2014 ACM Doctoral Dissertation Award for the best Ph.D. dissertation in computer science, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).

--------------- ✌️Connect With Us ✌️ -------------
Follow us on Twitter: @mlopscommunity