Spark Autotuning: Spark Summit East talk by: Lawrence Spracklen

While the performance delivered by Spark has enabled data scientists to undertake sophisticated analyses on big and complex data in actionable timeframes, too often the process of manually configuring the underlying Spark jobs (including the number and size of the executors) is a significant and time-consuming undertaking. Not only does this configuration process typically rely heavily on repeated trial and error, it also requires that data scientists have a low-level understanding of Spark and detailed cluster sizing information. At Alpine Data we have been working to eliminate this requirement and to develop algorithms that can automatically tune Spark jobs with minimal user involvement.
In this presentation, we discuss the algorithms we have developed and illustrate how they leverage information about the size of the data being analyzed, the analytical operations used in the workflow, and the cluster's size, configuration, and real-time utilization to automatically determine the optimal Spark job configuration for peak performance.
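
To make the idea concrete, below is a minimal sketch (not Alpine Data's actual algorithm) of how executor count, cores, and memory might be derived from the input data size and a static description of the cluster; the ClusterInfo fields, the 5-cores-per-executor rule of thumb, and the 2 GB-per-executor threshold are illustrative assumptions, and a real autotuner would also query live resource-manager utilization.

import org.apache.spark.SparkConf

// Hypothetical static cluster description; a real autotuner would query
// the resource manager (e.g. YARN) for live utilization instead.
case class ClusterInfo(nodes: Int, coresPerNode: Int, memPerNodeGb: Int)

object ExecutorSizing {
  // Rough heuristic: ~5 cores per executor (a common rule of thumb),
  // leave one core and some memory per node for the OS and application
  // master, and cap the executor count by the size of the input data.
  def suggestConf(inputSizeGb: Double, cluster: ClusterInfo): SparkConf = {
    val coresPerExecutor   = 5
    val usableCoresPerNode = math.max(1, cluster.coresPerNode - 1)
    val executorsPerNode   = math.max(1, usableCoresPerNode / coresPerExecutor)
    val memPerExecutorGb   = math.max(2, (cluster.memPerNodeGb - 2) / executorsPerNode)

    // Assume ~2 GB of input per executor as a placeholder threshold, so
    // small inputs do not request the whole cluster.
    val maxUseful    = math.max(1, math.ceil(inputSizeGb / 2.0).toInt)
    val available    = executorsPerNode * cluster.nodes
    val numExecutors = math.min(available, maxUseful)

    new SparkConf()
      .set("spark.executor.instances", numExecutors.toString)
      .set("spark.executor.cores", coresPerExecutor.toString)
      .set("spark.executor.memory", s"${memPerExecutorGb}g")
  }
}

For example, ExecutorSizing.suggestConf(40.0, ClusterInfo(10, 16, 64)) would propose a configuration sized for a 40 GB input on a ten-node cluster, which the caller could then pass to a SparkSession builder.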