Running Apache Spark jobs cheaper while maximizing performance - Brad Caffey, Expedia Group

Presented by Brad Caffey, Staff Big Data Engineer, Expedia Group

In a COVID-19 world, companies are looking for ways to reduce cloud spending as much as possible. While many Apache Spark tuning guides discuss how to get the best performance out of Spark, few ever discuss what that performance costs. In this session, we'll cover a proven tuning technique for Apache Spark that lowers job costs on AWS while maximizing performance.

Topics include:

* the principle behind making Apache Spark jobs cost-efficient
* how to determine the AWS costs for your Apache Spark job
* how to determine the most cost-efficient executor configuration for your cluster (see the sketch after this list)
* how to migrate your existing jobs to the cost-efficient executor (a config sketch follows below)
* how to improve performance with your cost-efficient executor
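
To make the cost comparison concrete, here is a minimal sketch (not from the talk itself) of the underlying arithmetic: given a node's hourly price and resources, work out how many executors of a given shape fit on the node, and from that the effective cost of each core you can actually use. The instance figures and candidate executor shapes below are assumptions for illustration, not quoted AWS prices or the presenter's recommendations.

```python
# Assumed node figures for illustration only -- substitute your
# instance type's real on-demand price, vCPU count, and the memory
# YARN actually exposes to containers.
INSTANCE_PRICE_PER_HOUR = 1.0    # assumed $/hour for the node type
NODE_CORES = 16                  # vCPUs per node (assumed)
NODE_MEMORY_GB = 120             # memory available to executors per node (assumed)

def cost_per_core_hour(executor_cores: int, executor_memory_gb: int) -> float:
    """Effective cost of one usable core-hour for a given executor shape.

    An executor shape that strands cores or memory on the node raises
    the effective price of every core you do use.
    """
    by_cores = NODE_CORES // executor_cores
    by_memory = NODE_MEMORY_GB // executor_memory_gb
    executors_per_node = min(by_cores, by_memory)   # the binding constraint wins
    usable_cores = executors_per_node * executor_cores
    return INSTANCE_PRICE_PER_HOUR / usable_cores

# Compare candidate executor shapes; the cheapest core-hour is the
# most cost-efficient configuration for this node type.
for cores, mem in [(2, 14), (4, 28), (5, 36)]:
    print(f"{cores} cores / {mem} GB -> ${cost_per_core_hour(cores, mem):.4f} per core-hour")
```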
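And once a shape wins the comparison, migrating an existing job is a matter of setting the standard Spark executor properties. The snippet below is a hypothetical PySpark example, reusing the 4-core / 28 GB shape from the sketch above (an assumption, not a figure from the talk); the executor count is likewise illustrative and would be scaled to keep total parallelism roughly constant.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cost-efficient-executor-demo")
    # Executor shape chosen by the cost-per-core-hour comparison above
    # (assumed values for illustration).
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "28g")
    # Adjust the executor count so total cores stay roughly the same
    # as the old configuration (value here is illustrative).
    .config("spark.executor.instances", "20")
    .getOrCreate()
)
```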
Comments

Great job, Brad. I've seen this info explained a lot of different ways, but you did a nice job of explaining how to align resources to nodes.

loganboyd