Deep Dive Into Apache Spark Multi User Performance - Mikhail Genkin, Peter Lankford, Michael Feiman
When you run an Apache Spark application on a large cluster, you want to make sure you’re getting the most from that cluster. Any CPU or memory left on the table represents either a waste of money or a lost opportunity to speed up your Spark jobs. What many people don’t realize is how sensitive Spark cluster utilization is to the resource manager. Resource managers decide how to allocate cluster resources among the many users and applications contending for them. In this deep dive session, we will discuss how Spark integrates with two common open source resource managers, YARN and Mesos, as well as a new commercial product called IBM Spectrum Conductor with Spark. You will learn how resource managers arbitrate resources in multi-user/multi-tenant Spark clusters, and how this affects application performance. You will come away with new techniques for tuning Spark resource management to optimize goals like speed and fairness. The session will include a demo of a new open source benchmark designed to help analyse Spark multi-user/multi-tenant performance. The benchmark uses Spark SQL and machine learning jobs to load the cluster, and can be used during a pre-production cycle to tune Spark and resource manager configurations.
About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.