Sparklens: Understanding the Scalability Limits of Spark Applications with Rohit Karlupia (Qubole)

From a single run of an application, Sparklens provides insights into the scalability limits of that Spark application. In this talk we will cover what Sparklens does and the theory behind it. We will talk about how the structure of a Spark application puts important constraints on its scalability, how to find these structural constraints, and how to use them as a guide in solving the performance and scalability problems of Spark applications.

This talk will help the audience answer the following questions about their Spark applications: 1) Will the application run faster with more executors? 2) How will cluster utilization change as the number of executors changes? 3) What is the absolute minimum time the application will take even if we give it infinite executors? 4) What is the expected wall-clock time for the application once we fix its most important structural limits? Sparklens makes the ROI of additional executors obvious for a given application and needs just a single run to determine how the application will behave with different executor counts. Specifically, it will help managers choose the right side of the tradeoff between spending developer time on optimizing applications and spending money on compute bills.
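
To make questions 1 to 3 concrete, here is a minimal Python sketch of the kind of estimate involved. It is a deliberately simplified model and not Sparklens' actual implementation: it assumes stages run strictly one after another, that tasks pack perfectly onto cores, and that driver-side time does not shrink with more executors. The names `Stage`, `estimate_wall_clock`, `minimum_wall_clock`, and `utilization` are hypothetical, and the task durations are made-up illustration values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """Hypothetical container: duration in seconds of every task in one stage."""
    task_seconds: List[float]


def estimate_wall_clock(driver_seconds: float, stages: List[Stage], total_cores: int) -> float:
    """Estimate runtime under the simplifying assumptions stated above.

    A stage can finish no faster than its longest task, and no faster than
    its total task time spread evenly over all available cores.
    """
    total = driver_seconds
    for stage in stages:
        if not stage.task_seconds:
            continue  # an empty stage contributes no time in this model
        evenly_spread = sum(stage.task_seconds) / total_cores
        longest_task = max(stage.task_seconds)
        total += max(evenly_spread, longest_task)
    return total


def minimum_wall_clock(driver_seconds: float, stages: List[Stage]) -> float:
    """Lower bound with unlimited cores: driver time plus each stage's longest task."""
    return driver_seconds + sum(max(s.task_seconds, default=0.0) for s in stages)


def utilization(driver_seconds: float, stages: List[Stage], total_cores: int) -> float:
    """Fraction of the available core-seconds actually spent running tasks."""
    used = sum(sum(s.task_seconds) for s in stages)
    available = total_cores * estimate_wall_clock(driver_seconds, stages, total_cores)
    return used / available


# Illustration values only: one evenly balanced stage and one skewed stage.
example_stages = [Stage([10.0] * 8), Stage([5.0, 5.0, 5.0, 40.0])]
example_driver_seconds = 20.0

for cores in (2, 8, 64):
    estimate = estimate_wall_clock(example_driver_seconds, example_stages, cores)
    used_fraction = utilization(example_driver_seconds, example_stages, cores)
    print(f"{cores:>3} cores: ~{estimate:.1f}s, utilization {used_fraction:.0%}")
print(f"lower bound: {minimum_wall_clock(example_driver_seconds, example_stages):.1f}s")
```

In this toy example the estimate stops improving once the skewed stage's single long task and the driver time dominate, while utilization keeps falling as cores are added. That is the sort of structural limit the description refers to.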

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

Comments

Where will the JSON file be when this is run on Databricks?

avinashdevadhars