[Tech Talk] Enhancing Apache Spark for robust data processing


In this session, we present our research on Apache Spark, an open-source distributed framework for large-scale data analytics and AI workloads. Our main goal is to achieve faster execution and better fault tolerance in such applications by improving the system's memory management. To that end, we investigated a chronic memory issue in Spark and developed an advanced memory management scheme to address it. Specifically, we will introduce our solution, the lineage-checkpoint approach, which we developed to solve the long-lineage problem in Spark.
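
For readers unfamiliar with the long-lineage problem, here is a minimal toy sketch in plain Python. It is not the scheme presented in the talk, whose details are not given in this description. It only illustrates the general idea that a lazily evaluated dataset records a chain of transformations (its lineage), and that a checkpoint materializes the data so the chain can be dropped. The `Dataset` class and all of its methods are hypothetical stand-ins, not Spark APIs. In Spark itself, the related built-in mechanism is `RDD.checkpoint()`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Dataset:
    """Hypothetical toy stand-in for a lazily evaluated dataset (not a Spark class)."""
    data: Optional[List[int]] = None          # materialized values, if any
    parent: Optional["Dataset"] = None        # previous dataset in the lineage
    fn: Optional[Callable[[int], int]] = None # transformation applied to the parent

    def map(self, fn: Callable[[int], int]) -> "Dataset":
        # Lazy: only record the transformation, extending the lineage by one step.
        return Dataset(parent=self, fn=fn)

    def lineage_length(self) -> int:
        # Number of transformations that must be replayed to rebuild this dataset.
        length, node = 0, self
        while node.parent is not None:
            length += 1
            node = node.parent
        return length

    def compute(self) -> List[int]:
        # Walk back to the nearest materialized ancestor, then replay forward.
        steps, node = [], self
        while node.data is None:
            steps.append(node.fn)
            node = node.parent
        values = list(node.data)
        for fn in reversed(steps):
            values = [fn(v) for v in values]
        return values

    def checkpoint(self) -> "Dataset":
        # Materialize the values and drop the parent link, truncating the lineage.
        self.data = self.compute()
        self.parent = None
        self.fn = None
        return self


def build_long_lineage(steps: int) -> Dataset:
    # An iterative job: each iteration adds one more transformation to the chain.
    ds = Dataset(data=[1, 2, 3])
    for _ in range(steps):
        ds = ds.map(lambda x: x + 1)
    return ds
```

In this toy model, `build_long_lineage(100)` yields a dataset whose recovery would replay 100 transformations. Calling `checkpoint()` on it keeps the same values but reduces the replay chain to zero, at the cost of storing the materialized data.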

#Samsung, #SDC21, #DataProcessing

Samsung Developer
