New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, and Koalas

In this talk, we will highlight major efforts happening in the Spark ecosystem. In particular, we will dive into the details of adaptive and static query optimizations in Spark 3.0 that make Spark easier to use and faster to run. We will also demonstrate how new features in Koalas, an open source library that provides a pandas-like API on top of Spark, help data scientists gain insights from their data more quickly.

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

Comments

Very nice. Using Spark will be more exciting with all these new features. One question, though: on an existing installation running Spark 2, how easy is the upgrade?

alaham

Delta Lake & Koalas are going to be game changers in the field of data analytics.

chinmayabarik

Where is the `plot` function at 17:05 coming from? Does "Apache" Spark natively support displaying dataframes in "Jupyter" notebooks?

NitinPasumarthy

Can we get the partition pruning demo or video here on YouTube?

AnirbanNagDev

When you want to scroll down the Jupyter file in the video at 30:00 and you end up scrolling the YouTube page.

MPXVM

Do I need to install Koalas on every node of the cluster or just on the master?

yuanji

22:57: can someone give some details about that "forecast=true"?

sahihe

6:28: I wonder if the audience clapped because of the 2X or because of being able to save a few lines of code...

vincenttan

Invite me next time and I'll initiate loud applause at the right moments; it just feels like it's missing in such a presentation :)

UkrozaVR

Nice! When will Koalas be available for R?

dreznik

People are applauding optimizations that SQL databases introduced 25 years ago. Really, don't kids study databases anymore?

albertoandreotti