Koalas: Pandas API on Apache Spark - PyCon SG 2019

Speaker: Ben Sadeghi, Solutions Architect, Databricks

Pandas is the de facto standard (single-node) DataFrame implementation in Python, while Spark is the de facto standard for big data processing. With the recently open-sourced Koalas package, anyone already familiar with pandas can be productive with Spark right away, with no new API to learn, and can keep a single codebase that works with both pandas (tests, smaller datasets) and Spark (distributed datasets). In this talk, we'll go through the basics of Koalas, along with demos.
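
To make the "single codebase" idea concrete, here is a minimal sketch, not taken from the talk itself. The function name `mean_fare_by_city` and the sample columns are hypothetical; the code runs on plain pandas, and the commented lines show how the same function would be expected to work on a Koalas DataFrame, assuming Koalas and a working Spark installation are available.

```python
import pandas as pd


def mean_fare_by_city(df):
    """Average fare per city.

    Uses only DataFrame calls (groupby, column selection, mean) that
    pandas and Koalas both provide, so the caller decides which
    backend the data lives in.
    """
    return df.groupby("city")["fare"].mean()


# Small, made-up dataset for illustration only.
pdf = pd.DataFrame({
    "city": ["SG", "SG", "KL"],
    "fare": [10.0, 14.0, 6.0],
})

result = mean_fare_by_city(pdf)
print(result)

# Assumption: Koalas is installed and Spark is configured.
# The same function should then accept a Koalas DataFrame unchanged:
#
#   import databricks.koalas as ks
#   kdf = ks.from_pandas(pdf)              # distribute the pandas frame
#   mean_fare_by_city(kdf).to_pandas()     # collect the result locally
```

Only the construction of the DataFrame differs between the two backends; the analysis function itself stays the same, which is what lets tests run on pandas while production jobs run on Spark.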

About the speaker:

Produced by Engineers.SG