Koalas: Making an Easy Transition from Pandas to Apache Spark

Koalas is an open-source project that aims to bridge the gap between big data and small data for data scientists, and to simplify Apache Spark for people who are already familiar with the pandas library in Python. Pandas is the standard tool for data science and is typically the first step in exploring and manipulating a data set, but it does not scale well to big data. With Koalas, data scientists can use the same APIs as pandas, but at scale with PySpark. In this talk, I introduce Koalas and its updates, show some comparisons between pandas and Koalas, and then take a deep dive into its internal structures and how it works with Spark.

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.