Ibis: Seamless Transition Between Pandas and Apache Spark


Pandas is the de facto standard single-node DataFrame implementation in Python. As data grows larger, however, pandas runs into performance limits. Spark, on the other hand, has become a very popular choice for analyzing large datasets over the past few years. But there is an API gap between pandas and Spark, so users who switch from pandas to Spark often need to rewrite their programs. Ibis is a library designed to bridge the gap between local execution (pandas) and cluster execution (BigQuery, Impala, etc.). In this talk, we introduce a Spark backend for Ibis and demonstrate how users can move between pandas and Spark with the same code.

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
