Improving Python and Spark Performance and Interoperability with Apache Arrow

Apache Spark has become a popular and successful way for Python programmers to parallelize and scale up data processing. In many use cases, though, a PySpark job can perform worse than an equivalent job written in Scala, and moving data between the user's Python environment and the Spark JVM is costly. Presented by Julien Le Dem and Li Jin, this talk shows how Apache Arrow's columnar in-memory format addresses both problems.
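
For context, a minimal sketch (illustrative, not taken from the talk) of the Arrow-based transfer path that later landed in PySpark. It assumes Spark 3.x with pyarrow installed; in Spark 2.3 the equivalent flag was spark.sql.execution.arrow.enabled.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("arrow-demo").getOrCreate()

# Enable Arrow-based columnar transfer between the JVM and Python.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

df = spark.range(1_000_000).selectExpr("id", "id * 2 AS doubled")

# With Arrow enabled, toPandas() ships columnar record batches instead of
# pickling rows one at a time, avoiding most serialization overhead.
pdf = df.toPandas()
print(pdf.head())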

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
