Project Zen: Making Data Science Easier in PySpark

The number of PySpark users has increased dramatically, and Python has become one of the most commonly used languages in data science. To serve this growing user base and improve Python usability in Apache Spark, the Spark community initiated Project Zen, named after “The Zen of Python”, which defines the principles of Python.

Project Zen started with the newly redesigned pandas UDFs and function APIs with Python type hints in Apache Spark 3.0. The Spark community has since introduced numerous improvements as part of Project Zen in Apache Spark 3.1 and the upcoming Apache Spark 3.2, including:

Python type hints
New documentation
Conda, venv and PEX
numpydoc docstring
pandas APIs on Spark
Visualization

In this talk, we present the improvements and features in Project Zen, with demonstrations showing how its improved usability makes data science easier.
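For readers unfamiliar with the type-hint style of pandas UDFs mentioned above, here is a minimal sketch, not taken from the talk itself. The function names `plus_one` and `run_on_spark` are hypothetical, and the Spark part assumes that PySpark 3.0 or later, PyArrow, and a Java runtime are installed.

```python
import pandas as pd


def plus_one(s: pd.Series) -> pd.Series:
    """Add one to every value; the hints mark this as Series -> Series."""
    return s + 1


def run_on_spark() -> list:
    """Hypothetical helper: wrap plus_one as a pandas UDF and run it locally."""
    # Imported here so the plain-pandas part works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    try:
        # The UDF kind is inferred from the type hints on plus_one;
        # only the Spark return type has to be given explicitly.
        plus_one_udf = pandas_udf(plus_one, returnType="long")
        df = spark.range(3)  # single column "id" with values 0, 1, 2
        rows = df.select(plus_one_udf("id").alias("id_plus_one")).collect()
        return sorted(row["id_plus_one"] for row in rows)
    finally:
        spark.stop()
```

Because `plus_one` is an ordinary Python function, it can be tried on a plain pandas Series first, for example `plus_one(pd.Series([1, 2, 3]))`, and only then passed to Spark by calling `run_on_spark()`.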

Comments
Author

Thank you for your great explanations :)

jadenpark
Author

Really appreciate this video, but I had a hard time understanding anything. Even with CC on. Management can do a better job of making business decisions via better communication.

sndselecta