Optimizing Apache Spark UDFs

User Defined Functions (UDFs) are an important feature of Spark SQL that extends the language with custom constructs. UDFs are very useful for extending Spark's vocabulary, but they come with significant performance overhead. They are black boxes to the Spark optimizer, blocking several helpful optimizations such as WholeStageCodegen and null optimization. They also carry a heavy processing cost for string functions, which require UTF-8 to UTF-16 conversions that slow down Spark jobs and increase memory requirements. In this talk, we will go over how we at Informatica optimized UDFs to be as performant as Spark native functions, both in terms of time and memory, and allowed these functions to participate in Spark's optimization steps.
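The contrast the talk starts from can be seen in a minimal sketch (assuming Spark 3.x in local mode; the app name, column name, and `UdfVsNative` object are hypothetical). A plain Scala UDF appears in the physical plan as an opaque ScalaUDF call that Catalyst cannot look inside, while the equivalent built-in `upper` expression stays visible to the optimizer and WholeStageCodegen:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf, upper}

object UdfVsNative {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-vs-native")   // hypothetical app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("spark", "udf", null).toDF("word")

    // A plain Scala UDF: the optimizer only sees an opaque ScalaUDF node,
    // so WholeStageCodegen and null propagation cannot reach inside it,
    // and each row's string is converted from Tungsten's internal UTF-8
    // representation to a JVM (UTF-16) String and back.
    val upperUdf = udf((s: String) => if (s == null) null else s.toUpperCase)

    // Compare the plans: the UDF version wraps the column in a black box,
    // while the native upper() expression is inlined and code-generated.
    df.select(upperUdf(col("word")).as("via_udf")).explain()
    df.select(upper(col("word")).as("via_native")).explain()

    spark.stop()
  }
}
```

Comparing the two `explain()` outputs illustrates the gap the talk addresses: only the UDF path pays the UTF-8/UTF-16 conversion and loses the optimizer's help, which is why making custom functions behave like the native path matters for both time and memory.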

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

Comments:

Is this video only applicable to Scala, or also to Python? Shivani, your video screen is not even properly visible.

avnish.dixit_

Can a UDF return an array when used with rowsBetween?

shk

Can we use a Spark SQL extension instead of replacing jars?

phamnguyen

The switching between the slide view and the camera view is too frequent... I had a really hard time concentrating.

vs.