filmov
tv
Extending Machine Learning Algorithms with PySpark

Показать описание
Machine learning practitioners are most comfortable using high-level programming languages such as Python. This is a barrier to parallelizing algorithms with big data frameworks such as Apache Spark, which are written in lower-level languages. Databricks partnered with the Regeneron Genetics Center to create the Glow library for population-scale genomics data storage and analytics. Glow V1.0.0 includes PySpark-based implementations for both existing and novel machine learning algorithms. We will discuss how leveraging tooling for Python users, especially Pandas UDFs, accelerated our development velocity and impacted our algorithms’ computational performance.
Connect with us:
Connect with us: