Extending Spark Machine Learning Beyond Linear Regression by Holden Karau

Abstract:
Apache Spark is one of the most popular Big Data tools. This talk introduces Spark's scikit-learn-inspired Machine Learning pipelines and then takes a deep dive into them.

Spark's ML pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren't available yet. By integrating your own data preparation and machine learning tools into Spark's ML pipelines, you will be able to take advantage of useful meta-algorithms, such as parameter search.

Even if you have no machine learning algorithms of your own to implement, this talk peels back the covers on how the ML APIs are built, which can help you make even more awesome ML pipelines and customize Spark models for your needs.

A basic understanding of Spark will make it easier to follow along, but even if this is your first Spark talk, it will still be useful and give you a broad understanding of how Spark ML functions. (Of course, since the presenter is an author, if this is your first introduction to Spark she encourages you to buy her books "Learning Spark" and "High Performance Spark".)