A friendly introduction to PySpark MLlib (and a taste of MLflow) [Virtual]

by Michelle Hoogenhout

Doing data science at scale? PySpark and MLlib bring the power of Spark's distributed processing to Python users, so you can train machine learning models on massive datasets. MLlib provides tools for data extraction, transformation and loading, common ML algorithms, and model evaluation. With the addition of MLflow, it's easier than ever to log, reproduce and deploy your ML models. This walkthrough is aimed at those new to MLflow and will take you through the ML lifecycle with PySpark's ML toolset.

Bio:
Michelle Hoogenhout is a data scientist with a background in cognitive neuroscience and experimental design. She is a senior data science and analytics instructor at Galvanize and co-founder of Ingane Health, a data science consulting firm. Michelle holds a PhD (Psychology) from the University of Cape Town, South Africa and has published on topics such as statistics and data management, data science training methods, ethics, and cognitive and physiological assessment.

=================
Agenda (Pacific Daylight Time, UTC-07:00)
=================
- 12:00 - 12:15 pm -- Gathering and introductions
- 12:15 - 1:15 pm -- Talk
- 1:15 - 1:30 pm -- Q&A, discussion

=================
Questions?
=================
Join our Slack channel or leave a comment below if you have any questions about the group or need clarification on anything.