Benjamin Bengfort: Visual Diagnostics for More Effective Machine Learning | PyData Miami 2019


The model selection process is a search for the combination of features, algorithm, and hyperparameters that maximizes F1, R2, or silhouette scores after cross-validation. This view of machine learning often leads us toward automated processes such as grid searches and random walks. Although this approach allows us to try many combinations, we are often left wondering whether we have actually succeeded.
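As a rough illustration of the automated search described above, here is a minimal sketch of a grid search scored by k-fold cross-validation, written in plain Python. The dataset, the tiny nearest-neighbor model, and the names `knn_predict` and `cross_val_accuracy` are all assumptions made up for this example; they are not from the talk, and a real project would more likely use a library such as scikit-learn.

```python
from collections import Counter

# Made-up 1-D data: (x, label) pairs. The label is 1 for x >= 10,
# except for two deliberately noisy points.
data = [(x, int(x >= 10)) for x in range(20)]
data[3] = (3, 1)
data[15] = (15, 0)


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(points, k, n_folds=5):
    """Mean held-out accuracy over n_folds interleaved folds."""
    fold_scores = []
    for fold in range(n_folds):
        test = points[fold::n_folds]
        train = [p for i, p in enumerate(points) if i % n_folds != fold]
        hits = [knn_predict(train, x, k) == label for x, label in test]
        fold_scores.append(sum(hits) / len(hits))
    return sum(fold_scores) / len(fold_scores)


# The "grid" here is a single hyperparameter: the number of neighbors.
grid = [1, 3, 5, 7]
scores = {k: cross_val_accuracy(data, k) for k in grid}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

The search returns a winning value and a score, but nothing about why that value won or how the model behaves near the noisy points, which is the gap the abstract points to.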

By enhancing model selection with visual diagnostics, data scientists can inject human guidance to steer the search process. Visualizing feature transformations, algorithmic behavior, cross-validation methods, and model performance gives us a peek into the high-dimensional realm in which our models operate. As we continue to tune our models, trying to minimize both bias and variance, these glimpses allow us to be more strategic in our choices. The result is more effective modeling, speedier results, and a greater understanding of underlying processes.

Visualization is an integral part of the data science workflow, but visual diagnostics are directly tied to machine learning transformers and models. The Yellowbrick library extends the scikit-learn API with a Visualizer object: an estimator that learns from data and produces a visualization as a result. In this talk, we will explore feature visualizers, visualizers for classification, clustering, and regression, as well as model analysis visualizers. We'll work through several examples and show how visual diagnostics steer model selection, making machine learning more effective.
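To make the idea of "an estimator that learns from data and produces a visualization" concrete, here is a toy sketch of that pattern in plain Python. It is not Yellowbrick's code or API: `MajorityClassifier` and `TextConfusionVisualizer` are hypothetical names invented for this example, the data is made up, and the "visualization" is just a text table, whereas the real library draws figures around scikit-learn estimators.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class TextConfusionVisualizer:
    """Toy visualizer: wraps a model, scores it, and renders a confusion matrix as text."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        # Fitting the visualizer fits the wrapped model and records the classes.
        self.model.fit(X, y)
        self.classes_ = sorted(set(y))
        return self

    def score(self, X, y):
        # Count (true, predicted) pairs and return plain accuracy.
        predicted = self.model.predict(X)
        self.counts_ = Counter(zip(y, predicted))
        return sum(t == p for t, p in zip(y, predicted)) / len(y)

    def show(self):
        # Rows are true labels, columns are predicted labels.
        header = "true/pred " + " ".join(f"{c:>5}" for c in self.classes_)
        rows = [
            f"{t:<10}" + " ".join(f"{self.counts_[(t, p)]:>5}" for p in self.classes_)
            for t in self.classes_
        ]
        text = "\n".join([header] + rows)
        print(text)
        return text


# Made-up data for illustration only.
X_train, y_train = [[0], [1], [2], [3], [4]], ["a", "a", "a", "b", "b"]
X_test, y_test = [[5], [6], [7], [8]], ["a", "b", "a", "b"]

viz = TextConfusionVisualizer(MajorityClassifier())
viz.fit(X_train, y_train)
accuracy = viz.score(X_test, y_test)
table = viz.show()
```

The accuracy alone hides the fact that this model never predicts "b"; the rendered table makes that failure obvious at a glance, which is the kind of guidance the abstract argues for.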

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
