Is My Data Drifting? Early Monitoring for Machine Learning Models in Production | PyData Global 2021

Speaker: Emeli Dral

Summary
Machine learning models can degrade over time, often because the input data or real-world patterns change. Monitoring model performance in production is therefore critical, but model quality cannot always be evaluated directly when ground truth labels are unavailable. In this talk, we will present how one can monitor data and prediction drift as a proxy for performance decay.

Description
Machine learning models can degrade over time. Often, this is due to changes in the input data and/or in the relationship between the features and the target. It is important to keep an eye on model relevance and intervene in time if something goes wrong.

However, it is not always possible to evaluate model quality directly in production, since the ground truth labels or actual values are not always available. In this case, detecting a change in the input data distributions and in the model predictions can serve as an early warning of likely model decay.

In this talk, we will explore how one can evaluate data drift using statistical tests, visualize it, and interpret the results.
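As a rough illustration of what a statistical drift check on a single numerical feature could look like, here is a minimal sketch using a two-sample Kolmogorov-Smirnov test written with the Python standard library only. The choice of test, the function names (`ks_statistic`, `ks_pvalue`, `detect_drift`), and the 0.05 threshold are assumptions made for this example; they are not taken from the talk or from any particular library.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest absolute gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    d = 0.0
    # The maximum gap is always reached at one of the observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / n
        cdf_cur = bisect_right(cur, x) / m
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (an approximation)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges slowly here, and the p-value is essentially 1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return max(0.0, min(1.0, 2.0 * total))


def detect_drift(reference, current, alpha=0.05):
    """Flag drift when the test rejects 'same distribution' at level alpha.

    alpha=0.05 is an illustrative default, not a recommendation.
    """
    d = ks_statistic(reference, current)
    return ks_pvalue(d, len(reference), len(current)) < alpha


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: a reference window and a production window whose mean has shifted.
    reference = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    production = [rng.gauss(1.0, 1.0) for _ in range(1000)]
    print("KS statistic:", ks_statistic(reference, production))
    print("Drift detected:", detect_drift(reference, production))
```

The same check could be applied to the model's predictions instead of an input feature to look for prediction drift. With large samples, even tiny and practically irrelevant shifts can become statistically significant, so the threshold usually needs tuning for the use case.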

Emeli Dral's Bio
Emeli Dral is a Co-founder and CTO at Evidently AI, a startup developing open-source tools to analyze and monitor the performance of machine learning models.

Earlier, she co-founded an industrial AI startup and served as the Chief Data Scientist at Yandex Data Factory. She led over 50 applied ML projects for various industries, from banking to manufacturing. Emeli is a data science lecturer at GSOM SPbU and Harbour.Space University. She is a co-author of the Machine Learning and Data Analysis curriculum at Coursera with over 100,000 students. She also co-founded Data Mining in Action, the largest open data science course in Russia.

PyData Global 2021

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

