Logging Machine Learning Data: Why Statistical Profiling is the Key to Data Observability at Scale

Speaker's Bio:

Bernease Herman, Data Scientist / Researcher, WhyLabs

Bernease Herman is a data scientist at WhyLabs, the AI Observability company, and a research scientist at the University of Washington eScience Institute. At WhyLabs, she is building model and data monitoring solutions using approximate statistics techniques. Earlier in her career, Bernease built ML-driven solutions for inventory planning at Amazon and conducted quantitative research at Morgan Stanley. Her academic research focuses on evaluation metrics and interpretable ML, with a specialty in synthetic data and societal implications. Bernease serves as faculty for the University of Washington Master’s in Data Science program and as chair of the Rigorous Evaluation for AI Systems (REAIS) workshop series. She has published work in top machine learning conferences and workshops such as NeurIPS, ICLR, and FAccT. She is a PhD student at the University of Washington and holds a Bachelor’s degree in mathematics and statistics from the University of Michigan.

Abstract:

The day an ML application is deployed to production and begins facing the real world is both the best and the worst day in the life of the model builder. Debugging, troubleshooting, and monitoring take over the majority of their time, leaving little room for model building. In DevOps, software operations have been elevated to an art: sophisticated tools enable engineers to quickly identify and resolve issues, continuously improving software robustness. In the data science and machine learning worlds, operations are still largely a manual process built on Jupyter notebooks and shell scripts. One of the cornerstones of the DevOps toolchain is logging; it is the foundation of testing and monitoring tools. What does logging look like in an ML system?

In this talk, we will show you how to enable data logging for a machine learning application using whylogs, an open source library. We will discuss how something so simple enables testing, monitoring, and debugging of the entire data pipeline. We will dive deeper into the key properties of a logging library we've built that can handle terabytes of data, run with a constrained memory footprint, and produce statistically accurate log profiles of structured and unstructured data. Attendees will leave the talk equipped with best practices to log and understand their data and supercharge their MLOps.
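
To make the idea concrete, here is a minimal sketch of what logging a batch of structured data with whylogs can look like, assuming the v1 Python API and a toy pandas DataFrame; the column names and the local writer target are illustrative and not taken from the talk.

```python
# Minimal sketch: profiling a batch of data with whylogs (assumes whylogs v1 API).
import pandas as pd
import whylogs as why

# Toy batch standing in for one batch of production inference data.
df = pd.DataFrame({
    "age": [34, 45, 23, 51],
    "income": [52000.0, 61000.0, 38000.0, 75000.0],
    "segment": ["a", "b", "a", "c"],
})

# Log the batch: whylogs builds a statistical profile (counts, types,
# approximate distribution sketches) instead of storing the raw rows.
results = why.log(df)
profile_view = results.view()

# Inspect the per-column summary statistics as a DataFrame.
print(profile_view.to_pandas())

# Persist the profile locally so it can be compared against later batches.
results.writer("local").write()
```

Because the profile is a small, mergeable summary rather than the raw data, profiles from many batches or workers can be combined and monitored over time, which is what makes this approach viable at terabyte scale.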