MLOps20: Overcoming DataOps Hurdles to Get ML Models into Production

Now that you have used sample data and proved your ML model is feasible, it is time to plan for production deployment. 85% of data science projects never get to production! The raw data required for the model needs to be collected from source datastores, persisted in a data lake, wrangled to eliminate outliers and missing values, filtered to comply with data rights governance, joined and transformed with other datasets, optimized to meet SLAs, checked for quality after transformations, and so on. These steps are implemented as either a batch or a real-time data pipeline that generates the feature values used for both model training and inference. Implementing these pipelines in production (referred to as DataOps) is non-trivial and involves weeks or months of back-and-forth between data scientists and data engineers. This talk provides a framework for breaking down DataOps hurdles into 18 metrics. For a subset of these metrics, we share our experience automating them and making them self-service for data users, which radically simplifies ML production deployments.
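As a rough illustration of the pipeline steps listed above (wrangle, filter for governance, join and transform, quality check), here is a minimal batch sketch in plain Python. It is not the framework presented in the talk. Every name in it (`build_features`, the `consent` flag, the `amount_cents` field, the outlier bounds) is a hypothetical choice made for this example, and a production pipeline would read from a data lake rather than from in-memory lists.

```python
# Minimal batch feature pipeline sketch. All field names, thresholds, and
# function names are hypothetical and chosen only for illustration.

def wrangle(rows, field, lower, upper):
    """Drop rows whose value is missing or falls outside [lower, upper]."""
    return [
        r for r in rows
        if r.get(field) is not None and lower <= r[field] <= upper
    ]


def filter_by_consent(rows):
    """Keep only rows the (assumed) governance flag allows us to use."""
    return [r for r in rows if r.get("consent")]


def join_and_transform(rows, regions):
    """Inner-join rows with a user_id -> region lookup and derive a feature."""
    features = []
    for r in rows:
        region = regions.get(r["user_id"])
        if region is None:
            continue  # no match in the second dataset, so the row is dropped
        features.append({
            "user_id": r["user_id"],
            "region": region,
            "amount_usd": r["amount_cents"] / 100,
        })
    return features


def check_quality(features, min_rows):
    """Fail loudly if the transformed output looks wrong."""
    if len(features) < min_rows:
        raise ValueError(f"expected at least {min_rows} rows, got {len(features)}")
    for f in features:
        if f["amount_usd"] < 0:
            raise ValueError(f"negative amount for user {f['user_id']}")
    return features


def build_features(raw_rows, regions, min_rows=1):
    """Run the steps in order; the same output feeds training and inference."""
    rows = wrangle(raw_rows, "amount_cents", lower=0, upper=1_000_000)
    rows = filter_by_consent(rows)
    features = join_and_transform(rows, regions)
    return check_quality(features, min_rows)


if __name__ == "__main__":
    raw = [
        {"user_id": 1, "amount_cents": 1250, "consent": True},
        {"user_id": 2, "amount_cents": None, "consent": True},       # missing value
        {"user_id": 3, "amount_cents": 9_999_999, "consent": True},  # outlier
        {"user_id": 4, "amount_cents": 400, "consent": False},       # no data rights
        {"user_id": 5, "amount_cents": 800, "consent": True},
    ]
    print(build_features(raw, regions={1: "US", 5: "EU"}))
```

Keeping each step as a separate function means the quality check can run after any transformation, and the same `build_features` entry point can be called from both a training job and an inference service so the two see identical feature values.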
Speaker: Dr. Sandeep Uttamchandani is the Chief Data Officer and VP of Product Engineering at Unravel Data Systems. He brings nearly two decades of experience building enterprise data products and running petabyte-scale data platforms for business-critical analytics and ML applications. Most recently he was at Intuit, where he ran the data platform team powering analytics and ML for Intuit's financial accounting, payroll, and payments products. Earlier in his career, Sandeep was co-founder and CEO of a startup using ML to manage security vulnerabilities in open-source products. He has held engineering leadership roles at VMware and IBM for 15+ years.