How Intuit uses Apache Spark to Monitor In-Production Machine Learning Models at Large-Scale
The presentation introduces the Intuit AI Model Monitoring Service (MMS). MMS is an in-house, Spark-based solution developed by Intuit AI to provide ongoing monitoring of both data metrics (statistics of model input/output, etc.) and model metrics (precision, recall, AUC, etc.) for in-production ML models. The project will soon be open-sourced. MMS aims to tackle multiple challenges of in-production ML model monitoring:
1. Integration of multiple data sources from different time ranges: to generate all the metrics needed to monitor an in-production model, we often have to integrate multiple datasets with different schemas covering different time ranges. For example, to compute model metrics such as AUC, the ground truth typically arrives in a separate dataset days or even months after the model's output data is recorded. In other cases, we may need to join in additional dimensional data so that we can build segments and analyze the model per segment (see the join sketch after this list).
2. Reusable and extendable metric and segmentation library: developing metric and segmentation logic separately for every model does not scale, and building a reusable yet extendable library for that logic is challenging because different models may have distinct data schemas. With MMS, model owners can create and schedule pipelines to monitor in-production models without writing any code. MMS can ingest generic data and also exposes a programming API that can be adapted to the specific data schema produced by a given ML platform. Developers can use MMS's APIs to contribute reusable metric and segmentation logic to an open-contribution library (see the metric library sketch after this list). MMS pipelines scale well: Intuit uses MMS to integrate in-production data with 10M+ rows and 1K+ columns and to generate 10K+ metrics for its in-production models.
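The join pattern behind challenge 1 can be illustrated with a small PySpark sketch. This is not MMS code: the table names (prediction_logs, ground_truth, customer_dimensions), column names, and the 90-day delay window are assumptions made for the example. It only shows model output being joined with later-arriving ground truth and a dimensional table, then evaluated per segment.

```python
# Minimal sketch of joining prediction logs with delayed ground truth and
# dimensional data, then computing AUC per segment. All table and column
# names below are hypothetical, not the actual MMS schema.
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("mms-join-sketch").getOrCreate()

# Model output logged at prediction time (hypothetical schema).
predictions = spark.table("prediction_logs").select(
    "transaction_id", "score", "prediction_date"
)

# Ground truth lands in a separate dataset days or months later.
ground_truth = spark.table("ground_truth").select(
    "transaction_id", "label", "label_date"
)

# Dimensional data used to build per-segment views of the model.
dimensions = spark.table("customer_dimensions").select(
    "transaction_id", "customer_segment"
)

# Join predictions with the delayed labels and the segment dimension,
# keeping only labels that arrived within an assumed 90-day window.
joined = (
    predictions
    .join(ground_truth, on="transaction_id", how="inner")
    .join(dimensions, on="transaction_id", how="left")
    .where(F.datediff("label_date", "prediction_date").between(0, 90))
    .withColumn("label", F.col("label").cast("double"))
)

# Compute AUC per segment with Spark ML's built-in evaluator.
evaluator = BinaryClassificationEvaluator(
    rawPredictionCol="score", labelCol="label", metricName="areaUnderROC"
)
for row in joined.select("customer_segment").distinct().collect():
    segment_df = joined.where(F.col("customer_segment") == row["customer_segment"])
    print(row["customer_segment"], evaluator.evaluate(segment_df))
```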
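To make challenge 2 concrete, here is a minimal sketch of one way a reusable metric and segmentation layer can be expressed on Spark DataFrames. The class and function names (Metric, Mean, NullRate, compute_metrics) and the sample schema are invented for this example and are not the MMS API; the point is only that metric logic written once against a column name can be reused across models with different schemas and applied per segment.

```python
# Illustrative sketch of a reusable metric/segmentation abstraction on Spark
# DataFrames. Names are hypothetical; this is not the actual MMS API.
from abc import ABC, abstractmethod
from pyspark.sql import DataFrame, SparkSession, functions as F


class Metric(ABC):
    """A metric computable on any DataFrame that exposes its input column."""

    def __init__(self, column: str):
        self.column = column

    @abstractmethod
    def expr(self):
        """Return a Spark aggregation Column for this metric."""


class Mean(Metric):
    def expr(self):
        return F.avg(self.column).alias(f"mean_{self.column}")


class NullRate(Metric):
    def expr(self):
        # Fraction of rows where the column is null.
        return (F.count(F.when(F.col(self.column).isNull(), 1))
                / F.count(F.lit(1))).alias(f"null_rate_{self.column}")


def compute_metrics(df: DataFrame, metrics, segment_cols=None) -> DataFrame:
    """Apply a list of Metric objects, optionally grouped per segment."""
    exprs = [m.expr() for m in metrics]
    if segment_cols:
        return df.groupBy(*segment_cols).agg(*exprs)
    return df.agg(*exprs)


if __name__ == "__main__":
    spark = SparkSession.builder.appName("mms-metric-sketch").getOrCreate()
    df = spark.createDataFrame(
        [("smb", 0.9, None), ("smb", 0.4, 1.0), ("consumer", 0.7, 0.0)],
        ["customer_segment", "score", "label"],
    )
    # The same metric objects work for any model whose data has a "score"
    # and "label" column, regardless of the rest of its schema.
    metrics = [Mean("score"), NullRate("label")]
    compute_metrics(df, metrics, segment_cols=["customer_segment"]).show()
```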