MATS stack for Cross-system Orchestration of Machine Learning Pipelines

At Avast we complete over 17 million phishing detections a day, providing crucial online protection against this type of attack.

In this talk, Joao Da Silva and Yury Kasimov will present the MATS stack for the productionisation of machine learning and describe their journey integrating model tracking, storage, cross-system orchestration and model deployment into a complete, modern machine learning pipeline.

The MATS stack can be integrated into an existing ecosystem without disruption; there is no need to suddenly migrate to a clean AWS setup.

The MATS stack combines MLflow, Airflow, TensorFlow and Spark into a standard set of well-integrated tools that form a cross-system orchestrated ML pipeline and that data scientists at Avast can adopt.

They will use Angler, an internal machine learning project for detecting phishing URLs, to demonstrate how the MATS stack was leveraged for this ML pipeline, walking the audience through all stages of the Angler pipeline: data transformations and enrichments in Spark, training of models, experiment tracking and serving of the models. The pipeline supports fast and reproducible experiments and allows fast progression from research to production.
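
As a rough illustration of the stage ordering described above, the following is a minimal sketch using only the Python standard library (3.9+). It is not the Angler pipeline or an Airflow DAG: the stage names, the dependency chain and the `handlers` mapping are hypothetical, chosen only to mirror the stages listed in the abstract. In the actual stack an orchestrator such as Airflow would take on the role played here by `run_pipeline`.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names and dependencies, loosely following the abstract.
# Each key maps to the set of stages that must finish before it can start.
STAGES = {
    "spark_transform": set(),
    "spark_enrich": {"spark_transform"},
    "train_model": {"spark_enrich"},
    "track_experiment": {"train_model"},
    "serve_model": {"track_experiment"},
}


def execution_order(stages):
    """Return the stage names in an order that respects the dependencies."""
    return list(TopologicalSorter(stages).static_order())


def run_pipeline(stages, handlers):
    """Call each stage's handler in dependency order.

    Every handler receives the results of the stages that already ran
    and returns its own result, which is stored under the stage name.
    """
    context = {}
    for name in execution_order(stages):
        context[name] = handlers[name](context)
    return context
```

Each handler would wrap the real work for its stage (for example, submitting a Spark job or logging a run); keeping the dependency graph separate from the handlers is what lets one orchestrator coordinate steps that execute on different systems.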

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
