Machine Learning Experimentation with DVC and VS Code
Co-hosted by FourthBrain and Iterative. Learn how to manage your machine learning projects and make them reproducible with DVC, an open-source tool, and its extension for VS Code. We will see how to track datasets and models, and how to run, compare, visualize, and track machine learning experiments right in VS Code.
Link to the repository with code:
Link to Alex Kim's Github:
Iterative builds DVC, CML, and other developer tools for machine learning. They're a well-funded, remote-first team on a mission to solve the complexity of managing datasets, ML infrastructure, and the ML model lifecycle.
——
00:00 About today's talk
03:44 Introduction
04:27 The problem we want to solve
05:31 What happens next: Goals
06:12 Goal #1: Achieve best performance
06:59 Goal #2: Ensure reproducibility
08:19 Goal #3: Minimal setup and dependency on 3rd-party services
10:18 Why it's difficult to achieve all three goals (Same experiments/different metrics)
12:44 When in doubt go with Open-Source Software
14:20 Open-Source tools: Git, Visual Studio Code & DVC
16:08 DVC: What is DVC?
17:25 DVC: What are DVC pipelines?
20:52 DEMO: Initial setup
24:25 DEMO: Start experimenting
27:21 DEMO: Automating experiments, Grid Search
29:24 DEMO: Keeping one of the experiments
31:47 DEMO: Some comments on the process
33:11 DEMO: How does DVC handle Model and Data files?
35:47 Summary
37:37 Alex's take on MLOps, DevOps and GitOps
42:47 Alex's take on the difference between Data Scientist, ML Engineer and MLOps Engineer
47:45 Alex's take on data preparation and the overall ML pipeline
51:10 Alex's take on when W&B and MLFlow might be unreliable
53:13 Conclusion