DVC: Data Versioning and ML Experiments on Top of Git

💻 Abstract:
ML experimentation and metrics-logging tools have become very popular. They help ML researchers and engineers keep track of metrics. However, they do not make experiments reproducible, because source code and training data still need to be tracked and versioned separately from the logged metrics. In this talk, we show how ML metrics can be tracked together with code, data, and ML models in Git repositories using the open-source tool DVC. Keeping data and models for hundreds of experiments in a Git repository might not look like a realistic idea. But we show how codifying data and ML experiments and using metafiles can make this approach feasible and even very efficient.
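To make the metafile idea concrete, here is a minimal Python sketch of the general pattern: large data lives in a content-addressed cache outside Git, and only a tiny pointer file is committed. This is an illustration under assumptions, not DVC's actual implementation. The function names `track` and `restore`, the `.meta.json` suffix, the JSON layout, and the choice of SHA-256 are all hypothetical and chosen for this example only.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash the file contents in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a small metafile.

    Hypothetical helper: only the returned metafile would be committed to Git,
    while the data file itself would be listed in .gitignore.
    """
    checksum = file_sha256(data_file)
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    meta = data_file.with_name(data_file.name + ".meta.json")
    meta.write_text(json.dumps({
        "path": data_file.name,
        "sha256": checksum,
        "size": data_file.stat().st_size,
    }, indent=2))
    return meta


def restore(meta: Path, cache_dir: Path) -> Path:
    """Recreate the data file that a metafile points to from the cache."""
    info = json.loads(meta.read_text())
    cached = cache_dir / info["sha256"][:2] / info["sha256"][2:]
    target = meta.with_name(info["path"])
    shutil.copy2(cached, target)
    return target
```

Because the metafile is a few lines of text, Git can version hundreds of dataset or model revisions cheaply: checking out an older commit brings back the older metafile, and a call such as `restore(meta, cache_dir)` fetches the matching data from the cache.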
🔊 Speaker bio:
Timestamps:
0:00 Intro
0:13 Introduction of the host
1:22 Introduction of the speaker
3:22 DVC principles
4:57 What does DVC do?
6:23 DVC first step
6:56 Data versioning
11:02 Integration and compatibility CI/CD
12:55 ML experiment in DVC (demo)
30:03 Conclusion
❓ Q&A ❓
31:45 If the data in this path is changing, is it still being controlled?
33:32 Is there a link or a Twitter account where people can reach you?
34:25 Closing remarks