Behavioral Testing of ML Models (Unit tests for machine learning)

How can we empower machine learning models with powerful software engineering techniques like unit testing?
Evaluating ML models using a single metric (like accuracy or F1-score) produces a low-resolution picture of model performance. Behavioral tests can give us a much higher-resolution evaluation of a model's capabilities. By creating tests (which are small, targeted test sets), we can better compare models or observe how model performance changes after re-training or fine-tuning a model. We discuss the paper 'Beyond Accuracy: Behavioral Testing of NLP Models with CheckList', which was selected as the ACL 2020 Best Paper.
Introduction (0:00)
Comparing models using capabilities (0:33)
Behavioral test of NLP models (3:06)
Test Type 1: Minimum Functionality Tests (4:22)
Test Type 2: Invariance Tests (7:04)
Test Type 3: Directional Expectation Tests (7:32)
Summary and Conclusion (10:00)
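
To make the three test types above concrete, here is a minimal sketch in plain Python. It does not use the CheckList library's API. The toy lexicon model (`predict_positive`), the helper names (`mft`, `inv`, `dir_test`, `failure_rate`), the tolerance value, and the example sentences are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy sentiment "model": a tiny word lexicon, not a real NLP model.
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "terrible": -2.0, "hate": -2.0}

def predict_positive(text):
    """Return a pseudo-probability in (0, 1) that the text is positive."""
    score = sum(LEXICON.get(w.strip(".,!?").lower(), 0.0) for w in text.split())
    return 1.0 / (1.0 + math.exp(-score))

def failure_rate(passed):
    """Fraction of test cases that did not pass."""
    return sum(1 for ok in passed if not ok) / len(passed)

def mft(model, cases):
    """Minimum Functionality Test: simple inputs with known labels."""
    return failure_rate([(model(text) > 0.5) == is_positive
                         for text, is_positive in cases])

def inv(model, pairs, tol=0.1):
    """Invariance test: a label-preserving edit should barely move the prediction.
    The tolerance of 0.1 is an arbitrary assumption."""
    return failure_rate([abs(model(a) - model(b)) <= tol for a, b in pairs])

def dir_test(model, pairs):
    """Directional Expectation test: here, appending a negative phrase
    should not make the prediction more positive."""
    return failure_rate([model(b) <= model(a) for a, b in pairs])

mft_cases = [
    ("I love this flight.", True),
    ("The food was terrible.", False),
    ("This is not good.", False),  # negation, a capability the toy model lacks
]
inv_pairs = [("Great service in Chicago.", "Great service in Dallas.")]
dir_pairs = [("The seat was bad.", "The seat was bad. I hate it.")]

# One failure rate per test type, instead of a single aggregate metric.
report = {
    "MFT": mft(predict_positive, mft_cases),
    "INV": inv(predict_positive, inv_pairs),
    "DIR": dir_test(predict_positive, dir_pairs),
}
for name, rate in report.items():
    print(f"{name}: {rate:.0%} failure rate")
```

In this sketch the toy model passes the invariance and directional cases but fails the negation case in the MFT, which is the kind of targeted weakness a single accuracy number would hide. Running the same test sets against two models, or against one model before and after fine-tuning, gives a per-capability comparison.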
------
Paper: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
Code:
------
More videos by Jay:
Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)
Explainable AI Cheat Sheet - Five Key Categories
The Narrated Transformer Language Model
Jay's Visual Intro to AI
How GPT-3 Works - Easily Explained with Animations