Behavioral Testing of ML Models (Unit tests for machine learning)

How can we empower machine learning models with powerful software engineering techniques like unit testing?

Evaluating ML models using a single metric (like accuracy or F1-score) produces a low-resolution picture of model performance. Behavioral tests can give us a much higher-resolution evaluation of a model's capabilities. By creating tests (which are small, targeted test sets), we can better compare models or observe how model performance changes after re-training or fine-tuning a model. We discuss the paper 'Beyond Accuracy: Behavioral Testing of NLP Models with CheckList', which was selected as the ACL 2020 Best Paper.
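To make the idea of "small targeted test sets" concrete, here is a minimal sketch that reports a failure rate per capability instead of one overall score. Everything in it is an assumption made for illustration: `toy_sentiment` is a hypothetical word-counting model, and the capability names and sentences are made up. It does not use the CheckList library or reproduce the paper's test suites.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model (an assumption for this sketch, not from the paper):
# it labels text by counting words from two tiny lexicons.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "terrible", "hate"}


def toy_sentiment(text: str) -> str:
    words = text.lower().replace(".", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Each capability gets its own small, targeted test set of (input, expected label).
SUITE: Dict[str, List[Tuple[str, str]]] = {
    "vocabulary": [
        ("I love this airline.", "positive"),
        ("The food was terrible.", "negative"),
    ],
    "negation": [
        ("The food was not good.", "negative"),
        ("The service was not great.", "negative"),
    ],
}


def failure_rates(
    model: Callable[[str], str],
    suite: Dict[str, List[Tuple[str, str]]],
) -> Dict[str, float]:
    """Return the fraction of failed cases for each capability."""
    rates = {}
    for capability, cases in suite.items():
        failures = sum(model(text) != expected for text, expected in cases)
        rates[capability] = failures / len(cases)
    return rates


rates = failure_rates(toy_sentiment, SUITE)
for capability, rate in rates.items():
    print(f"{capability}: {rate:.0%} failed")
```

The toy model passes every vocabulary case but fails every negation case, a weakness that a single aggregate accuracy number would blur together. Running the same suite before and after re-training shows which capabilities improved or regressed.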

Introduction (0:00)
Comparing models using capabilities (0:33)
Behavioral test of NLP models (3:06)
Test Type 1: Minimum Functionality Tests (4:22)
Test Type 2: Invariance Tests (7:04)
Test Type 3: Directional Expectation Tests (7:32)
Summary and Conclusion (10:00)
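The second and third test types in the chapter list differ from minimum functionality tests in that they need no labels: they only check how the prediction changes when the input is perturbed. The sketch below is one possible way to express both checks. `toy_score` is a hypothetical model, and the sentence pairs and helper names are illustrative assumptions rather than the CheckList library's API.

```python
from typing import Callable, List, Tuple

# Hypothetical toy model (an assumption for this sketch): the score is the
# count of positive lexicon words minus the count of negative lexicon words.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "terrible", "hate", "lame"}

Pair = Tuple[str, str]  # (original text, perturbed text)


def toy_score(text: str) -> int:
    words = text.lower().replace(".", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def invariance_failures(model: Callable[[str], int], pairs: List[Pair]) -> List[Pair]:
    """Invariance test: a label-preserving edit should not change the prediction."""
    return [(a, b) for a, b in pairs if model(a) != model(b)]


def directional_failures(model: Callable[[str], int], pairs: List[Pair]) -> List[Pair]:
    """Directional expectation test: appending negative text must not raise the score."""
    return [(a, b) for a, b in pairs if model(b) > model(a)]


inv_pairs = [
    # Swapping a location name should not affect sentiment.
    ("I had a great flight to Chicago.", "I had a great flight to Dallas."),
    # A small typo should not affect sentiment either.
    ("The crew was good.", "The crew was goood."),
]
dir_pairs = [
    ("The flight was good.", "The flight was good. You are lame."),
]

inv_failed = invariance_failures(toy_score, inv_pairs)
dir_failed = directional_failures(toy_score, dir_pairs)
print(f"invariance failures: {len(inv_failed)} of {len(inv_pairs)}")
print(f"directional failures: {len(dir_failed)} of {len(dir_pairs)}")
```

In this sketch the toy model survives the location swap but breaks on the typo, because its lexicon matches words exactly. It passes the directional check, since the appended negative phrase lowers the score rather than raising it.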

------

Paper: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

Code:

------

More videos by Jay:
Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)

Explainable AI Cheat Sheet - Five Key Categories

The Narrated Transformer Language Model

Jay's Visual Intro to AI

How GPT-3 Works - Easily Explained with Animations
Comments

This is a great topic! Thanks for presenting it so nicely! Well spoken and visualized! 💪

AICoffeeBreak

Nicely explained Jay💯. I look forward to more of these

katnoria

Your videos, contents and explanations are really good. Thanks for making quality content.

It would be even nicer if you kept the same pitch from the start to the end of each sentence, because the words at the end of sentences are low in volume.

Thanks again for the great videos

SreeramAjay

Great video, but using a small test set for QA should be done carefully, since over time the model can overfit on those datasets.

manavmadan

What a great presentation. Would it be fair to say that behavioral testing is similar to metamorphic testing in ML-based systems?

haftamuhailu

This is a very interesting approach that can be extended to vision models as well!

ramandutt

Thank you so much for the nice explanation.

abrar-tech

Really cool video Jay. Have you come across any equivalent approaches for tabular data?

jsnctl

Can the same concepts be applied to supervised models, like regression or classification models?

saurabhatwipro

Jay, are you aware of any other code examples of these tests?

AZ

What should I get from this? That AI in Natural Language Processing is still in its infancy?
Have you heard about Duolingo? I need to know whether AI can be successfully implemented in language learning. It seems to me Duolingo corrects homework based on the order of the tiles.

It gives you [boy] [am] [I] [a] (which it reads as [4] [2] [1] [3]; it just expects the correct order).

[I] [am] [a] [boy] >>[1] [2] [3] [4]

That explains why (when it asks you to type) " I am a boy " is wrong but " I am a boy. " is correct! Just because of a dot! Sometimes it penalises you FOR writing a dot. I'm guessing it checks the database for the exact sentence rather than doing any language recognition.

oosmanbeekawoo