DistilBERT Revisited: smaller, lighter, cheaper and faster BERT (paper explained)

In this video I explain DistilBERT. The DistilBERT model was proposed in the blog post "Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT" and in the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter". DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.
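
For a concrete sense of the size difference, here is a minimal sketch, assuming the Hugging Face transformers library and the bert-base-uncased / distilbert-base-uncased checkpoints; it compares parameter counts and runs one forward pass with DistilBERT:

# Minimal sketch: compare parameter counts of BERT base and DistilBERT
# (assumes the Hugging Face transformers library is installed).
from transformers import AutoModel, AutoTokenizer

bert = AutoModel.from_pretrained("bert-base-uncased")
distilbert = AutoModel.from_pretrained("distilbert-base-uncased")

n_bert = bert.num_parameters()
n_distil = distilbert.num_parameters()
print(f"BERT base parameters:  {n_bert:,}")
print(f"DistilBERT parameters: {n_distil:,}")
print(f"Reduction:             {100 * (1 - n_distil / n_bert):.1f}%")

# Inference works the same way as with BERT.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
inputs = tokenizer("DistilBERT is smaller, faster, cheaper and lighter.", return_tensors="pt")
hidden_states = distilbert(**inputs).last_hidden_state
print(hidden_states.shape)  # (batch_size, sequence_length, 768)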

If you like such content, please subscribe to the channel here:

If you would like to support me financially, it is totally optional and voluntary.

Relevant links:
Knowledge Distillation (see the loss sketch after these links):

BERT:

DistilBERT:
GLUE benchmarks:
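
The Knowledge Distillation link above refers to the soft-target objective from Hinton et al. that DistilBERT's training loss builds on; in the paper it is combined with the masked language modelling loss and a cosine embedding loss on the hidden states. Below is a minimal PyTorch sketch of just the soft-target term; the temperature value and tensor shapes are illustrative, not the paper's exact settings.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between the temperature-softened teacher and student distributions.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Illustrative usage with random logits over a 30522-token vocabulary.
student_logits = torch.randn(8, 30522)
teacher_logits = torch.randn(8, 30522)
print(distillation_loss(student_logits, teacher_logits))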

Comments

Is it possible to create a customized DistilBERT model using pretrained DistilBERT model layers?

sadikaljarif

Please clarify the ablation study. Is Φ a combination of all three losses or just the distillation loss? Thanks.

kishangupta

Why are the token-type embeddings removed from DistilBERT?

Gokulhraj

Great presentation. Can you show a use case for Named Entity Recognition (NER)?

sampeter

Could you do a similar kind of session on BioBERT and ClinicalBERT?

venustat

Nice. The approach is the same as the original Geoffrey Hinton KD paper. It would be great if you could review the XtremeDistilTransformers paper by the Microsoft Research team! Thanks for the video.

mrp