DistilBERT Revisited: smaller, lighter, cheaper and faster BERT (paper explained)

In this video I explain DistilBERT. The DistilBERT model was proposed in the blog post "Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT" and in the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter". DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.
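
For a concrete sense of the size difference, here is a minimal sketch, assuming the Hugging Face transformers library and the bert-base-uncased / distilbert-base-uncased checkpoints; it compares parameter counts and runs one forward pass with DistilBERT:

# Minimal sketch: compare parameter counts of BERT base and DistilBERT
# (assumes the Hugging Face transformers library is installed).
from transformers import AutoModel, AutoTokenizer

bert = AutoModel.from_pretrained("bert-base-uncased")
distilbert = AutoModel.from_pretrained("distilbert-base-uncased")

n_bert = bert.num_parameters()
n_distil = distilbert.num_parameters()
print(f"BERT base parameters:  {n_bert:,}")
print(f"DistilBERT parameters: {n_distil:,}")
print(f"Reduction:             {100 * (1 - n_distil / n_bert):.1f}%")

# Inference works the same way as with BERT.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
inputs = tokenizer("DistilBERT is smaller, faster, cheaper and lighter.", return_tensors="pt")
hidden_states = distilbert(**inputs).last_hidden_state
print(hidden_states.shape)  # (batch_size, sequence_length, 768)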

If you like such content, please subscribe to the channel here:

If you would like to support me financially, it is totally optional and voluntary.

Relevant links:
Knowledge Distillation (see the loss sketch after these links):

BERT:

DistilBERT:
GLUE benchmarks:
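
The Knowledge Distillation link above refers to the soft-target objective from Hinton et al. that DistilBERT's training loss builds on; in the paper it is combined with the masked language modelling loss and a cosine embedding loss on the hidden states. Below is a minimal PyTorch sketch of just the soft-target term; the temperature value and tensor shapes are illustrative, not the paper's exact settings.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between the temperature-softened teacher and student distributions.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Illustrative usage with random logits over a 30522-token vocabulary.
student_logits = torch.randn(8, 30522)
teacher_logits = torch.randn(8, 30522)
print(distillation_loss(student_logits, teacher_logits))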

Comments

Is it possible to create a customized DistilBERT model using pretrained DistilBERT model layers?

sadikaljarif

Please clarify the ablation study. Is Φ a combination of all three losses or just the distillation loss? Thanks.

kishangupta

Why are the token-type embeddings removed from DistilBERT?

Gokulhraj

Great presentation. Can you show a use case for Named Entity Recognition (NER)?

sampeter

Could you do a similar kind of session on BioBERT and ClinicalBERT?

venustat

Nice. The approach is the same as the original Geoffrey Hinton KD paper. It would be great if you could review the XtremeDistilTransformers paper by the Microsoft Research team! Thanks for the video.

mrp