Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This video explores T5, a large-scale study on Transfer Learning. The paper takes apart many different factors of the Pre-Training then Fine-Tuning pipeline for NLP: Auto-Regressive Language Modeling vs. BERT-Style Masked Language Modeling vs. XLNet-style deshuffling, as well as the impact of dataset composition and size, and how best to use more computation. Thanks for watching, and please check out Machine Learning Street Talk, where Tim Scarfe, Yannic Kilcher and I discuss this paper!
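
For a concrete picture of the text-to-text framework and the span-corruption objective covered in the video, here is a minimal Python sketch. The span_corrupt helper, its parameters, and the span-sampling details are illustrative assumptions rather than the paper's actual preprocessing (which operates on SentencePiece ids); the example sentence and task prefixes follow the figures in the paper.

import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, seed=0):
    # Replace random contiguous spans with sentinel tokens, returning
    # (input_tokens, target_tokens). Sketch only: the real pipeline works on
    # SentencePiece ids and samples spans slightly differently.
    rng = random.Random(seed)
    n_to_corrupt = max(1, round(len(tokens) * corruption_rate))
    corrupted = set()
    while len(corrupted) < n_to_corrupt:
        span_len = max(1, round(rng.expovariate(1.0 / mean_span_len)))
        start = rng.randrange(len(tokens))
        corrupted.update(range(start, min(start + span_len, len(tokens))))

    inputs, targets, sentinel, i = [], [], 0, 0
    while i < len(tokens):
        if i in corrupted:
            # One sentinel marks each corrupted span; the target spells out
            # the hidden tokens after that same sentinel.
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            while i < len(tokens) and i in corrupted:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")  # closing sentinel ends the target
    return inputs, targets

# Unlabeled C4-style text becomes an (input, target) pair for pre-training:
src, tgt = span_corrupt("Thank you for inviting me to your party last week .".split())
print(" ".join(src))  # e.g. "Thank you <extra_id_0> me to your party <extra_id_1> week ."
print(" ".join(tgt))  # e.g. "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"

# Supervised tasks are cast into the same text-to-text format with a task prefix,
# so one model and one loss handle both pre-training and fine-tuning:
#   "translate English to German: That is good."   -> "Das ist gut."
#   "cola sentence: The course is jumping well."   -> "not acceptable"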

Paper Links:

Thanks for watching! Please Subscribe!
Comments
Author

2:00 Pushing the NLP State-of-the-Art
2:40 Text-to-Text Framework
3:28 Factors of Variation Explored
5:00 Value of Pre-Training
5:25 Attention Masking
6:18 Architecture Results
7:02 Denoising Objectives
8:47 Span Corruption Strategy
9:45 Self-Supervised Learning Study Overview
11:14 Datasets
12:24 Dataset Size
12:56 Fine-Tuning Strategy
14:25 Task Imbalance
15:20 Pre-Train, then Fine-Tune
16:26 How should we use extra computation?
18:47 Scaling up to 11B parameters
19:30 What Didn’t Make the List
22:08 Context-Free Question Answering

connor-shorten
Author

I never expected to learn so much from one single video. Amazing work presenting the paper in such a nuanced way!

vatsalkrishna
Author

Thank you! This helped me a lot in understanding all the different aspects of T5.

emanuelgerber
Author

You're getting better and better at explaining these papers, Connor. Great job. Also, I enjoyed the conversation on the Machine Learning Street Talk channel. Looking forward to seeing more videos there too. 😊


I've decided to start studying NLP in a more organized manner (right now I have some intuition about how it works, but not much theoretical or practical knowledge). I'll be watching your NLP videos when I need a productive break from my studies. 😊


P.S. I'm embarrassed to admit that I only found out today that your first name is Connor. For some reason I thought it was Henry.

BiancaAguglia
Author

Thanks for posting this! This is super helpful!

MakerBen
Author

What is the difference between i.i.d. mask tokens and BERT-style mask tokens?

tommykelly
Author

These videos are amazing, thanks Henry!

SantoshGupta-jnwn
Author

Is 'deshuffling' really an accurate description of the XLNet pre-training objective? To me, deshuffling suggests predicting the order of tokens within the text, which doesn't match my understanding of XLNet's pre-training objective.

justinmilner
Author

Thank you, sir, your videos are gold!

---ktcs
Author

Thanks for sharing! It would be wonderful if you could get a better mic though. The laptop mic has a very unpleasant echo.

heinsaar
Author

A little hard to follow as someone who hasn't learned much about AI, but I still enjoy your videos!

LTNINJA
Author

How much time does it take you guys to read a research paper, and which parts do you read? Every time I try to read one I start losing focus. Any tips? Please help.

salimbo
Author

I still don't understand how they combined training on the C4 dataset and all the task-specific datasets (SQuAD etc.).
What role did the C4 dataset play? How did they turn the raw text of C4 into an input-output task to train on?

Would be grateful if someone could explain, thanks.

dislike__button