Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

In this episode of Machine Learning Street Talk, we chat about large-scale transfer learning in natural language processing. The Text-to-Text Transfer Transformer (T5) paper from Google AI presents an exhaustive survey of what is important for transfer learning in NLP and what is not. In this conversation, we go through the key takeaways of the paper: the text-to-text input/output format, architecture choice, dataset size and composition, fine-tuning strategy, and how best to use more computation.
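
To make the text-to-text framing concrete, here is a minimal sketch of my own (not from the episode); it assumes the Hugging Face transformers library and the public t5-small checkpoint. Every task, translation and summarization alike, is posed as plain input text with a task prefix, and the answer is read back out as plain output text.

# Minimal sketch of T5's text-to-text framing (illustration only; the library
# and checkpoint choices are my assumptions, not something covered in the episode).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Two different tasks, both expressed purely as text with a task prefix.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is pre-trained on a data-rich "
    "task and then fine-tuned on a downstream task, has become a powerful "
    "technique in natural language processing.",
]

for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    # The model's answer is itself just text, whatever the task.
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))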

From these topics, we diverge into exciting ideas such as embodied cognition, meta-learning, and the measure of intelligence. We are still at the beginning of our podcast journey and really appreciate any feedback from our listeners. Is the chat too technical? Do you prefer group discussions, interviews with experts, or chats between the three of us? Thanks for watching, and if you haven't already, please subscribe!

Paper Links discussed in the chat:
Comments

10:50 Text-to-Text Framework
16:14 Transformer Architectures, Encoder-Decoder, Encoder-only, or Decoder-only
22:52 DistilBERT, The Lottery Ticket Hypothesis, Pruning Transformers, Knowledge Distillation
29:50 T5’s findings on the impact of architecture
30:55 Position Embeddings in Transformers
40:32 Self-Supervised Objectives
44:26 ELECTRA, GANs for Text?
47:38 Machine Learning Competitions and Benchmarks
54:28 Datasets used in T5
1:00:23 Meta-Learning for Domain Adaptation
1:04:44 Embodied Cognition, Language Grounding, Interpolation and Extrapolation, and The Measure of Intelligence
1:13:20 Training Strategies, Passing tasks as input text, Multi-Task Learning
1:21:46 Scaling Transformers, how to use more compute?
1:26:10 Democratization of Pre-Trained Models and Deep Learning
1:27:28 Vision and Language

MachineLearningStreetTalk

Very privileged to be able to get this kind of insight/discussion and be part of the small viewership. Thank you for taking the time out to make amazing content like this.

RichardHamnett

10/10. Compliments to the chefs!
Really enjoying the conversational style of these videos, especially how you guys make parallels with other recent papers and compare and contrast them. Makes the overwhelming amount of deep learning papers a bit more digestible!
Possibly an unpopular suggestion, but I would love to see even more 'ML Ops'/deployment discussions, like how, for a given paper, a company could go about fine-tuning it for their use case, and/or strategies for dealing with incoming data, retraining models, etc.
(I'm finding this channel at a terrible time - final exams are starting on Monday. It's not procrastination if it's Deep learning though!)

whatsinthepapers

Paper suggestion: Go-Explore and its recent enhanced version, "First return, then explore", from Uber AI.

alibaheri

Interesting talk. Someone mentioned knowledge graphs here. Can these models extract knowledge graphs from a domain-specific corpus of text so that we can use those KGs for inference? This would not only provide an interpretable model but also give humans an opportunity to enter knowledge directly if needed. Any thoughts?

JaiSaiSriSai

I’m really enjoying the discussions and looking forward to tuning in each episode!
I see that you have this as a podcast on Spotify currently, are you planning on rolling it out to any other platforms? (I like Apple Podcasts... 😀)

PeterOtt

Do something about generative models and the difficulty of approximating intractable posterior distributions, how normalising flows solve that, the problems with them, etc. Finding a good and computationally feasible method to get exact posterior distributions could have a great impact on generative modelling. (more math preferred)
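
As an aside from me (not part of the comment above): the reason normalising flows give exact rather than approximate densities is the change-of-variables formula. If an invertible map f pushes a simple base variable z ~ p_Z to x = f(z), then

\[
\log p_X(x) = \log p_Z\!\big(f^{-1}(x)\big) + \log \left| \det \frac{\partial f^{-1}(x)}{\partial x} \right|,
\]

so the log-likelihood can be evaluated and maximised directly, with no variational lower bound.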

Kerrosene

"...has released a model with 17 Billion parameters -- 18 Billion -- oh, well, doesn't matter, it's a lot!"

fermigas

Of course, if they trained it on (-5, 5) and clipped the gradient, it's not going to be able to extrapolate beyond that, because the gradient is clipped. If they clipped your neurons to (-5, 5), I doubt you'd be doing more than just breathing lol

TheFinalAnalysis

Black tee shirt person: please don't use any video filter, it's kinda weird. Let it be natural like the other guys' video.
BTW, this is good, and thanks for uploading it.

vinayreddy

If you're reading this and have a working model of this, I will pay $100 if it can do multivariate discontiguous time-series predictions 32 rows ahead from a CSV file.

RoboticusMusic