Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

In this episode of Machine Learning Street Talk, we chat about large-scale transfer learning in natural language processing. The Text-to-Text Transfer Transformer (T5) paper from Google AI presents an exhaustive survey of what is important for transfer learning in NLP and what is not. In this conversation, we go through the key takeaways of the paper: the text-to-text input/output format, architecture choice, dataset size and composition, fine-tuning strategy, and how best to use more computation.
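
To make the text-to-text framing concrete, here is a minimal sketch of my own (not from the episode); it assumes the Hugging Face transformers library and the public t5-small checkpoint. Every task, translation and summarization alike, is posed as plain input text with a task prefix, and the answer is read back out as plain output text.

# Minimal sketch of T5's text-to-text framing (illustration only; the library
# and checkpoint choices are my assumptions, not something covered in the episode).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Two different tasks, both expressed purely as text with a task prefix.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is pre-trained on a data-rich "
    "task and then fine-tuned on a downstream task, has become a powerful "
    "technique in natural language processing.",
]

for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    # The model's answer is itself just text, whatever the task.
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))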

From these topics, we diverge into exciting ideas such as embodied cognition, meta-learning, and the measure of intelligence. We are still at the beginning of our podcast journey and really appreciate any feedback from our listeners. Is the chat too technical? Do you prefer group discussions, interviews with experts, or chats between the three of us? Thanks for watching, and if you haven't already, please subscribe!

Paper Links discussed in the chat:
Comments

10:50 Text-to-Text Framework
16:14 Transformer Architectures, Encoder-Decoder, Encoder-only, or Decoder-only
22:52 DistilBERT, The Lottery Ticket Hypothesis, Pruning Transformers, Knowledge Distillation
29:50 T5’s findings on the impact of architecture
30:55 Position Embeddings in Transformers
40:32 Self-Supervised Objectives
44:26 ELECTRA, GANs for Text?
47:38 Machine Learning Competitions and Benchmarks
54:28 Datasets used in T5
1:00:23 Meta-Learning for Domain Adaptation
1:04:44 Embodied Cognition, Language Grounding, Interpolation and Extrapolation, and The Measure of Intelligence
1:13:20 Training Strategies, Passing tasks as input text, Multi-Task Learning
1:21:46 Scaling Transformers, how to use more compute?
1:26:10 Democratization of Pre-Trained Models and Deep Learning
1:27:28 Vision and Language

MachineLearningStreetTalk

Very privileged to be able to get this kind of insight/discussion and be part of the small viewership. Thank you for taking the time out to make amazing content like this.

RichardHamnett

10/10. Compliments to the chefs!
Really enjoying the conversational style of these videos, especially how you guys make parallels with other recent papers and compare and contrast them. Makes the overwhelming amount of deep learning papers a bit more digestible!
Possibly an unpopular suggestion, but I would love to see even more 'ML Ops'/deployment discussions, like how, for a given paper, a company could go about fine-tuning it for their use case, and/or strategies for dealing with incoming data, retraining models, etc.
(I'm finding this channel at a terrible time - final exams are starting on Monday. It's not procrastination if it's Deep learning though!)

whatsinthepapers

Paper suggestion: Go-Explore and its recent enhanced version, "First return, then explore", from Uber AI.

alibaheri

Interesting talk. Someone mentioned knowledge graphs here. Can these models extract knowledge graphs from a domain-specific corpus of text so that we can use those KGs for inference? This would not only provide an interpretable model but also give humans an opportunity to enter knowledge directly if needed. Any thoughts?

JaiSaiSriSai

I’m really enjoying the discussions and looking forward to tuning in each episode!
I see that you have this as a podcast on Spotify currently, are you planning on rolling it out to any other platforms? (I like Apple Podcasts... 😀)

PeterOtt

Do something about generative models and the difficulty of approximating intractable posterior distributions, how normalising flows solve that, the problems with them, etc. Finding a good and computationally feasible method to get exact posterior distributions could have a great impact on generative modelling. (more math preferred)
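
As an aside from me (not part of the comment above): the reason normalising flows give exact rather than approximate densities is the change-of-variables formula. If an invertible map f pushes a simple base variable z ~ p_Z to x = f(z), then

\[
\log p_X(x) = \log p_Z\!\big(f^{-1}(x)\big) + \log \left| \det \frac{\partial f^{-1}(x)}{\partial x} \right|,
\]

so the log-likelihood can be evaluated and maximised directly, with no variational lower bound.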

Kerrosene

"...has released a model with 17 Billion parameters -- 18 Billion -- oh, well, doesn't matter, it's a lot!"

fermigas

Of course, if they trained it on (-5, 5) and clipped the gradient, it's not going to be able to extrapolate beyond that, because the gradient is clipped. If they clipped your neurons to (-5, 5), I doubt you'd be doing more than just breathing lol

TheFinalAnalysis

Black tee shirt person: please don't use any video filter, it's kinda weird. Let it be natural like the other guys' video.
BTW, this is good, and thanks for uploading it.

vinayreddy

If you're reading this and have a working model of this, I will pay $100 if it can do multivariate discontiguous time-series predictions 32 rows ahead from a CSV file.

RoboticusMusic