Flamingo: a Visual Language Model for Few-Shot Learning
DeepMind's Flamingo model was introduced in "Flamingo: a Visual Language Model for Few-Shot Learning" by J.-B. Alayrac et al. (NeurIPS 2022). This video walks through the paper's details.
Timestamps:
00:00 - Flamingo: a Visual Language Model for Few-Shot Learning
00:21 - Outline
01:10 - Motivation
04:46 - Challenges for multimodal generative modelling
07:42 - Related Work
14:23 - Flamingo Model
17:26 - Vision encoder: pixels to features
18:48 - Vision encoder details
21:41 - Perceiver resampler
23:30 - Conditioning the language model
25:31 - Per-image/video attention masking
29:01 - Flamingo - training data
32:32 - Flamingo training objective
33:16 - Task adaptation with few-shot in-context learning
35:24 - Few-shot in-context learning details
40:06 - Flamingo models
41:51 - Few-shot evaluation benchmarks
44:23 - Flamingo: dataset deduplication
46:53 - Flamingo: nuts and bolts training details
50:17 - Few-shot: comparison to SotA
53:50 - Few-shot: further analysis
59:07 - Contrastive pretraining: zero-shot retrieval
59:58 - Fine-tuning Flamingo
01:01:58 - Ablation studies
01:12:50 - Qualitative results
01:17:43 - Qualitative results - dialogue
01:21:33 - Qualitative results - video
01:22:16 - Qualitative results - more videos
01:22:36 - Flamingo limitations
01:24:58 - Flamingo failures: hallucinations/ungrounded guesses
01:25:52 - Trade-offs of few-shot learning methods
01:29:07 - Flamingo opportunities
01:30:25 - Flamingo benefits
01:31:29 - Flamingo risks and mitigation strategies
01:35:01 - Summary
Particular thanks to Antoine Miech for his help in clarifying several details of the work.
For related content:
For (optional) coffee donations: