HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning (w/ Author)

#hypertransformer #metalearning #deeplearning

This video contains a paper explanation and an interview with author Andrey Zhmoginov!
Few-shot learning is an interesting sub-field of meta-learning with wide applications, such as creating personalized models from just a handful of data points. Traditionally, approaches have followed the BERT recipe: pre-train a large model, then fine-tune it. However, this couples the size of the final model to the size of the model that was pre-trained. Similar problems exist with "true" meta-learners, such as MAML. HyperTransformer fundamentally decouples the meta-learner from the size of the final model by directly predicting the weights of the final model. The HyperTransformer takes the few-shot dataset as a whole into its context and predicts either one or multiple layers of a (small) ConvNet, meaning its outputs are the weights of the convolution filters. Interestingly, and with the right engineering care, this actually appears to deliver promising results and can be extended in many ways.
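As a rough illustration of the weight-generation idea, here is a minimal PyTorch sketch made under my own assumptions (module names, sizes, and the single generated layer are invented for brevity; this is not the authors' code):

```python
# Minimal sketch of the weight-generation idea: a transformer reads the support set
# (image features + label embeddings) and emits the filters of one conv layer
# for a small target CNN. All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvWeightGenerator(nn.Module):
    def __init__(self, n_classes=5, feat_dim=64, out_ch=8, in_ch=3, k=3):
        super().__init__()
        self.out_ch, self.in_ch, self.k = out_ch, in_ch, k
        self.img_encoder = nn.Sequential(               # crude support-image featurizer
            nn.Conv2d(in_ch, feat_dim, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.label_emb = nn.Embedding(n_classes, feat_dim)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # one learned "weight token" per conv filter; its output is mapped to filter weights
        self.weight_tokens = nn.Parameter(torch.randn(out_ch, feat_dim))
        self.to_filter = nn.Linear(feat_dim, in_ch * k * k)

    def forward(self, support_x, support_y):
        # support_x: (S, C, H, W) support images, support_y: (S,) support labels
        tokens = self.img_encoder(support_x) + self.label_emb(support_y)   # (S, D)
        seq = torch.cat([tokens, self.weight_tokens], dim=0).unsqueeze(0)  # (1, S+out_ch, D)
        out = self.transformer(seq)[0, -self.out_ch:]                      # read the weight tokens
        return self.to_filter(out).view(self.out_ch, self.in_ch, self.k, self.k)

gen = ConvWeightGenerator()
support_x, support_y = torch.randn(10, 3, 32, 32), torch.randint(0, 5, (10,))
filters = gen(support_x, support_y)                     # generated conv weights
query = torch.randn(4, 3, 32, 32)
feats = F.conv2d(query, filters, padding=1)             # apply generated layer to query images
print(feats.shape)  # torch.Size([4, 8, 32, 32])
```

The point of the sketch is only the data flow: support images and labels become tokens, extra learned "weight tokens" attend to them, and their outputs are reshaped into the filters of one conv layer that is then applied to the query images.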

OUTLINE:
0:00 - Intro & Overview
3:05 - Weight-generation vs Fine-tuning for few-shot learning
10:10 - HyperTransformer model architecture overview
22:30 - Why the self-attention mechanism is useful here
34:45 - Start of Interview
39:45 - Can neural networks even produce weights of other networks?
47:00 - How complex does the computational graph get?
49:45 - Why are transformers particularly good here?
58:30 - What can the attention maps tell us about the algorithm?
1:07:00 - How could we produce larger weights?
1:09:30 - Diving into experimental results
1:14:30 - What questions remain open?

ERRATA: I introduce Max Vladymyrov as Mark Vladymyrov

Abstract:
In this work we propose a HyperTransformer, a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable. Finally, we extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.
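To make the "generating the last layer alone" variant from the abstract concrete, here is a hedged PyTorch sketch under assumed shapes and names (the stand-in backbone, weight_head, and make_last_layer are illustrative assumptions, not the paper's implementation):

```python
# Hedged sketch of the "generate only the last layer" variant: a fixed backbone embeds
# the support set, and a small transformer turns each class's support embeddings into
# one weight row of the final classification layer. Names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

D, N_WAY, K_SHOT = 128, 5, 5
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, D))   # stand-in for a larger CNN

enc_layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
weight_head = nn.TransformerEncoder(enc_layer, num_layers=1)

def make_last_layer(support_x, support_y):
    emb = backbone(support_x)                              # (N_WAY*K_SHOT, D)
    rows = []
    for c in range(N_WAY):
        class_emb = emb[support_y == c].unsqueeze(0)       # (1, K_SHOT, D)
        # attend over the class's support embeddings, then average into one weight row
        rows.append(weight_head(class_emb).mean(dim=1))    # (1, D)
    return torch.cat(rows, dim=0)                          # (N_WAY, D): generated logits weights

support_x = torch.randn(N_WAY * K_SHOT, 3, 32, 32)
support_y = torch.arange(N_WAY).repeat_interleave(K_SHOT)
W = make_last_layer(support_x, support_y)
query = torch.randn(4, 3, 32, 32)
logits = F.linear(backbone(query), W)                      # classify queries with the generated layer
print(logits.shape)  # torch.Size([4, 5])
```

Because everything above is ordinary tensor computation, the whole pipeline stays end-to-end differentiable, which is the property the abstract emphasizes.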

Authors: Andrey Zhmoginov, Mark Sandler, Max Vladymyrov

Links:

If you want to support me, the best thing to do is to share the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments

The long introduction was great; it is good to be able to understand, with drawings, what is actually happening.

chochona

Describing it as a buffet is exactly right for this amount of content. This makes it great for everyone: those looking for a summary, those who want an in-depth dive, or those looking to implement/adapt it themselves.

Zed_Oud

Hi Yannic, I've been following your channel since the very beginning and I have always enjoyed your style. Since you're asking for comments about this new format of interviewing papers' authors, I'd like to share my two cents. I much preferred your former style of unbiased reviews done on your own, which were really professional and right to the technical points. These interviews, on the other hand, are more deferential and less impartial. I found your previous style much more insightful and useful. Thank you anyway for your work; your channel is my favourite one for keeping up to date on the subject. I'm a senior MLE at a big telco company in Italy. Thanks!

enniograsso

Love the longer first half that's more like your earlier work. IMO the interview should be a short Q&A that lets the authors respond to parts you were unsure about or criticized. I much prefer when the paper review is more in-depth (ideally even longer than in this video).

NavinF

I am a big fan of the long-introduction version. In my opinion, the way you illustrate your thinking is far more insightful than at least half of the videos in which the authors were included. For many papers, the authors could act as supplementary information for the main concepts.

mahdipourmirzaei

Since feedback was asked for, I just wanted to say that I mostly watch the paper explanations. I like the way you explain; that's really good to have.

YvesQuemener

I've got to be honest, your explanations are the best for me because you're very good at explaining things, whereas these researchers are a little more specialized in research. I do like that you interview them, though. I'd always ask a question like "how did you come up with this idea?" or "what was the inspiration for this idea?"
Love your content! Keep experimenting.

sammay

Long intro was great - we get your explanation and then the interview is a bonus!

Yenrabbit

Really appreciate the time you take to make videos like this!

qwertywifi

2:00 Why not both? If you're into recycling content, we could have three videos: the paper review, the interview with the authors, and then the paper review interleaved with comments from the authors. Everyone is happy and you get more content for the same price (minus editing, though if you script the interleaved video before the interview you already know where the commentary will be). EDIT: Oh, this video is kinda like that already.

DamianReloaded

Jesus Christ. What an incredible result!

boffo

I love in-depth conversations that aren't afraid to be technical.

hamandchees

Damn I'm quick ;) Thanks for the content homie

UberSpaceCow

A livestream interview with chat Q&A from the viewers at the end (the last 15 minutes or so) would be great. Nick Zentner has been doing long-form geology interviews for the last couple of months, and it has been superlative for discovering new questions and ideas.

Guytron

Regarding the comment at 8:34: in one of my projects I'm using a neural network for a regression-type problem, and I found I got much smoother interpolation by switching most of the hidden layers to use asinh as the activation function. I have no idea how general that is, or whether smoothness is even a desirable feature when you're trying to output weights for another neural network.
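A minimal PyTorch sketch of that swap (the network shape and names are my own assumptions; the commenter's actual setup is not shown in the video):

```python
# Small regression MLP whose hidden activations are asinh instead of ReLU.
# asinh is smooth everywhere, roughly linear near zero and logarithmic for large
# inputs, which is the smoothness property the comment alludes to.
import torch
import torch.nn as nn

class AsinhMLP(nn.Module):
    def __init__(self, d_in=1, d_hidden=64, d_out=1):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_hidden)
        self.fc3 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        x = torch.asinh(self.fc1(x))   # asinh in place of ReLU in the hidden layers
        x = torch.asinh(self.fc2(x))
        return self.fc3(x)

model = AsinhMLP()
y = model(torch.linspace(-3, 3, 50).unsqueeze(1))   # (50, 1) regression outputs
print(y.shape)
```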

quickdudley

Love both methods (yours more), but it's lovely to have both sides.

oluwayomiolugbuyi

Is it possible to try this approach but generate MLP models? I'm wondering whether a hypernetwork for NeRF models is possible.

theodorosgalanos

Is there any recommended video or talk about semi-supervised learning research? Because I only know about the teacher-model approach and semi-GANs... Thanks!

KnowNothingJohnSnow

Question: how "Hyper"/meta can you get with a setup like this before the resulting performance gets worse or stops improving?

Supreme_Lobster