Adding Self-Attention to a Convolutional Neural Network! PyTorch Deep Learning Tutorial

TIMESTAMPS:
0:00 Introduction
0:22 Attention Mechanism Overview
1:20 Self-Attention Introduction
3:02 CNN Limitations
4:09 Using Attention in CNNs
6:30 Attention Integration in CNN
9:06 Learnable Scale Parameter
10:14 Attention Implementation
12:52 Performance Comparison
14:10 Attention Map Visualization
14:29 Conclusion

In this video I show how we can add Self-Attention to a CNN in order to improve the performance of our classifier!
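The video's attention block (per the timestamps: attention over CNN feature maps with a learnable scale parameter) follows a pattern popularized by SAGAN. Below is a minimal sketch of that pattern, not the video's exact code; the class name, the 1×1-conv query/key/value projections, and the channel-reduction factor of 8 are all assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """Self-attention over the spatial positions of a CNN feature map,
    with a learnable scale `gamma` initialized to 0 so the block starts
    out as an identity mapping and learns how much attention to mix in."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs project the feature map into query/key/value spaces.
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, h*w, c//8)
        k = self.key(x).flatten(2)                     # (b, c//8, h*w)
        v = self.value(x).flatten(2)                   # (b, c, h*w)
        attn = F.softmax(q @ k, dim=-1)                # (b, h*w, h*w)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                    # residual mix-in
```

Because `gamma` starts at 0, dropping this block into an existing CNN between two conv stages leaves the network's initial behavior unchanged, which tends to make training stable.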

Donations

The corresponding code is available here! (Section 13)

Discord Server:
Comments

very cool stuff. Any idea how this compares to Feature Pyramid Networks, which are typically used to enrich the high-res early convolutional layers?

I would imagine that an FPN works well if the thing of interest is "compact", i.e. can be captured well by a square crop, whereas attention would work even for non-compact things. Examples would be donuts with large holes and little dough, or long sticks, etc.

thouys

Good video! Do you think this experiment of adding the attention head so early on can extrapolate well to graph neural networks?

yadavadvait

Hello. I was trying to introduce a self_attention layer between a fully connected layer (with 32 neurons) and the output layer to recreate the "Patt-Lite" CNN model. I used the Attention function from the maximal library. The thing is, I get mixed results for the same parameters, even with the same seed: sometimes I quickly reach 95% accuracy, and other times it doesn't learn at all and stays at 15-30%. Without the attention added, I get a constant ~75%. Do you know why this could be happening?

aldonin
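One common cause of "same seed, different results" is that seeding alone does not make GPU kernels deterministic. The commenter's stack may differ (the maximal library suggests TensorFlow/Keras), but in PyTorch, the framework this tutorial uses, a fuller reproducibility setup looks roughly like this sketch:

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    # Seed every RNG that a typical PyTorch training loop touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds CUDA generators if available
    # cuDNN auto-tunes and may pick nondeterministic kernels by default;
    # pin it down so repeated runs execute the same algorithms.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(42)
a = torch.rand(3)
seed_everything(42)
b = torch.rand(3)
assert torch.equal(a, b)  # same seed, identical draws
```

Even with this, large swings between runs (95% vs. 15-30%) often point to training instability, e.g. attention logits blowing up early, rather than randomness alone, so lowering the learning rate or warming up the attention layer is also worth trying.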

I'm guessing that adding self-attention in deeper layers would have less of an impact, since each value already has a greater receptive field?
If not, then why not add it at the end, where it would be less expensive? Setting aside the fact that we could incorporate it into every conv block if we had infinite compute.

unknown-otter
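On the cost point raised above: the attention matrix compares every spatial position with every other, so it grows quadratically in h*w. A back-of-the-envelope count (the feature-map sizes here are hypothetical, not from the video) shows why an early high-resolution block is far more expensive than a late one:

```python
def attn_matrix_entries(h: int, w: int) -> int:
    # Self-attention builds an (h*w) x (h*w) map over spatial positions.
    n = h * w
    return n * n

early = attn_matrix_entries(32, 32)  # hypothetical early conv block
late = attn_matrix_entries(4, 4)     # hypothetical final conv block
print(early, late, early // late)    # prints: 1048576 256 4096
```

So attention on a 32x32 feature map is about 4000x larger than on a 4x4 map, which is exactly the trade-off the comment describes: early placement sees fine spatial detail but pays a steep quadratic cost.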

Can you make a video on dynamic convolution with a ResNet-50 model?

Glitch