Reading Swin Transformer source code - Image Recognition with Transformers

This video goes through the source code of the PyTorch "torchvision" implementation of the Swin image recognition model.
This is not the original implementation from the paper, but rather a torchvision reimplementation that follows the original as closely as possible and achieves the same results.
Important links:

00:00 - Intro
01:44 - Model Lineage and Versions
04:13 - Data Loading and Augmentations
11:51 - Overall Model Structure
25:32 - Stochastic Depth
32:56 - Shifted Window Attention
50:56 - Patch Merging Block
54:08 - Next Up
Comments

I spent a week trying to understand the underlying implementation of how Swin Transformers work. I have learned so much from you. Thanks so much.

ahmedbahgat

Nice! The roll and masking operations are basically the most important ones in Swin - very useful concepts, given how hard it is to map the rolled windows back to each other. It can be very confusing, especially keeping track of the outer windows, which partly end up in completely different regions of the image after the roll.

It would also be cool if you could use the pre-trained weights so you can actually show the meaning of intermediate and final model outputs (like attention heatmaps or class probabilities) - this sometimes helps to capture a module's functionality🙂

davidro
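The roll the comment above describes can be seen on a toy tensor. A minimal sketch (plain `torch.roll`, not the torchvision module itself): the cyclic shift moves the feature map so the new windows straddle the old window boundaries, and the wrapped border rows/columns are exactly why Swin needs an attention mask.

```python
import torch

x = torch.arange(16).reshape(1, 4, 4)  # toy 4x4 one-channel "feature map"
shift = 1                              # Swin typically uses window_size // 2

# Cyclic shift: rows/cols at the top-left wrap around to the bottom-right.
shifted = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
print(shifted[0])
# tensor([[ 5,  6,  7,  4],
#         [ 9, 10, 11,  8],
#         [13, 14, 15, 12],
#         [ 1,  2,  3,  0]])

# The wrapped pixels (original row 0 / column 0) now share windows with pixels
# they are not spatially adjacent to, so those attention pairs get masked out.
# The reverse roll restores the original layout exactly:
restored = torch.roll(shifted, shifts=(shift, shift), dims=(1, 2))
assert torch.equal(restored, x)
```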

Awesome! Learned about stochastic depth from the video.

vslaykovsky
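For readers who, like the commenter, are meeting stochastic depth here for the first time: a minimal sketch of the idea (a hand-rolled `drop_path` helper for illustration; torchvision ships its own `torchvision.ops.StochasticDepth`). During training, each residual branch is dropped for a whole sample with probability `p` and rescaled otherwise; at eval time it is a no-op.

```python
import torch

def drop_path(x: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Stochastic depth in "row" mode: one Bernoulli draw per sample."""
    if not training or p == 0.0:
        return x
    keep = 1.0 - p
    # Mask shape (N, 1, 1, ...): the draw broadcasts over all non-batch dims,
    # so a sample's entire branch output is either kept (scaled) or zeroed.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = torch.empty(mask_shape, device=x.device).bernoulli_(keep)
    return x * mask / keep

x = torch.ones(4, 3)
print(drop_path(x, p=0.5, training=False))  # eval: returned unchanged
```

Scaling by `1/keep` keeps the expected value of the branch output the same with and without dropping, so no correction is needed at inference time.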

my bro, my man. The series keeps being on fire 🔥

anhduy

Hi bro, can you please explain the paper "MaxViT: Multi-Axis Vision Transformer" and its code? Thank you in advance.

akramsalim