Reading ViT (Vision Transformer) PyTorch source code

Vision Transformer (ViT) is one of the two most popular large transformer-based models for image recognition (the other being Swin). It is considered a heavyweight replacement for CNN-based models such as ResNet.
ViT has two main implementations: the original one from Google, written in Flax, and the one from the PyTorch team. The PyTorch one open-sourced its training code as well as the inference and fine-tuning code, so this is the one I will go over in the video.
Important links:

00:00 - Intro
02:09 - Lineage and Model Versions
06:22 - Installation and Debugging Setup
19:00 - Data Loading and Augmentations
26:31 - Model Inference Code
38:45 - Training Code
42:12 - Next Up

Mak Gaiduk

Recommended on this topic

Comments

Keep pushing, brother! Young students and researchers will need this kind of content.

anhduy

Mak, you are awesome, please keep it up and you'll acquire the community you deserve. Your videos are awesome, well explained, and you explore subjects that no one has before. Hope to see your channel grow!!

AntoineHouet

How do you download from Kaggle to an Ubuntu instance?
The download link will only download to the local computer...

eliaweiss

Vision Transformers (ViT) Explained + Fine-tuning in Python

Implement and Train ViT From Scratch for Image Recognition - PyTorch

PyTorch Paper Replicating (building a vision transformer with PyTorch)

Vision Transformer (ViT) - An image is worth 16x16 words | Paper Explained

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained)

PyTorch ViT: The Ultimate Guide to Fine-Tuning for Object Identification (COLAB)

Vision Transformer Basics

UNETR Implementation for 2D Segmentation in PyTorch | UNTER = Vision Transformer + CNN Decoder

Vision Transformers (VIT) - Human Emotions Detection

Reading SWIN transformer source code - Image Recognition with Transformers

Vision Transformer (ViT) - Using Transformers for Image Classification | HuggingFace

Vision Transformer from Scratch and Training Implementation

EfficientML.ai Lecture 14 - Vision Transformer (MIT 6.5940, Fall 2023)

Robust Perception with Vision Transformer SegFormer

vision transformer and Deit using PyTorch Lightning

Attention in transformers, visually explained | Chapter 6, Deep Learning

DINO in PyTorch

Deep Dive into Vision Transformer : From concepts to code from scratch using Pytorch

12 Vision Transformers - Computer Vision - Winter Term 21/22 - Freie Universität Berlin

Transformers: The best idea in AI | Andrej Karpathy and Lex Fridman

Vision Transformer Explained

Swin Transformer paper animated and explained

Vision Transformer(ViT) - Image is worth 16x16 words | Paper Explained