Reading ViT (Vision Transformer) PyTorch source code

Vision Transformer (ViT) is one of the two most popular large transformer-based models for image recognition (the other being Swin). It is considered a heavyweight replacement for CNN-based models such as ResNet.
ViT has two main implementations: the original one from Google, written in Flax, and the one from the PyTorch team. The PyTorch one open-sourced its training code as well as its inference and fine-tuning code, so this is the one I will go over in the video.
Important links:

00:00 - Intro
02:09 - Lineage and Model Versions
06:22 - Installation and Debugging Setup
19:00 - Data Loading and Augmentations
26:31 - Model Inference Code
38:45 - Training Code
42:12 - Next Up
Comments

Keep pushing, brother! Young students and researchers will need this kind of content.

anhduy

Mak, you are awesome. Please keep it up and you'll get the community you deserve. Your videos are well explained, and you explore subjects that no one has covered before. Hope to see your channel grow!

AntoineHouet

How do you download from Kaggle to an Ubuntu instance?
The download link only downloads to the local computer...

eliaweiss