Vision Transformer for Image Classification

Vision Transformer (ViT) set a new state of the art for image classification. It was posted on arXiv in October 2020 and officially published in 2021. On all the public datasets, ViT beats the best ResNet by a small margin, provided that it has been pretrained on a sufficiently large dataset. The bigger the pretraining dataset, the greater ViT's advantage over ResNet.

Reference:
- Dosovitskiy et al. An image is worth 16×16 words: transformers for image recognition at scale. In ICLR, 2021.
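
The "16×16 words" in the paper title refers to cutting the image into fixed-size patches that play the role of tokens. Below is a minimal sketch of that patch-splitting step only, in plain Python on nested lists. The function name `split_into_patches`, the zero-filled 32×32 RGB input, and the row-major patch order are assumptions made for illustration, not code from the video or the paper. In the full model, each patch vector would then go through a learned linear projection and receive a position embedding, which this sketch leaves out.

```python
def split_into_patches(image, patch_size=16):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Returns a list of patch vectors in row-major patch order; each vector has
    patch_size * patch_size * C entries.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch: rows first, then pixels, then channels.
            vector = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(vector)
    return patches


# Hypothetical 32 x 32 RGB image filled with zeros, for illustration only.
image = [[[0, 0, 0] for _ in range(32)] for _ in range(32)]
patches = split_into_patches(image)
print(len(patches), len(patches[0]))
```

With these assumed sizes, the 32×32 image yields 4 patches, each a vector of 16 × 16 × 3 = 768 values.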

Comments

Great explanation with detailed notation. Most of the videos found on YouTube were some kind of oral explanation. But this kind of symbolic notation is very helpful for grasping the real picture, especially if anyone wants to re-implement it or add a new idea to it. Thank you so much. Please continue helping us by making this kind of video.

UzzalPodder

Can't stress enough how easy to understand you made it.

mmpattnaik

These are some of the best, hands-on and simple explanations I've seen in a while on a new CS method. Straight to the point with no superfluous details, and at a pace that let me consider and visualize each step in my mind without having to constantly pause or rewind the video. Thanks a lot for your amazing work! :)

drakehinst

Clear, concise, and overall easy to understand for a newbie like me. Thanks!

adityapillai

Great explanation! Good for you! Don't stop giving ML guides!

ai_lite

The best video so far. The animation is easy to follow and the explanation is very straightforward.

drelvenkee

The best ViT explanation available. Understanding this is also key to understanding DINO and DINOv2.

thecheekychinaman

Man, you made my day! These lectures were golden. I hope you continue to make more of these

sheikhshafayat

Amazing, I am in a rush to implement a vision transformer as an assignment, and this saved me so much time!

valentinfontanger

Amazing video. It helped me to really understand the vision transformers. Thanks a lot.

aimeroundiaye

15 minutes of heaven 🌿. Thanks a lot, understood clearly!

thepresistence

Very good explanation, better than many other videos on YouTube, thank you!

vladik

This reminds me of Encarta encyclopedia clips when I was a kid lol! Good job mate!

swishgtv

Thank you for your Attention Models playlist. Well explained.

arash_mehrabi

This was a great video. Thanks for your time producing great content.

MonaJalal

You have explained ViT in simple words. Thanks

rajgothi

Thank you, your video is way underrated. Keep it up!

DerekChiach

Thank you so much for this amazing presentation. Your explanation is very clear, and I have learnt so much. I will definitely watch your Attention Models playlist.

sehaba

Good video, what a splendid presentation. Wang Shusen yyds.

wengxiaoxiong

Nicely explained. Appreciate your efforts.

nehalkalita
