Vision Transformer Explained

This video explains the Vision Transformer (ViT) architecture and walks through sample code for using the ViT model from Hugging Face.
Comments
Author

How can I use it for a multiclass image dataset?

sindhujashukla
Author

@5:20 C represents channels, not color. For example, an RGB image has 3 channels (R, G, B).

bindass
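
To make the point about C concrete, here is a minimal sketch of the patch arithmetic in a Vision Transformer. The helper name `patch_shapes` and the sizes (224x224 images, 16x16 patches) are assumptions chosen for illustration; they do not come from the video. The sketch shows that C is a channel count that scales the length of each flattened patch, so a 1-channel grayscale image works the same way as a 3-channel RGB image.

```python
def patch_shapes(height, width, channels, patch_size):
    """Return (number of patches, values per flattened patch).

    Hypothetical helper for illustration only; it is not part of any library.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    # The image is cut into a grid of non-overlapping square patches.
    num_patches = (height // patch_size) * (width // patch_size)
    # Each patch is flattened across its pixels and all C channels.
    patch_dim = patch_size * patch_size * channels
    return num_patches, patch_dim


# Assumed example sizes: 224x224 input, 16x16 patches.
rgb = patch_shapes(224, 224, channels=3, patch_size=16)
gray = patch_shapes(224, 224, channels=1, patch_size=16)
print(rgb)   # (196, 768)
print(gray)  # (196, 256)
```

The number of patches depends only on the image and patch sizes, while the flattened patch length grows with the number of channels.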