Vision Transformer Explained

This video explains the Vision Transformer (ViT) architecture and walks through sample code showing how to use the ViT model from Hugging Face.

Comments

How can it be used for a multiclass image dataset?

sindhujashukla

@5:20 C represents channels, not color. For example, an RGB image has 3 channels (R, G, B).

bindass
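
To make the point about channels concrete, here is a minimal sketch of the patch arithmetic in ViT: an image of shape H x W x C is split into N = (H / P) * (W / P) patches, and each patch is flattened to a vector of length P * P * C. The function name `patch_grid` and the 224 x 224 input with 16 x 16 patches are illustrative assumptions, not values taken from the video.

```python
def patch_grid(height, width, channels, patch_size):
    """Return (num_patches, patch_dim) for a ViT-style patch split.

    `patch_grid` is a hypothetical helper for illustration only.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch size")
    # Number of non-overlapping patches along each axis, multiplied together.
    num_patches = (height // patch_size) * (width // patch_size)
    # Each flattened patch holds patch_size * patch_size pixels per channel.
    patch_dim = patch_size * patch_size * channels
    return num_patches, patch_dim


rgb = patch_grid(224, 224, 3, 16)   # RGB image: C = 3
gray = patch_grid(224, 224, 1, 16)  # grayscale image: C = 1
print(rgb)   # (196, 768)
print(gray)  # (196, 256)
```

The number of patches depends only on the image and patch size, while the channel count C changes the length of each flattened patch, which is why C is better read as "channels" than as "color".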