What is Multi-Head Attention in Transformer Neural Networks?

#shorts #machinelearning #deeplearning
Comments

This is really hand-wavy. Mapped how? Split how?

chadarmstrong
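
On the "mapped how, split how" question: as far as I can tell, the mapping is three learned linear projections and the split is just a reshape. A minimal PyTorch sketch, not from the video, using the usual d_model=512, 8-head sizes from the "Attention Is All You Need" paper:

import torch
import torch.nn.functional as F

d_model, num_heads = 512, 8
d_head = d_model // num_heads          # 64 dims per head
x = torch.randn(1, 10, d_model)        # (batch, seq_len, d_model)

# "Mapped": one learned linear projection per role (Q, K, V)
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v    # each (1, 10, 512)

# "Split": reshape 512 dims into 8 heads of 64; no new parameters involved
def split_heads(t):
    b, s, _ = t.shape
    return t.view(b, s, num_heads, d_head).transpose(1, 2)   # (b, heads, seq, d_head)

q, k, v = map(split_heads, (q, k, v))

# Scaled dot-product attention then runs independently in each head
scores = q @ k.transpose(-2, -1) / d_head ** 0.5             # (1, 8, 10, 10)
out = F.softmax(scores, dim=-1) @ v                          # (1, 8, 10, 64)

# Heads are concatenated back to d_model before the output projection
out = out.transpose(1, 2).reshape(1, 10, d_model)            # (1, 10, 512)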

How does it work for multiple sentences? And what is the input to the decoder: the key, the value, or the query?

deepalisharma
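
Re: the decoder input: in the standard encoder-decoder setup, the decoder's cross-attention takes its queries from the decoder itself, while the keys and values both come from the encoder output; multiple sentences are just a batch dimension, usually with padding masks. A small sketch with torch.nn.MultiheadAttention; the sizes are my own assumptions:

import torch
import torch.nn as nn

d_model = 512
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)

enc_out = torch.randn(1, 12, d_model)  # encoder output for a 12-token source sentence
dec_in  = torch.randn(1, 7,  d_model)  # decoder states for 7 target tokens so far

# Query from the decoder; key and value from the encoder
out, weights = cross_attn(query=dec_in, key=enc_out, value=enc_out)
print(out.shape)      # torch.Size([1, 7, 512]): one updated vector per target token
print(weights.shape)  # torch.Size([1, 7, 12]): each target token's attention over the source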

That "splitting into 8 parts" thing, is this also the case for ViT ?
Seems like they only do this for NLP tasks, but not for vision ones.
Thanks for the vid, btw👍

my_master
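
ViT does the same head split, just over image patches instead of word tokens; ViT-Base, for example, uses 12 heads of 64 dims on 768-dim patch embeddings. A rough sketch with ViT-Base-ish numbers, which are my assumption here, not from the video:

import torch
import torch.nn as nn

img = torch.randn(1, 3, 224, 224)                          # one 224x224 RGB image

# Patchify + embed in one step: 16x16 patches -> 196 tokens of 768 dims
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
patches = patch_embed(img).flatten(2).transpose(1, 2)      # (1, 196, 768)

# Multi-head self-attention over the patches, same mechanism as in NLP
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
out, _ = attn(patches, patches, patches)
print(out.shape)                                           # torch.Size([1, 196, 768])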

Are multiple heads just used to get better performance, like an ensemble?

its_fergi
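
Not quite an ensemble, as I understand it: an ensemble would average independent predictions, whereas the heads attend in different learned subspaces and their outputs are concatenated and remixed by one more linear layer, so they can specialize and still interact. A toy sketch of that last step; the shapes and names here are mine:

import torch

num_heads, seq_len, d_head = 8, 10, 64
d_model = num_heads * d_head

head_outs = torch.randn(num_heads, seq_len, d_head)        # per-head attention outputs

# Ensemble-style combination would average the heads:
averaged = head_outs.mean(dim=0)                           # (10, 64)

# Multi-head attention instead concatenates them and learns a mixing matrix:
concat = head_outs.transpose(0, 1).reshape(seq_len, d_model)  # (10, 512)
W_o = torch.randn(d_model, d_model)                        # learned output projection
mixed = concat @ W_o                                       # (10, 512)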

Didn't get anything; horrible explanation.

karlheifisch