What is Multi-Head Attention in Transformer Neural Networks?

#shorts #machinelearning #deeplearning
Comments

This is really hand-wavy. Mapped how? Split how?

chadarmstrong
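
On the "mapped how, split how" question: as far as I can tell, the mapping is three learned linear projections and the split is just a reshape. A minimal PyTorch sketch, not from the video, using the usual d_model=512, 8-head sizes from the "Attention Is All You Need" paper:

import torch
import torch.nn.functional as F

d_model, num_heads = 512, 8
d_head = d_model // num_heads          # 64 dims per head
x = torch.randn(1, 10, d_model)        # (batch, seq_len, d_model)

# "Mapped": one learned linear projection per role (Q, K, V)
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v    # each (1, 10, 512)

# "Split": reshape 512 dims into 8 heads of 64; no new parameters involved
def split_heads(t):
    b, s, _ = t.shape
    return t.view(b, s, num_heads, d_head).transpose(1, 2)   # (b, heads, seq, d_head)

q, k, v = map(split_heads, (q, k, v))

# Scaled dot-product attention then runs independently in each head
scores = q @ k.transpose(-2, -1) / d_head ** 0.5             # (1, 8, 10, 10)
out = F.softmax(scores, dim=-1) @ v                          # (1, 8, 10, 64)

# Heads are concatenated back to d_model before the output projection
out = out.transpose(1, 2).reshape(1, 10, d_model)            # (1, 10, 512)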

How does it work for multiple sentences? And what is the input to the decoder: the key, the value, or the query?

deepalisharma
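
Re: the decoder input: in the standard encoder-decoder setup, the decoder's cross-attention takes its queries from the decoder itself, while the keys and values both come from the encoder output; multiple sentences are just a batch dimension, usually with padding masks. A small sketch with torch.nn.MultiheadAttention; the sizes are my own assumptions:

import torch
import torch.nn as nn

d_model = 512
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)

enc_out = torch.randn(1, 12, d_model)  # encoder output for a 12-token source sentence
dec_in  = torch.randn(1, 7,  d_model)  # decoder states for 7 target tokens so far

# Query from the decoder; key and value from the encoder
out, weights = cross_attn(query=dec_in, key=enc_out, value=enc_out)
print(out.shape)      # torch.Size([1, 7, 512]): one updated vector per target token
print(weights.shape)  # torch.Size([1, 7, 12]): each target token's attention over the source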

That "splitting into 8 parts" thing, is this also the case for ViT ?
Seems like they only do this for NLP tasks, but not for vision ones.
Thanks for the vid, btw👍

my_master
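
ViT does the same head split, just over image patches instead of word tokens; ViT-Base, for example, uses 12 heads of 64 dims on 768-dim patch embeddings. A rough sketch with ViT-Base-ish numbers, which are my assumption here, not from the video:

import torch
import torch.nn as nn

img = torch.randn(1, 3, 224, 224)                          # one 224x224 RGB image

# Patchify + embed in one step: 16x16 patches -> 196 tokens of 768 dims
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
patches = patch_embed(img).flatten(2).transpose(1, 2)      # (1, 196, 768)

# Multi-head self-attention over the patches, same mechanism as in NLP
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
out, _ = attn(patches, patches, patches)
print(out.shape)                                           # torch.Size([1, 196, 768])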

Are multiple heads just used to get better performance, like an ensemble?

its_fergi
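
Not quite an ensemble, as I understand it: an ensemble would average independent predictions, whereas the heads attend in different learned subspaces and their outputs are concatenated and remixed by one more linear layer, so they can specialize and still interact. A toy sketch of that last step; the shapes and names here are mine:

import torch

num_heads, seq_len, d_head = 8, 10, 64
d_model = num_heads * d_head

head_outs = torch.randn(num_heads, seq_len, d_head)        # per-head attention outputs

# Ensemble-style combination would average the heads:
averaged = head_outs.mean(dim=0)                           # (10, 64)

# Multi-head attention instead concatenates them and learns a mixing matrix:
concat = head_outs.transpose(0, 1).reshape(seq_len, d_model)  # (10, 512)
W_o = torch.randn(d_model, d_model)                        # learned output projection
mixed = concat @ W_o                                       # (10, 512)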

Didn't get anything; horrible explanation.

karlheifisch