Sliding Window Attention (Longformer) Explained

In this video we talk about the sliding window attention, the dilated sliding window attention, and the global + sliding window attention, as introduced in the Longformer paper. We take a look at the main disadvantage of the classical attention mechanism introduced in the Transformer paper (i.e. the quadratic time complexity) and how sliding window attention solves this issue.
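To make the patterns discussed in the video concrete, here is a minimal NumPy sketch of the attention masks. It is an illustration of the idea, not the Longformer implementation (which uses custom banded-matrix kernels); the function name and parameters are my own.

```python
import numpy as np

def sliding_window_mask(n, window, dilation=1):
    """Boolean mask: entry (i, j) is True where query i may attend to key j.

    Each token attends to tokens within `window` positions on either side.
    With dilation d > 1, only every d-th position is kept, giving the
    "dilated sliding window" pattern with a larger receptive field at the
    same cost.
    """
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    dist = np.abs(i - j)
    return (dist <= window * dilation) & (dist % dilation == 0)

# Plain sliding window: each token sees itself and +/- 2 neighbours,
# so memory/compute grows as O(n * window) instead of O(n^2).
m = sliding_window_mask(8, window=2)
print(m.sum(axis=1))  # interior tokens attend to 5 positions each

# Dilated variant: same number of attended positions per token,
# but spread over a window twice as wide.
md = sliding_window_mask(8, window=2, dilation=2)

# Global + sliding window: a few designated tokens (e.g. [CLS])
# attend everywhere and are attended to by everyone.
global_idx = [0]
mg = m.copy()
mg[global_idx, :] = True
mg[:, global_idx] = True
```

With full attention the mask would be all-True (n² entries); the banded mask keeps only O(n·w) of them, which is where the linear scaling comes from.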

*References*
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

*Related Videos*
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

*Contents*
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 - Intro
00:26 - Original attention mechanism
00:50 - Sliding window attention
01:56 - Dilated sliding window attention
02:40 - Global + Sliding window attention
03:31 - Outro

*Follow Me*
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

*Channel Support*
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)

If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a

#slidingwindowattention #longformer #attentionmechanism