LLM Jargons Explained: Part 3 - Sliding Window Attention

In this video, I take a close look at Sliding Window Attention (SWA), a technique used to train Large Language Models (LLMs) efficiently on longer documents. The concept was discussed extensively in the Longformer paper and has recently been adopted by Mistral 7B, reducing computational costs.
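As a rough illustration of the idea, here is a minimal sketch (not the video's code) of causal attention restricted to a fixed window, written in PyTorch. The function name, tensor shapes, and the toy window size of 4 are assumptions chosen for the example; Mistral 7B uses a much larger window (4096 tokens).

```python
# Minimal sketch of sliding window attention: each token attends only to itself
# and the previous (window - 1) tokens, so the attention pattern is a narrow band
# instead of the full lower triangle.
import torch

def sliding_window_attention(q, k, v, window: int):
    """q, k, v: (batch, seq_len, d_head). Causal attention limited to `window` tokens."""
    seq_len = q.size(1)
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale   # (batch, seq, seq)

    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]                    # j <= i (no looking ahead)
    in_window = idx[:, None] - idx[None, :] < window         # i - j < window (stay in band)
    mask = causal & in_window                                 # (seq, seq) boolean band

    scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)

# Toy example: 1 sequence of 8 tokens, head dim 16, window of 4.
q = k = v = torch.randn(1, 8, 16)
out = sliding_window_attention(q, k, v, window=4)
print(out.shape)  # torch.Size([1, 8, 16])
```

Note that this dense-mask version only illustrates the attention pattern; the real savings in compute and memory come from implementations that never materialize the full score matrix, e.g. banded kernels as in Longformer or a rolling key-value cache as used by Mistral.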

Comments

0:28 What is the name of this parameter, the input token limit, phi_1, 2?

samson