LLM Jargons Explained: Part 3 - Sliding Window Attention

In this video, I take a close look at Sliding Window Attention (SWA), a technique used to train Large Language Models (LLMs) efficiently on longer documents. The concept was discussed extensively in the Longformer paper and has recently been adopted by Mistral 7B, reducing computational costs.
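As a rough illustration of the idea, here is a minimal sketch (not the video's code) of causal attention restricted to a fixed window, written in PyTorch. The function name, tensor shapes, and the toy window size of 4 are assumptions chosen for the example; Mistral 7B uses a much larger window (4096 tokens).

```python
# Minimal sketch of sliding window attention: each token attends only to itself
# and the previous (window - 1) tokens, so the attention pattern is a narrow band
# instead of the full lower triangle.
import torch

def sliding_window_attention(q, k, v, window: int):
    """q, k, v: (batch, seq_len, d_head). Causal attention limited to `window` tokens."""
    seq_len = q.size(1)
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale   # (batch, seq, seq)

    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]                    # j <= i (no looking ahead)
    in_window = idx[:, None] - idx[None, :] < window         # i - j < window (stay in band)
    mask = causal & in_window                                 # (seq, seq) boolean band

    scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)

# Toy example: 1 sequence of 8 tokens, head dim 16, window of 4.
q = k = v = torch.randn(1, 8, 16)
out = sliding_window_attention(q, k, v, window=4)
print(out.shape)  # torch.Size([1, 8, 16])
```

Note that this dense-mask version only illustrates the attention pattern; the real savings in compute and memory come from implementations that never materialize the full score matrix, e.g. banded kernels as in Longformer or a rolling key-value cache as used by Mistral.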

Comments

0:28 What is the name of this parameter, the input token limit, phi_1, 2?

samson