Self-Extend LLM: Upgrade your context length

Self-Extend LLM: When LLMs encounter text sequences during inference that exceed the length of their pre-training context window, we face out-of-distribution (OOD) issues with positional encoding.

Neural networks (NNs), and LLMs in particular, are susceptible to unpredictable behavior when dealing with OOD inputs. We analyze a new solution for increasing the context length of LLMs during inference!

Introducing grouped self-attention, which extends the classical self-attention of transformers beyond their pre-trained context length!
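To make the idea concrete, here is a minimal sketch of the position remapping behind grouped self-attention: exact relative distances are kept inside a local neighbor window, while distances beyond it are compressed by floor division with a group size so they stay within the pre-trained range. The function name and parameter values are illustrative, not taken from the authors' code.

```python
import torch

def self_extend_rel_positions(seq_len: int, group_size: int, neighbor_window: int) -> torch.Tensor:
    """Sketch of Self-Extend-style relative position remapping.

    Distances up to `neighbor_window` keep their exact values (normal
    self-attention); larger distances are compressed by floor division
    with `group_size` (grouped self-attention) and shifted so the grouped
    region starts right after the neighbor window.
    """
    q_pos = torch.arange(seq_len).unsqueeze(1)   # query positions, shape (L, 1)
    k_pos = torch.arange(seq_len).unsqueeze(0)   # key positions,   shape (1, L)
    rel = q_pos - k_pos                          # exact relative distances

    # Compressed distances for far-away tokens, shifted to align with the window edge.
    grouped = rel // group_size + neighbor_window - neighbor_window // group_size

    # Exact distances for close tokens, grouped distances otherwise.
    return torch.where(rel <= neighbor_window, rel, grouped)

# Example: a 16-token sequence, group size 4, neighbor window 8
print(self_extend_rel_positions(16, group_size=4, neighbor_window=8)[-1])
```

Because the remapped distances never exceed what the model saw during pre-training, no fine-tuning is required.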

All rights w/ authors:
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

#python
#airesearch
#attention
Comments

This is what they claim, but does it actually work in reality?

pensiveintrovert

Super sick video!! Thanks for sharing all of this information. I hope you can keep going, I love the content :)
Would it be possible to share the LLM you trained on LLM knowledge? It would be super useful.

joebarhouch

Can you make a video on which fine-tuning parameters give the best results? I have tried fine-tuning many times and never got the validation loss below 1; I got a training loss of 0.98, but the validation loss stays poor. My dataset is 2K rows. Is that too small for a 7B model like Mistral 7B, or am I doing something wrong?

kamleshpaul

Sincerely appreciate the deep dives. Another awesome post I'm watching on a loop ❤

s-informationatyourservi