Extending Context Window of Large Language Models via Positional Interpolation Explained

Comments

Man, I just love you for sharing this, and for such an easy-to-understand explanation.

pengbo

Interesting paper. I am having a similar problem: I trained on sequences of length 300, need to extend to 1000, and I am using RoPE. Do you know if this interpolation can be used with RoPE, or should I look into something like ALiBi? I recall reading that ALiBi also has some issues and its accuracy is worse. There is also LongRoPE.

mateuszk
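
On the RoPE question above: the interpolation in this paper is defined for RoPE specifically (the LLaMA models in the paper use rotary embeddings), and it amounts to rescaling the position index before the rotary angles are computed. Below is a minimal PyTorch sketch, not taken from the video; the helper names are made up for illustration, and the 300-to-1000 lengths just mirror the numbers in the comment.

import torch

def interpolated_rope_tables(seq_len, dim, trained_len, base=10000.0):
    # Standard RoPE inverse frequencies, one per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Positional interpolation: shrink positions by trained_len / seq_len
    # so the largest position never exceeds the range seen in training.
    scale = min(1.0, trained_len / seq_len)
    positions = torch.arange(seq_len).float() * scale
    angles = torch.outer(positions, inv_freq)        # (seq_len, dim // 2)
    return angles.cos(), angles.sin()

def apply_rope(x, cos, sin):
    # Rotate consecutive channel pairs of x (..., seq_len, dim) by the angles.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)

# Example with the numbers from the comment: trained on 300, extended to 1000.
cos, sin = interpolated_rope_tables(seq_len=1000, dim=64, trained_len=300)
q = torch.randn(1, 1000, 64)                         # dummy query tensor
q_rot = apply_rope(q, cos, sin)

The only change relative to plain RoPE is the scale factor on the positions; everything else is the standard rotary computation, which is why the paper can keep the pretrained weights and only fine-tune briefly.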

Thank you for another great video! 🙏
Does this also work for ALiBi?

Skinishh

Amazing explanation; I am guessing you are doing a PhD. Do you have any idea how we can implement this method in code to fine-tune Llama 2? Any resource is appreciated.

kibrutemesgen
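
On implementing this for Llama 2: one practical route is the RoPE scaling option exposed in recent Hugging Face transformers releases, which applies this kind of linear position interpolation through the model config. Treat the snippet below as a hedged sketch rather than an official recipe; the config fields (rope_scaling, max_position_embeddings) and the checkpoint id reflect current versions of the library and may differ in yours.

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"   # assumes access to this checkpoint

config = AutoConfig.from_pretrained(model_name)
# Linear positional interpolation by 2x: 4096 trained positions -> 8192 usable.
config.rope_scaling = {"type": "linear", "factor": 2.0}
config.max_position_embeddings = 8192

model = AutoModelForCausalLM.from_pretrained(model_name, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# From here, fine-tune on long sequences with your usual training loop or the
# Trainer API; the paper reports that only a modest number of fine-tuning
# steps on longer sequences is needed for the extended window to work well.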

Question: the model can still take only 2048 tokens, so we still have to chunk a 4096-token input into two blocks, right? PI only deals with modifying the positional embedding; it cannot help with the fact that attention is still over a window of 2048 tokens.

HarisJabbar
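
On the last question: LLaMA's self-attention is full causal attention over whatever sequence is fed in, so there is no hard 2048-token attention window; the 2048 limit comes from the position range seen during pretraining. Positional interpolation rescales the position indices of a longer input back into that trained range, so after a short fine-tune the model attends across the whole 4096 tokens and no chunking into two blocks is needed. A small numeric sketch of the remapping:

L_train, L_extended = 2048, 4096
scale = L_train / L_extended                  # 0.5

for m in (0, 1, 2047, 2048, 4095):
    print(f"token index {m:4d} -> interpolated position {m * scale:6.1f}")
# token index    0 -> interpolated position    0.0
# token index    1 -> interpolated position    0.5
# token index 2047 -> interpolated position 1023.5
# token index 2048 -> interpolated position 1024.0
# token index 4095 -> interpolated position 2047.5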