How to code a long-context LLM: LongLoRA explained on Llama 2 100K

Code and theory on how to fine-tune a long-context LLM, like Llama 2 100K.
Long-sequence LLMs matter when you work with long scientific articles of more than 32K or 64K tokens. Three days ago a new technique for creating long-sequence LLMs was published that finally looks usable (at first glance): LongLoRA.
It is also optimized for FlashAttention-2.

Claude 100K, ChatGPT 32K, Llama 2 100K, etc.: create your own long-sequence LLMs.

LongLoRA explained in detail, plus the code to extend your LLM to a higher context length.
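
As a rough illustration of what "extending the context" involves, here is a minimal sketch using Hugging Face transformers and peft in the spirit of LongLoRA. This is not the official LongLoRA training script; the checkpoint name, RoPE scaling factor, and LoRA hyperparameters are placeholder assumptions.

# Minimal sketch: extend Llama 2's context via RoPE position interpolation
# and fine-tune low-rank (LoRA) adapters, in the spirit of LongLoRA.
# Not the official LongLoRA code; all hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    # Linear RoPE scaling: 4096 * 8 = 32768 tokens target context (assumption).
    rope_scaling={"type": "linear", "factor": 8.0},
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# The LongLoRA paper reports that also training the embedding and
# normalization layers matters for long-context adaptation.
for name, param in model.named_parameters():
    if "embed" in name or "norm" in name:
        param.requires_grad = True

model.print_trainable_parameters()
# From here, train as usual on long documents, ideally with
# FlashAttention-2 enabled to keep memory in check.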

#ai
#coding
#explanation
Comments

Thanks a lot!!! Great videos with in-depth explanations. Please share the presentations as well for reference.

sandhyas

Extremely interesting, thanks. The context length is a real limitation for now.

jmirodg

Amazing video!
I had one doubt regarding shift short attention (S2-Attn), though, and was hoping you could clarify. Suppose we have a sequence length of 8000 and divide it into 4 groups of 2000 tokens each. Group 1 contains tokens 1 to 2000, group 2 contains the next 2000 tokens, and so on.
The way I understand S2-Attention is that if we have 4 heads per group, we keep 2 heads as they are, and we shift the other 2 heads by half the group size. So for these 2 shifted heads, the 1st row (of group 1) no longer contains information about the 1st token; that information ends up in row 1001. The 1st row instead contains information about token 7001, which wraps around from the end of the sequence. Similarly, the 1st row of group 2 in these shifted heads contains information about token 1001.
Once we have shifted these 2 heads, we continue with the original self-attention mechanism for each group independently, then concatenate the heads and produce the input for the next layer.
Am I correct or is there an error in my understanding?
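In code, my understanding of the shift is roughly this (just a sketch of the idea, not the official LongLoRA implementation; the shift direction, shapes, and function name are only illustrative, and causal masking is omitted):

import torch

def shifted_group_attention(q, k, v, group_size):
    # q, k, v: (batch, seq_len, num_heads, head_dim); seq_len divisible by group_size.
    b, n, h, d = q.shape
    half = h // 2

    def shift(x):
        # Roll the second half of the heads by half the group size along the
        # sequence axis; the wrap-around rows connect neighbouring groups.
        x = x.clone()
        x[:, :, half:] = torch.roll(x[:, :, half:], shifts=-group_size // 2, dims=1)
        return x

    q, k, v = shift(q), shift(k), shift(v)

    # Reshape so attention runs independently inside each group of tokens.
    g = n // group_size
    q = q.reshape(b * g, group_size, h, d).transpose(1, 2)
    k = k.reshape(b * g, group_size, h, d).transpose(1, 2)
    v = v.reshape(b * g, group_size, h, d).transpose(1, 2)

    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)

    # Undo the grouping and the shift before the output projection.
    out = out.transpose(1, 2).reshape(b, n, h, d)
    out[:, :, half:] = torch.roll(out[:, :, half:], shifts=group_size // 2, dims=1)
    return out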

sambhavkhur

Do you think a simple implementation for a "small" model within a notebook is feasible?

sergialbert

Has anyone tried to run an experiment to inspect the lost-in-the-middle effect? I wonder if the S2-Attention mechanism affects how the model utilizes the results of attention.

AzureUnagi

Are there any results on the long-context learning, like test results?

comediansguidetotruecrime

Hey, do you have an idea of how I could use screenshots in the documents I want to use in a RAG use case? Thanks

fabianaltendorfer

Thanks for speaking English slowly. I'm Korean and my English is not good, but I can understand a little.
😊😊

gunwooo

Lots of errors when trying out the fine-tuning code. Please try it out and see.

jdoejdoe