Self-Extend LLM: Upgrade your context length

Self-Extend LLM: When LLMs encounter text sequences during inference that exceed the length of their pre-training context window, we face out-of-distribution (OOD) issues with positional encoding.

Neural networks (NNs), and LLMs in particular, are susceptible to unpredictable behavior when dealing with OOD inputs. We analyze a new solution for increasing the context length of LLMs during inference!

Introducing grouped self-attention, which extends the classical self-attention of transformers beyond their pre-trained context length!
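To make the idea concrete, here is a minimal sketch of the position remapping behind grouped self-attention: exact relative distances are kept inside a local neighbor window, while distances beyond it are compressed by floor division with a group size so they stay within the pre-trained range. The function name and parameter values are illustrative, not taken from the authors' code.

```python
import torch

def self_extend_rel_positions(seq_len: int, group_size: int, neighbor_window: int) -> torch.Tensor:
    """Sketch of Self-Extend-style relative position remapping.

    Distances up to `neighbor_window` keep their exact values (normal
    self-attention); larger distances are compressed by floor division
    with `group_size` (grouped self-attention) and shifted so the grouped
    region starts right after the neighbor window.
    """
    q_pos = torch.arange(seq_len).unsqueeze(1)   # query positions, shape (L, 1)
    k_pos = torch.arange(seq_len).unsqueeze(0)   # key positions,   shape (1, L)
    rel = q_pos - k_pos                          # exact relative distances

    # Compressed distances for far-away tokens, shifted to align with the window edge.
    grouped = rel // group_size + neighbor_window - neighbor_window // group_size

    # Exact distances for close tokens, grouped distances otherwise.
    return torch.where(rel <= neighbor_window, rel, grouped)

# Example: a 16-token sequence, group size 4, neighbor window 8
print(self_extend_rel_positions(16, group_size=4, neighbor_window=8)[-1])
```

Because the remapped distances never exceed what the model saw during pre-training, no fine-tuning is required.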

All rights w/ authors:
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

#python
#airesearch
#attention
Comments

This is what they claim, but does it actually work in reality?

pensiveintrovert

Super sick video!! Thanks for sharing all of this information. I hope you can keep going, I love the content :)
Would it be possible to share the LLM you trained on LLM knowledge? It would be super useful.

joebarhouch

Can you make a video on which fine-tuning parameters give the best results? I have tried fine-tuning many times and never got the validation loss below 1; I got a training loss of 0.98, but the validation loss stays poor. My dataset is 2K rows. Is that too small for a 7B model like Mistral 7B, or am I doing something wrong?

kamleshpaul

Sincerely appreciate the deep dives. Another awesome post I'm watching on a loop ❤

s-informationatyourservi