Improving Language Models by Retrieving from Trillions of Tokens | NLP Journal Club


Abstract: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
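To make the retrieval step in the abstract concrete, here is a minimal sketch of chunk-level nearest-neighbour lookup. It is an illustration under stated assumptions, not the paper's implementation: the bag-of-words `frozen_embed` is a toy stand-in for the frozen BERT retriever, the chunk size and `k` are arbitrary, and all function names are hypothetical. The differentiable encoder and chunked cross-attention that consume the retrieved neighbours are omitted.

```python
import math
from collections import Counter


def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive fixed-size chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def frozen_embed(chunk):
    """Toy stand-in for a frozen retriever (assumption, not BERT):
    an L2-normalised bag-of-words vector stored as a sparse dict."""
    counts = Counter(chunk)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}


def cosine(u, v):
    """Dot product of two already-normalised sparse vectors."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def retrieve(query_chunk, database, k):
    """Return the k database chunks most similar to the query chunk."""
    q = frozen_embed(query_chunk)
    scored = [(cosine(q, emb), chunk) for chunk, emb in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Build a tiny "database": embeddings are computed once and never updated,
# which is what keeping the retriever frozen amounts to in this sketch.
corpus = ("the cat sat on the mat . the dog chased the ball . "
          "stocks fell sharply on monday").split()
database = [(c, frozen_embed(c)) for c in split_into_chunks(corpus, 6)]

# Retrieve neighbours for the most recent chunk of the input text.
query = "the cat lay on the mat".split()
neighbours = retrieve(query, database, k=2)
print(neighbours)
```

In a real system the database would hold trillions of tokens, so the exhaustive scan in `retrieve` would be replaced by an approximate nearest-neighbour index. The language model would then attend to the retrieved chunks rather than just printing them.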

The NLP Lab


Comments

I think there is a mistake in your explanation. They are not freezing the Transformer encoder you show in the image, but rather a separate BERT encoder that is used to fetch neighbours.

RohitKumarSingh

Hi, can I get your email ID? I want to run the GitHub repository for it.

GauravKumar-ghfz
