Self-Alignment Instruction Backtranslation


Like 👍. Comment 💬. Subscribe 🟥.

hu-po


Comments

1. *Prior work:*
- *Raw Language Model:* Raw (base) language models are trained only to predict the next piece of text, for example continuing a Wikipedia article.
- *Use of Reinforcement Learning:* OpenAI applied reinforcement learning with a relatively small dataset of back-and-forth conversations so that the model could hold a dialogue with users.

2. *Creation of "Humpback Llama2 70B":*
- *Web Crawl:* Started from a web crawl to gather initial data.
- *Instruction Generation:* Treated text from the crawl as answers and generated the matching instructions, e.g. synthesizing the question that a given Wikipedia passage answers.
- *Iterative Process:* Used the language model itself, over several iterations, to select good instruction-answer pairs.
- *Data Augmentation:* Augmented the training data with synthetic examples that are still grounded in originally human-written text.
- *Fine-Tuning:* Fine-tuning led to "Humpback Llama2 70B" outperforming "Llama2 70B" on the Alpaca leaderboard.
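The creation steps above can be sketched as one round of self-augmentation (generate an instruction for each crawled text) followed by self-curation (let the model rate the pairs and keep the best). This is a minimal Python sketch under stated assumptions, not Meta's implementation: the `Generate` callable stands in for any LLM call, the prompts and the names `self_augment`, `self_curate`, `toy_backward` and `toy_rater` are hypothetical, and keeping only pairs rated 5 on a 1-5 scale is an assumed threshold.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical model interface: any function mapping a prompt to generated text.
Generate = Callable[[str], str]


@dataclass(frozen=True)
class Pair:
    instruction: str
    response: str


def self_augment(backward_model: Generate, documents: Iterable[str]) -> list[Pair]:
    """Backtranslation: treat each crawled document as a response and ask the
    backward model (response -> instruction) to write a matching instruction."""
    return [
        Pair(backward_model(f"Write the instruction this text answers:\n{doc}").strip(), doc)
        for doc in documents
    ]


def self_curate(rater: Generate, pairs: list[Pair], threshold: int = 5) -> list[Pair]:
    """Ask the current model to rate each pair on a 1-5 scale and keep pairs
    scoring at least `threshold` (the exact threshold is an assumption)."""
    kept = []
    for pair in pairs:
        reply = rater(
            "Rate from 1 to 5 how well the answer follows the instruction.\n"
            f"Instruction: {pair.instruction}\nAnswer: {pair.response}\nScore:"
        )
        digits = [int(ch) for ch in reply if ch.isdigit()]
        score = digits[0] if digits else 0  # first digit in the reply; none -> discard
        if score >= threshold:
            kept.append(pair)
    return kept


# Toy stand-ins for real LLM calls, purely for illustration.
def toy_backward(prompt: str) -> str:
    return "What is the capital of France?" if "Paris" in prompt else "Write an ad."


def toy_rater(prompt: str) -> str:
    return "5" if "Paris" in prompt else "2"


docs = ["Paris is the capital of France.", "Buy cheap watches now!!!"]
curated = self_curate(toy_rater, self_augment(toy_backward, docs))
print(curated)
```

Only the Paris pair survives curation here. A real backward model would first need to be fine-tuned to predict instructions from answers, e.g. on a small set of human-written seed pairs.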

3. *Alpaca Leaderboard:*
- *Evaluation Criteria:* Evaluates generation quality on 805 prompts.
- *Comparison Method:* Reports the pairwise win rate against a reference model (text-davinci-003).
- *Judgment Basis:* Uses GPT-4 judgments for evaluation.
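The leaderboard metric comes down to simple arithmetic: the share of prompts on which the judge preferred the candidate's output over the reference's. This is a minimal sketch that assumes the GPT-4 judgments have already been collected as labels. The `win_rate` name, the label strings and the half point awarded for ties are all assumptions. The judgments below are invented to show the arithmetic and are not real results.

```python
from typing import Literal, Sequence

# Each judgment records which output the judge (GPT-4 on the leaderboard) preferred.
Judgment = Literal["candidate", "reference", "tie"]


def win_rate(judgments: Sequence[Judgment]) -> float:
    """Pairwise win rate (in %) of a candidate model against the reference.
    Counting a tie as half a win is an assumption of this sketch."""
    if not judgments:
        raise ValueError("need at least one judgment")
    points = {"candidate": 1.0, "tie": 0.5, "reference": 0.0}
    return 100.0 * sum(points[j] for j in judgments) / len(judgments)


# Made-up judgments for the 805 prompts, purely to show the arithmetic.
toy = ["candidate"] * 600 + ["reference"] * 195 + ["tie"] * 10
print(f"win rate: {win_rate(toy):.2f}%")  # 605 / 805 -> 75.16%
```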

4. *Comparison with Distilled Models:*
- *Performance:* "Humpback Llama2 70B" did not perform as well as a distilled Llama2 model (Vicuna).
- *Advantages of Non-Distilled Models:*
  - Often larger and more complex, capturing nuanced patterns.
  - Trained on original data, avoiding biases or errors from relying on another model's predictions.

5. *Methodology Insights:*
- *Scalable Approach:* The paper proposes a scalable method to fine-tune large language models to follow instructions.
- *Self-Training Algorithm:* Utilizes an iterative self-training algorithm called instruction backtranslation, allowing models to improve their own performance.
- *Impact on Leaderboard:* The fine-tuned models outperform other non-distilled instruction-following models on the Alpaca leaderboard.
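The iterative self-training loop from the methodology summary can be sketched as follows. In each iteration, the current model curates the augmented pairs, and the next model is trained on the seed data plus that iteration's curated subset. This is a hedged sketch. `fine_tune` and the 1-5 rating interface are hypothetical stand-ins for real training and prompting. `toy_fine_tune` uses an invented rule (a model trained on more data accepts longer answers) only so that the loop runs end to end.

```python
from typing import Callable, Sequence

Pair = tuple[str, str]          # (instruction, response)
Rater = Callable[[Pair], int]   # hypothetical: model rates a pair from 1 to 5
FineTune = Callable[[Sequence[Pair]], Rater]  # hypothetical: fine-tune base model on data


def instruction_backtranslation(
    seed: Sequence[Pair],
    augmented: Sequence[Pair],
    fine_tune: FineTune,
    iterations: int = 2,
    threshold: int = 5,
) -> tuple[Rater, list[list[Pair]]]:
    """Iterative self-training: the model from iteration t curates the augmented
    pairs, and the model for iteration t+1 is trained on seed + curated pairs."""
    model = fine_tune(seed)  # M0: trained on the human-written seed data only
    history: list[list[Pair]] = []
    for _ in range(iterations):
        curated = [pair for pair in augmented if model(pair) >= threshold]
        history.append(curated)
        model = fine_tune([*seed, *curated])  # next model sees seed + curated data
    return model, history


def toy_fine_tune(data: Sequence[Pair]) -> Rater:
    """Toy stand-in: a model trained on more data accepts longer answers."""
    max_len = 15 * len(data)
    return lambda pair: 5 if len(pair[1]) <= max_len else 3


seed = [("Say hi", "Hi!")]
augmented = [("Q1", "short ans"), ("Q2", "a somewhat longer answer"), ("Q3", "x" * 100)]
final_model, history = instruction_backtranslation(seed, augmented, toy_fine_tune)
print([len(c) for c in history])
```

The second iteration keeps more pairs than the first only because the toy model changed. With real models, how the curated set evolves across iterations is an empirical question.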

wolpumba

The didgeridoo makes a very nice soundcheck.

wolpumba

Isn't Reinforced Self-Training (ReST) by DeepMind in the same spirit?

darshank

should be called "the recipe paper"

khalilsabri
