Efficient Large Scale Language Modeling with Mixtures of Experts

Let's talk about efficient large-scale language modeling with a fascinating concept known as Mixtures of Experts. This intriguing approach will help us unlock the potential of our language models, so stick around to find out how!
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning. With the exception of fine-tuning, we find MoEs to be substantially more compute efficient. At more modest training budgets, MoEs can match the performance of dense models using ∼4 times less compute. This gap narrows at scale, but our largest MoE model (1.1T parameters) consistently outperforms a compute-equivalent dense model (6.7B parameters). Overall, this performance gap varies greatly across tasks and domains, suggesting that MoE and dense models generalize differently in ways that are worthy of future study. We make our code and models publicly available for research use.
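To make "conditional computation" concrete, here is a minimal sketch (not the authors' released code) of a Transformer feed-forward block replaced by an MoE layer with top-2 gating: a learned router scores the experts for each token, and only the two highest-scoring expert FFNs run on that token, so parameter count grows with the number of experts while per-token compute stays roughly constant. Layer sizes, the number of experts, and the top-2 choice are illustrative assumptions.

```python
# Illustrative MoE layer sketch (PyTorch), not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: produces one score per expert for each token.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)             # routing probabilities
        weights, idx = probs.topk(self.top_k, dim=-1)       # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        # Conditional computation: each expert only processes the tokens routed to it.
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route a batch of 16 token vectors through the layer.
tokens = torch.randn(16, 1024)
moe = MoELayer()
print(moe(tokens).shape)  # torch.Size([16, 1024])
```

In a real system the per-expert loop is replaced by batched dispatch across devices (each GPU hosts a subset of experts), and an auxiliary load-balancing loss keeps the router from collapsing onto a few experts; this sketch only shows the routing idea.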