Neural network architectures, scaling laws and transformers

A summary of research on neural network architecture design, scaling laws and Transformers.
Detailed description:
We first look at different strategies for neural network design: human design, random wiring, evolutionary algorithms and neural architecture search. As an example of neural architecture search, we explore the DARTS algorithm in some detail.
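As a hedged illustration, here is a minimal Python sketch of the continuous relaxation at the heart of DARTS: each edge in the cell applies a softmax-weighted mixture of candidate operations, which makes the architecture choice differentiable. The toy operations and shapes below are assumptions for illustration, not the paper's actual search space or its bilevel optimization loop.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy candidate operations on a feature vector x (illustrative, not the
# DARTS search space).
candidate_ops = [
    lambda x: x,                   # identity / skip connection
    lambda x: np.maximum(x, 0.0),  # ReLU
    lambda x: np.zeros_like(x),    # "zero" op, effectively drops the edge
]

def mixed_op(x, alpha):
    """DARTS mixed operation: softmax(alpha)-weighted sum of candidate ops."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops))

x = np.array([-1.0, 2.0, 0.5])
alpha = np.zeros(len(candidate_ops))  # learnable architecture parameters
print(mixed_op(x, alpha))             # uniform mixture before any training
# After training alpha by gradient descent, the discrete architecture is
# read off by keeping the argmax operation on each edge.
```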
We then explore scaling phenomena and the role of hardware in modern machine learning. In particular, we look at the economic, hardware and algorithmic factors enabling scaling. We consider the implications of scaling and why this trend may be important.
Next, we look at the Transformer, a model that scales effectively with additional compute. We discuss the mechanics of self-attention and the encoder-decoder architecture that underpins the Transformer.
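For concreteness, below is a minimal numpy sketch of single-head scaled dot-product self-attention, the core operation of the Transformer. The shapes are toy values and there is no masking or multi-head logic; it is a sketch of the mechanism, not a full implementation.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # each output position is a weighted mix of all positions

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```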
We then describe Transformer scaling laws for natural language, the application of the Transformer architecture to vision tasks, and the proliferation of Transformer variants that have emerged across different domains.
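As a rough illustration of the power-law form these scaling laws take, the snippet below evaluates L(N) = (N_c / N)^alpha_N, the parameter-count law reported by Kaplan et al. (2020). The constants are the approximate published fits, quoted here only as indicative values; the exact numbers depend on the experimental setup.

```python
# Approximate constants from Kaplan et al. (2020); indicative, not exact.
N_C = 8.8e13     # critical parameter count
ALPHA_N = 0.076  # power-law exponent

def loss_from_params(n_params):
    """Predicted test loss (nats/token) vs. non-embedding parameter count."""
    return (N_C / n_params) ** ALPHA_N

for n in (1e8, 1e9, 1e10):
    print(f"N = {n:.0e}: predicted loss ~ {loss_from_params(n):.3f}")
```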
Finally, we briefly discuss the relationship between neural network design and energy consumption, together with some estimates that have been made on the carbon emissions associated with model training.
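As a hedged back-of-the-envelope sketch of how such estimates are typically made (energy = accelerator-hours x average power x datacenter PUE; emissions = energy x grid carbon intensity, in the spirit of Strubell et al. 2019 and Patterson et al. 2021), consider the following. All numbers are illustrative assumptions, not figures from the lecture.

```python
def training_co2_kg(num_gpus, hours, gpu_kw, pue, grid_kgco2_per_kwh):
    """Estimate training emissions: energy (kWh) scaled by grid carbon intensity."""
    energy_kwh = num_gpus * hours * gpu_kw * pue
    return energy_kwh * grid_kgco2_per_kwh

# Hypothetical run: 64 GPUs for two weeks at 0.3 kW average draw each,
# datacenter PUE of 1.1, grid intensity of 0.4 kgCO2 per kWh.
print(f"{training_co2_kg(64, 24 * 14, 0.3, 1.1, 0.4):.0f} kg CO2")
```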
Timestamps:
00:00 - Neural network architectures, scaling laws and transformers
00:22 - Outline
00:40 - Strategies for Neural Network Design
02:17 - Strategy 1: Neural Network Design by Hand
05:01 - Strategy 2: Random Wiring
08:01 - Strategy 3: Evolutionary Algorithms
10:55 - Strategy 4: Neural Architecture Search
12:11 - DARTS: Differentiable Architecture Search
16:04 - Scaling phenomena and the role of hardware
17:24 - What factors are enabling effective compute scaling?
18:34 - Scaling phenomena and the role of hardware (cont.)
20:28 - The Transformer: a model that scales particularly well
27:05 - Transformer scaling laws for natural language
28:41 - Vision Transformer
31:32 - Transformer Explosion
33:47 - Neural Network Design and Energy Consumption
Topics: #scalinglaws #transformers #neuralnetworks #machinelearning #computervision
Notes: This content is from a set of lectures I gave for the 2021 4F12 Computer Vision course for undergraduate engineering at the University of Cambridge.