OpenAI GPT-3: Language Models are Few-Shot Learners

**ERRATA**: OpenAI/GPT-3 DOES NOT USE Microsoft's ZeRO/DeepSpeed for training

In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten discuss their takeaways from OpenAI’s GPT-3 language model. OpenAI trained a 175 BILLION parameter autoregressive language model. The paper demonstrates how self-supervised language modelling at this scale can perform many downstream tasks without fine-tuning.

00:00:00 Intro
00:00:54 ZeRO1+2 (model + Data parallelism) [GPT-3 DOES *NOT* USE THIS] (Connor)
00:03:17 Recent history of NLP (Tim)
00:06:04 Yannic "Light-speed" Kilcher's brief overview of GPT-3
00:14:25 Reviewing Yannic's YT comments on his GPT-3 video (Tim)
00:20:26 Main show intro
00:23:03 Is GPT-3 reasoning?
00:28:15 Architecture discussion and autoregressive (GPT*) vs denoising autoencoder (BERT)
00:36:18 Utility of GPT-3 in industry
00:43:03 Can GPT-3 do math? (reasoning/system 1/system 2)
00:51:03 Generalisation
00:56:48 Esoterics of language models
00:58:46 Architectural trade-offs
01:07:37 Memorization machines and interpretability
01:17:16 Nearest neighbour probes / watermarks
01:20:03 YouTube comments on GPT-3 video
01:21:50 GPT-3 news article generation issue
01:27:36 Sampling data for language models / bias / fairness / politics
01:51:12 Outro

These paradigms of task adaptation are divided into zero-, one-, and few-shot learning. Zero-shot learning is the extreme case where we expect a language model to perform a task such as sentiment classification or extractive question answering without any additional supervision. One- and few-shot learning provide some examples to the model. However, GPT-3's definition of this diverges a bit from the conventional literature: GPT-3 receives its one- or few-shot examples as "in-context learning". Instead of fine-tuning the model on a few examples, the model has to use the input to infer the downstream task. For example, the GPT-3 transformer has an input sequence of 2048 tokens, so demonstrations of a task such as Yelp sentiment reviews would have to fit into this input sequence along with the new review.
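To make the idea concrete, here is a minimal sketch of what a few-shot, in-context prompt could look like. The instruction text, the example reviews, the build_few_shot_prompt helper, and the crude whitespace word count standing in for a real BPE token count are all our own illustrative assumptions, not material from the paper.

```python
# Minimal sketch of few-shot "in-context learning": the task is conveyed
# entirely through the prompt, with no gradient updates to the model.
# The demonstrations and the crude length check are illustrative assumptions.

CONTEXT_TOKENS = 2048  # GPT-3's input sequence length (from the paper)

def build_few_shot_prompt(demonstrations, new_review):
    """Concatenate labelled examples and the unlabelled query into one prompt."""
    lines = ["Classify the sentiment of each review as Positive or Negative.", ""]
    for review, label in demonstrations:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {new_review}")
    lines.append("Sentiment:")  # the model is expected to continue from here
    return "\n".join(lines)

demos = [
    ("The pasta was cold and the service was slow.", "Negative"),
    ("Friendly staff and the best burger I've had in years.", "Positive"),
]
prompt = build_few_shot_prompt(demos, "Great coffee, but the wait was far too long.")

# Everything -- the demonstrations plus the new review -- must fit in the
# 2048-token window; a real check would use the model's BPE tokenizer,
# not a whitespace word count.
assert len(prompt.split()) < CONTEXT_TOKENS
print(prompt)
```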

**ERRATA-continued** It has come to our attention that there was a serious factual error in our video -- GPT-3 DOES NOT USE Microsoft's ZeRO/ZeRO2 or DeepSpeed for training and there is no reference to this in either their blog post or paper. We are really sorry about this mistake and will be more careful to fact-check in future.

Thanks for watching! Please Subscribe!

Paper Links:

#machinelearning #naturallanguageprocessing #deeplearning #gpt3
Comments

ERRATA: Sorry for the mixup, GPT-3 does not actually use ZeRO or DeepSpeed!

MachineLearningStreetTalk

Thank you for a great video - love your editing and clarity of explanation.

PrzemekChojeckiAI

Informative discussion guys - thank you! Really liked the discussion on the age of the data making up the corpus - hadn't thought about this before :)

eddiesagra

The review of GPT-3, along with a push in subscriptions owing to recent popular paper reviews such as ResNet and Word2Vec (plus years of hard work), has made @Yannic an overnight star :)

LNJP

I understood like 5% of what you said but my brain is slowly converging to understand it better and better :D Thanks for your video! Will watch sequences of it as my new Netflix & Think practice.

JousefM

Re: utility of GPT-3 in industry, on the topic of knowledge mining: regardless of the model used for inference, I'm not sure there is a good way to do data sensitivity classification yet. Without a good data protection mechanism, perhaps few would start pouring all their documents into any system based on these models. Further, just as with adversarial attacks on face recognition algorithms, perhaps we could also see attempts at fooling GPT-3 using specially crafted phrases?

iuhh

OK, if this guy removes his "Top Gun" glasses, maybe we can get more of what he really wants to say.

ctpact

Was not expecting the Arnold clip. I lol'd.

snippletrap

No guest!!! Yet it's interesting.
How true is it that training GPT-3 cost them

vinayreddy

Interesting debates. It would be useful if you linked the other articles shown during the episode.

fabmilo

I agree that the model recites from memory. I tested text generation to write a "story"; each yarn spun by these models can actually be traced to an existing book.
That said, the model is really powerful when used to automate comprehension, classification, and extraction tasks. The value of language models in these tasks is the essence of "no code", especially GPT: you can teach the model a task like you would teach a nine-year-old, using just English syntax.

gibreel

You mentioned question answering, as opposed to the typical question asking. From an education perspective this is an important shift from consumerism to production. Something I'm interested in is the capacity of these models to be tuned on downstream tasks that can ask meaningful questions about arbitrary input text, to enhance a human learner's comprehension and help them recite salient facts or even concepts useful to their field of study. Imagine using BERT tuned on question/answer pairs to support a learner's journey to internalizing essential facts and knowledge that elevate them to a level of reasoning about the acquired knowledge. Could this be a natural collaboration rather than some dichotomous competition?

duxoroxor

Great video. Now I don't see GPT-3 as useful for knowledge mining. I feel my hands are tied if I want to fine-tune the model for my NLP task. I would prefer BERT in that matter.

crimythebold

This is great. I was wondering, what is the software used at 3:35? Neat visualization.

imranq

Can we use it? from GPT-3 import tokenizer ...?

bryancc
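On the import question above: there is no local package to import for GPT-3 itself; the weights are not released, and access is through OpenAI's hosted API. What can be loaded locally is the byte-level BPE tokenizer family GPT-3 builds on. A small sketch, assuming the Hugging Face transformers library (our choice, not something referenced in the episode):

```python
# There is no installable "GPT-3" package; the model sits behind OpenAI's API.
# The byte-level BPE tokenizer it builds on is in the same family as GPT-2's
# and can be loaded locally. Using Hugging Face `transformers` here is our
# own assumption, not something mentioned in the video or paper.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
print(tokenizer.tokenize("Language models are few-shot learners"))
print(tokenizer.encode("Language models are few-shot learners"))
```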

I've heard that the cerebellum learns "small programs" so as to execute them fast and sort of automatically. The "reasoning" part of our brain creates/distills those programs and passes them to the cerebellum, so it seems we need to invent the reasoning system. What's interesting is that a human can live a "normal" life without the cerebellum; they can't execute these automatic tasks fast (which of course is terrible), but they are functional.

ikoukas

Indeed, we shouldn't think the computational capacity of this era is special. Something beyond raw computational capacity may be what makes the magic!

anonymous

Can GPT-3's embeddings be used for topic modeling?

monart
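On the embeddings question above: GPT-3's activations aren't publicly accessible, so any local experiment needs a stand-in. A hedged sketch using GPT-2 hidden states from Hugging Face transformers, mean-pooled and clustered with k-means as a crude topic model; the model choice, pooling, and clustering method are all illustrative assumptions:

```python
# Hedged sketch: GPT-3's activations aren't available, so GPT-2 stands in.
# Mean-pooled final-layer hidden states are clustered with k-means as a
# crude "topic model". All choices here are illustrative assumptions.
import torch
from sklearn.cluster import KMeans
from transformers import GPT2Model, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

docs = [
    "The striker scored twice in the second half.",
    "The midfielder was booked for a late tackle.",
    "The central bank raised interest rates again.",
    "Inflation figures came in above expectations.",
]

embeddings = []
with torch.no_grad():
    for doc in docs:
        inputs = tokenizer(doc, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state        # (1, seq_len, 768)
        embeddings.append(hidden.mean(dim=1).squeeze(0).numpy())  # mean pool

labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
for doc, label in zip(docs, labels):
    print(label, doc)
```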

The unscrambling task confused me a bit. I mean, if you scramble a word, how can you be sure such a "scrambled" word would be in the vocabulary, in order to assign a token (number) to it so the model could approach the task?
Maybe I am confused, but as far as I understand, each word has a token representation in the language model, and such tokenization comes from the training set, doesn't it?
Thanks!
Amazing videos and discussions!

dariodemattiesreyes
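On the unscrambling question above: the vocabulary is built from subword pieces (down to single bytes), so a scrambled string that never appeared in training can still be tokenized; it just splits into more, smaller pieces. A small sketch using GPT-2's byte-level BPE tokenizer from Hugging Face transformers as a stand-in for GPT-3's (the library choice is our own assumption); "lyinevitab" is the cycled form of "inevitably" from the paper's word-unscrambling tasks:

```python
# Subword (byte-level BPE) tokenization can encode any string, including a
# scrambled word never seen in training -- it just breaks into more pieces.
# GPT-2's tokenizer is used here as a stand-in for GPT-3's.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

for word in ["inevitably", "lyinevitab"]:  # original vs. cycled letters
    pieces = tokenizer.tokenize(" " + word)  # leading space marks a word start for GPT-2
    print(f"{word!r} -> {pieces}")
```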