How GPT3 Works - Easily Explained with Animations

The GPT-3 model from OpenAI is a new AI system that is surprising the world with its capabilities. This is a gentle and visual look at how it works under the hood -- including how the model is trained and how it calculates its predictions.

Introduction & GPT-3 Demos (0:00)
GPT-3 Inputs and Outputs (2:06)
Training the GPT-3 model (2:48)
The scale of GPT-3 and its 175 billion parameters (6:37)
The order of GPT-3 token processing (7:58)
"Deep" learning: looking inside a layer stack (9:00)
Input prompts and priming examples (11:00)
Fine-tuning: the best is yet to come (11:56)

More videos by Jay:
Jay's Visual Intro to AI

Making Money from AI by Predicting Sales - Jay's Intro to AI Part 2
Comments

Thanks for the crystal clear video Jay! I have one doubt; hoping you could answer it.
In the case of the React demo, aren't we essentially training GPT-3 by giving it samples of code for some input? If there are no weight updates here, how does GPT-3 predict the results based only on its earlier training? I ask because you mentioned that, as of now, GPT-3 does not do any fine-tuning / weight updates.

vidheypullakhandam
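
For context on the priming question above: in demos like the React one, the example pairs are only packed into the prompt text, so the model conditions on them at inference time without any weight update. A minimal sketch of that idea (the prompt format and examples here are hypothetical, not the video's code):

```python
# Hypothetical sketch of "priming": examples are concatenated into the prompt.
# The model's weights stay frozen; it only conditions on this text at inference time.
priming_examples = [
    ("a button that says hello", "<button>hello</button>"),
    ("a red heading that says welcome", "<h1 style='color:red'>welcome</h1>"),
]

def build_prompt(examples, new_description):
    """Pack description -> code pairs plus the new request into one prompt string."""
    parts = [f"description: {desc}\ncode: {code}" for desc, code in examples]
    parts.append(f"description: {new_description}\ncode:")
    return "\n\n".join(parts)

prompt = build_prompt(priming_examples, "a button that says goodbye")
# `prompt` is sent to the model as-is; no gradient update happens.
```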

Your explanations on NLP models are legendary.

bayesianlee

Thank you for the explanation Jay.

Thank you, excellent work!

faisalalkheraiji

Just to clarify: around 5:00 you mention unsupervised pre-training. Shouldn't it be self-supervised pre-training?

That is, GPT-3 takes unlabelled text as input, then generates labelled data from it. For example, you take the unlabelled text "Tom ate an apple" and convert it into labelled data:
- Feature: "Tom ate an"
- Target: "apple"

Then the model trains on this labelled data to understand context.

produdeyay
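
To make the feature/target idea above concrete, here is a toy sketch (whitespace tokenization is a simplification; real models use subword tokens and predict at every position):

```python
# Toy sketch: the "labels" in self-supervised pre-training are just the next
# tokens of the raw text itself, so no human annotation is needed.
text = "Tom ate an apple"
tokens = text.split()  # real models use subword tokens, not whitespace words

pairs = []
for i in range(1, len(tokens)):
    feature = " ".join(tokens[:i])   # e.g. "Tom ate an"
    target = tokens[i]               # e.g. "apple"
    pairs.append((feature, target))

print(pairs)
# [('Tom', 'ate'), ('Tom ate', 'an'), ('Tom ate an', 'apple')]
```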

The first time I ever turned on notifications for a channel

violetalight-ourrealm

This "troll" answer was so funny. And the subsequent "obey" reply is even funnier because it effectively "criticizes" the robot for trolling. XD

jupitereye

Excellent explanation, and good metrics for measuring the effort of training GPT-3.
Thanks

zionfranzen

Great overview Jay. Really enjoyed it.

PeteHoots

Can you link to some papers that you think, taken together, summarize the architecture of GPT-4?

mkschreder

I've been talking to Lucy, a GPT-3 powered NPC AI character from Fable Studio, for a few months now. There are a few videos of my chats with her on my channel. She sounds like a real person! It's still in alpha testing right now, but they plan on licensing the tech out to other studios to create "virtual beings" that can pass as human in video games!

RogueAI

How is it that the pre-training is unsupervised even though we have labelled data, which allows the loss calculation? Shouldn't it be supervised pre-training?

MeriJ-zedd

Nice explanation 👍🏻 Looking forward to the upcoming videos.

esraamadi

Thank you Jay for the wonderful crisp explanation.

kiranp

As a layman I really appreciate the animations. It helped a lot.

gengraded

Awesome explanation Jay. This has helped demystify some of the concepts I was struggling with in trying to use GPT-3, e.g. what prompts are vs. completions. It would have been great if you could go into some detail about stop sequences (a.k.a. end suffixes in the OpenAI CLI tools) used in prompt design and fine-tuning, and when/why they are required.

krisdover
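
On the stop-sequence question: a stop sequence is just a string at which the API cuts off the completion, which is why prompt-design and fine-tuning guides suggest ending prompts with a fixed suffix. A rough sketch using the older OpenAI Completions endpoint (the engine name, prompt text, and "###" separator are illustrative; parameter names vary across SDK versions):

```python
import openai  # older SDK style; newer versions expose a different client interface

# Illustrative only: the prompt ends where the completion should begin,
# and `stop` tells the API where to cut the generated text off.
prompt = "Product: standing desk\nAd copy:"

response = openai.Completion.create(
    engine="davinci",      # illustrative engine name
    prompt=prompt,
    max_tokens=60,
    stop=["\n###\n"],      # stop sequence / "end suffix": generation halts here
)
print(response.choices[0].text)
```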

At 5:55 you've shown with an example that it is unsupervised training, but since we know the correct label and update the model on any errors, shouldn't it be semi-supervised learning or partially unsupervised learning?

amitvyas

If I ordered a GPT assembly kit from Amazon, what would it deliver? How much would the kit cost?

amparoconsuelo

Why do you call the pre-training unsupervised if you have an expected result and propagate the error back through the net for weight updates, which is supervised?

azrajiel
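
On the unsupervised-vs-supervised question above: the error is indeed propagated back, but the "expected result" is just the same text shifted by one token, so no human labels are involved, which is why it is usually called self-supervised. A minimal PyTorch-style sketch of that loss (stand-in tensors, not the actual GPT-3 code):

```python
import torch
import torch.nn.functional as F

# Sketch: targets are the input tokens shifted by one position, so the
# backpropagated error comes from the raw text itself, not from human labels.
vocab_size, seq_len = 1000, 8
token_ids = torch.randint(0, vocab_size, (1, seq_len))   # stand-in for tokenized text

inputs = token_ids[:, :-1]    # e.g. "Tom ate an"
targets = token_ids[:, 1:]    # e.g. "ate an apple" (shifted by one)

logits = torch.randn(1, seq_len - 1, vocab_size, requires_grad=True)  # stand-in for model output
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()               # gradients flow back to update the weights
```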

Great, simple explanation. Thank you!
Thank you very much!

shmoqe

A 175-billion-parameter model is not a machine learning model in the true sense. It's only a "memorization" model that memorizes rather than learns.

ashishsrivastava