Large Language Models (in 2023)

I gave a talk at Seoul National University.

I titled the talk “Large Language Models (in 2023)”. This was an ambitious attempt to summarize our exploding field.

Trying to summarize the field forced me to think about what really matters in it. While scaling undeniably stands out, its far-reaching implications are more nuanced. I share my thoughts on scaling from three angles:

1:02 1) A change in perspective is necessary because some abilities only emerge at a certain scale. Even if an ability doesn't work with current-generation LLMs, we should not claim that it doesn't work; rather, we should say it doesn't work yet. Once larger models become available, many conclusions change.

This also means that some conclusions from the past are invalidated, and we need to constantly unlearn intuitions built on top of them.

7:12 2) From first principles, scaling up the Transformer amounts to efficiently doing matrix multiplications across many, many machines. I see many researchers in the LLM field who are not familiar with how scaling is actually done. This section is aimed at technical audiences who want to understand what it means to train large models.
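
To make the "matrix multiplications across many machines" idea concrete, here is a minimal sketch, assuming JAX and its sharding API, of tensor parallelism for a single layer; the shapes and names are my own illustration, not code from the talk:

```python
# Minimal sketch of sharding one large matmul across devices (assumes JAX).
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis ("model") spanning every available device.
mesh = Mesh(mesh_utils.create_device_mesh((jax.device_count(),)), ("model",))

x = jnp.ones((8, 1024))     # activations, replicated on every device
w = jnp.ones((1024, 4096))  # weight matrix, sharded along its output columns
w = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

@jax.jit
def layer(x, w):
    # Each device multiplies x by its own column slice of w; the compiler
    # inserts whatever communication is needed around the sharded output.
    return x @ w

y = layer(x, w)  # y is itself sharded along its last dimension
```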

27:52 3) I talk about what we should think about for further scaling (think 10,000x GPT-4 scale). To me, scaling isn't just doing the same thing with more machines. It entails finding the inductive bias that is the bottleneck for further scaling.

I believe that the maximum likelihood objective function is the bottleneck to achieving 10,000x GPT-4 scale. Learning the objective function with an expressive neural net is the next, far more scalable paradigm. With the cost of compute going down exponentially, scalable methods eventually win. Don't compete with that.
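
As a concrete illustration of what "learning the objective function" can look like, here is a minimal sketch of the pairwise preference loss used to train a reward model in RLHF; the linear reward head and the data are hypothetical placeholders of my own, not the talk's code:

```python
# Minimal sketch of a *learned* objective: a reward model trained on
# pairwise human preferences (a Bradley-Terry loss, as in RLHF).
import jax
import jax.numpy as jnp

def reward(params, features):
    # Hypothetical reward head: a linear map from response features to a scalar.
    return features @ params["w"] + params["b"]

def preference_loss(params, chosen, rejected):
    # Maximize P(chosen beats rejected) = sigmoid(r_chosen - r_rejected).
    margin = reward(params, chosen) - reward(params, rejected)
    return -jnp.mean(jax.nn.log_sigmoid(margin))

params = {"w": jnp.zeros((16,)), "b": jnp.zeros(())}
chosen = jnp.ones((4, 16))     # features of the human-preferred responses
rejected = jnp.zeros((4, 16))  # features of the rejected responses

# The learned reward, rather than a fixed maximum-likelihood target,
# then drives fine-tuning of the policy model.
grads = jax.grad(preference_loss)(params, chosen, rejected)
```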

In all of these sections, I strive to describe everything from first principles. In an extremely fast-moving field like LLMs, no one can keep up. I believe that understanding the core ideas by deriving them from first principles is the only scalable approach.

Disclaimer: I give my personal opinions and the talk material doesn't reflect my employer's opinion in any way.

Comments

Here after the o1-preview release 😊 I'm so excited about what's happening in the world. This is a new frontier for me, and I think for the world 😊 This video is very well done! Thank you so much; I will definitely follow and keep up with your work.

TylorVetor

Wonderful! Super insightful talk. Loved the part where you simplify what scalability actually means.

mrin

Thank you so much, nicely put and easy to understand

maqboolurrahimkhan

Best video on LLMs I've ever seen! The section starting around 29:50 is fascinating: a pre-trained model that hasn't undergone post-training will happily comply with malicious prompts. So it would be incredibly dangerous if a model that has been pre-trained but has not undergone post-training were ever leaked.

anveio

I'm LiuShuai. I told my brother, who is a middle school graduate, that he and I could both one day sit among scientists, but he didn't believe me and left with a mocking expression.
I will prove what I said and what Hyung Won Chung said in the video.

刘帅-ns

The 3 parts are not strongly related, so you can pick the ones you are interested in.
1- The set of emergent behaviors changes with scale; in the future, problems that current models fail at will be solved.
2- A short look at how Transformer training is scaled across data centers.
3- Maximum likelihood introduces a strong bias by assuming a single correct answer. We need better objective functions that are themselves learned, e.g. RLHF and beyond.

"Not yet"
for instance if P != NP is proven, then some tasks for sure would never be "yet".
How do you distinguish between tasks where there is a chance versus tasks where it has been proven no chance?

MrHardgabi

Awesome talk! Loved hearing your insight and your counter-perspective to the current researcher sentiment that we should get rid of the RL in RLHF.

josephedappully

This is a great talk, thank you so much.
I have a question about 'mapping'. I'm still confused about what exactly a mapping is in deep learning.
My guess is that a mapping is a sort of process that transforms data into some other dimension or manifold?

bayesianlee

As a new PhD student in the field, what area do you suggest is interesting to dig into?

NA-sdbw

The term "Large" LM is the equivalent of "Big" Data from a bygone era.

semrana

Love that you used the example of reward hacking as "just preferring longer responses"; average response length is one of the most common differences I've seen when trying different snapshots of ChatGPT/GPT-4. On another note, I would love to hear your thoughts on soft prompt tuning for model steering (instead of full-model gradient updates from RLHF).

sergicastellasape

Exceptional content. A useful insight into what’s meant by scalability in a concrete sense.

JustinHalford

I think you forgot to mention the option that the model never gains the ability.

GerardSans

Enjoyed the talk, thanks for putting it together and posting it here!

iandanforth