Breaking Down Meta's Billion Dollar LLM Blueprint [Llama-3.1 Full Breakdown]


check out my newsletter:

Llama-3.1's 92-page paper is an engineering report that most people wouldn't care much about, but for AI developers it's the goldmine paper of LLMs. Why is that? Let's find out what Meta's researchers shared about how an absolute chungus of a model is trained and optimized.

Llama-3.1 405B

This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Owen Ingraham, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Penumbraa, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth, Thipok Tham, Clayton Ford

[Music 1] massobeats - gingersweet
[Music 2] massobeats - lush

[Video Editor] Silas

0:00 Intro
2:52 model architecture
4:50 scaling law
6:39 compute & hardware optimization
10:24 Poe
11:48 Training recipe
17:49 Data mix
Comments

and probs no more 20-min vids from me, it's literally death itself to record them

bycloudAI

this video really makes me wanna read the whole paper, rare to see a company publish such a detailed paper

erenplayzmc

A "multimodal" chatbot:
5 different models hot glued together

Napert

Karpathy in 5 years: Reproducing LLaMa 3.1 405B

YourAverageHuman-

54 days training and it reached GPT-4o 🤯
GPT-5 with X-trillion parameters is going to start its own weight class of LLMs 😌

RedOneM

So glad this answered more questions than I ever thought existed.

FunIsGoingOn

Wow, that's one epic tutorial.
Llama 3 Training Ritual
Difficulty: Deadhead
Rarity: Mythic
Minimum Level to Read Description: 80
Minimum Level to Embark: XXX (requires further enlightenment)

apoage

It's actually pretty cool that Poe sponsors you. They're genuinely what I recommend to anyone who wants to use LLMs.

pareak

i'm mad excited for llama 4 because multimodal

GraveUypo

06:08 The isoflops curve explanation was a mind-bender! Thanks for breaking it down.

The.AiSide

First time an advertisement has actually made me return to a video and watch it again to find it.

Regardless of that, this was super helpful, thank you so much.😅

RicardoPoleo

new video dropped... * breathing heavy *

diga

It was an excellent video, but still I don't think the kids from 3:00 are gonna make it.

Hodoss

Great video, I'd love to see more like this, even more technical ones, and also about multimodal model architectures.

elwii

It's clear to me that Llama 4 will have MoA like GPT-4o. It would be nice to see an image generator integrated too, but let's not get ahead of ourselves. Let's hope that it will also be "open source" (although the current models aren't technically open source, because you're not completely free to do whatever you want with this technology. Look it up)

dimii

This is an excellent breakdown of the paper. Thank you

sammcj

So I guess I'm gonna be stuck on that desert island then 😅

Ikbeneengeit

Damn, I need to invest in META. They will dominate standardization.

JohnDontFollowMe

wow this is amazing, thanks, very well received here.

matt-se

“how to build a nuke in less than 100 pages” - Meta

redthunder