How to Build an LLM from Scratch | An Overview


This is the 6th video in a series on using large language models (LLMs) in practice. Here, I review the key aspects of developing a foundation LLM, drawing on the development of models such as GPT-3, Llama, and Falcon.

More Resources:

[4] arXiv:2005.14165 [cs.CL]
[6] arXiv:2101.00027 [cs.CL]
[8] arXiv:2303.18223 [cs.CL]
[9] arXiv:2112.11446 [cs.CL]
[10] arXiv:1508.07909 [cs.CL]
[13] arXiv:1706.03762 [cs.CL]
[16] arXiv:1810.04805 [cs.CL]
[17] arXiv:1910.13461 [cs.CL]
[18] arXiv:1603.05027 [cs.CV]
[19] arXiv:1607.06450 [stat.ML]
[20] arXiv:1803.02155 [cs.CL]
[21] arXiv:2203.15556 [cs.CL]
[26] arXiv:2001.08361 [cs.LG]
[27] arXiv:1803.05457 [cs.AI]
[28] arXiv:1905.07830 [cs.CL]
[29] arXiv:2009.03300 [cs.CY]
[30] arXiv:2109.07958 [cs.CL]

--

Socials

The Data Entrepreneurs

Support ❤️

Intro - 0:00
How much does it cost? - 1:30
4 Key Steps - 3:55
Step 1: Data Curation - 4:19
1.1: Data Sources - 5:31
1.2: Data Diversity - 7:45
1.3: Data Preparation - 9:06
Step 2: Model Architecture (Transformers) - 13:17
2.1: 3 Types of Transformers - 15:13
2.2: Other Design Choices - 18:27
2.3: How big do I make it? - 22:45
Step 3: Training at Scale - 24:20
3.1: Training Stability - 26:52
3.2: Hyperparameters - 28:06
Step 4: Evaluation - 29:14
4.1: Multiple-choice Tasks - 30:22
4.2: Open-ended Tasks - 32:59
What's next? - 34:31
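The cost question (1:30) and the sizing question (22:45) both come down to the same back-of-the-envelope arithmetic: total training compute is roughly C ≈ 6·N·D FLOPs for N parameters and D training tokens, and the Chinchilla paper [21] suggests D ≈ 20·N for compute-optimal training. A toy sketch of that estimate (all hardware and price numbers below are illustrative assumptions, not figures from the video):

```python
# Back-of-the-envelope training-cost estimate using C ~ 6 * N * D FLOPs
# (N = parameters, D = training tokens) and Chinchilla-style D ~ 20 * N.
# Default hardware numbers are assumptions: ~312 TFLOPS peak per GPU,
# ~30% utilization, $2 per GPU-hour.

def training_cost(n_params, tokens_per_param=20,
                  gpu_flops=312e12, utilization=0.3,
                  dollars_per_gpu_hour=2.0):
    d = tokens_per_param * n_params           # training tokens
    flops = 6 * n_params * d                  # total training FLOPs
    gpu_seconds = flops / (gpu_flops * utilization)
    gpu_hours = gpu_seconds / 3600
    return gpu_hours, gpu_hours * dollars_per_gpu_hour

hours, dollars = training_cost(10e9)  # a hypothetical 10B-parameter model
print(f"~{hours:,.0f} GPU-hours, ~${dollars:,.0f}")
```

Under these assumptions, a 10B-parameter model lands in the tens of thousands of GPU-hours, which is why the video frames from-scratch training as a major cost decision.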
Comments

[Correction at 15:00]: the words on the vertical axis are backward. They should read "I hit ball with baseball bat" from top to bottom, not bottom to top.
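For readers following the correction, the decoder-style causal mask can be sketched in a few lines. This is a toy illustration (token labels taken from the slide; the standard lower-triangular masking convention is assumed):

```python
import numpy as np

tokens = ["I", "hit", "ball", "with", "baseball", "bat"]
n = len(tokens)

# Lower-triangular causal mask: row i is the query token, and it may
# attend only to columns j <= i (itself and earlier tokens).
mask = np.tril(np.ones((n, n), dtype=int))

# Reading the rows top to bottom gives the corrected order: "I" at the
# top attends only to itself; "bat" at the bottom sees every token.
for tok, row in zip(tokens, mask):
    print(f"{tok:>8} {row}")
```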



ShawhinTalebi

This is about as perfect a coverage of this topic as I could imagine. I'm a researcher with a PhD in NLP who trains LLMs from scratch for a living, and I often need to communicate the process in a way that's digestible to a broad audience without back-and-forth Q&A, so I'm thrilled to have found your piece!

As an aside, I think the token order on the y-axis of the attention mask for decoders on slide 10 is reversed.

seanwilner

"Garbage in, garbage out" is also applicable to our brain. Your videos are certainly high quality inputs.

LudovicCarceles

This whole series on using large language models (LLMs) is really helpful. This 6th video really helped me understand the transformer architecture in a nutshell. Thank you. 👏

racunars

Hey Shaw - thank you for coming up with this extensive video on building an LLM from scratch. It certainly gives a fair idea of how some of the existing LLMs were created!

shilpyjain

Pretty rare that I actually sit through an entire 30+ minute video on YouTube. Well done.

barclayiversen

This is literally the perfect explanation for this topic. Thank you so much.

dauntlessRx

Thank you so much for putting these videos together, and this one in particular. This is such a broad and complex topic, and you have managed to cover it as thoroughly as possible in a roughly 30-minute timeframe 😮, which I thought was almost impossible.

GBangalore

Your voice is relaxing. I love that you don't speak super fast like most tech bros, and you seem relaxed about the content rather than having this "in a rush" energy. I'd definitely watch you explain most things LLM and AI! Thanks for the content.

Hello_kitty_

I became interested in creating an LLM, and this is the first video I opened. I am so grateful for it, because I see I will never be able to do it on my own; I don't have the money or resources. Thank you for the high-level overview.

mater

This is such a fantastic video on building LLMs from scratch. I'll watch it repeatedly to implement it for a time-series use case. Thank you so much!!

tehreemsyed

This was a very thorough introduction to LLMs and answered many questions I had. Thank you.

bradstudio

I am typing this after watching half of the video, as I am already amazed by the clarity of the explanation. Exceptional.

mujeebrahman

One of the best videos explaining the process and cost of building an LLM 🎉.

asha

Thanks for putting together this short video. I enjoy learning this subject from you.

ethanchong

This is excellent - thanks for putting this together and taking the time to explain things so clearly!

theunconventionalenglishman

This is the most comprehensive and well-rounded presentation I've ever seen in my life, topic aside. xD Bravo, good Sir.

goldholder

Clicked with low expectations, but wow, what a gem. Great clarity with just the right amount of depth for beginners and intermediate learners.

lihanou

I am not a programmer and don't know anything about programming or LLMs, but I find this topic fascinating. Thank you for your videos and for sharing your knowledge.

sinan

That was simply incredible; how the heck does it have under 5k views? Literal in-script citations, not even cards but vocal mentions!! Holy shit, I'm gonna share this channel with all my LLM-enamored buddies.

chrstfer