How to Build an LLM from Scratch | An Overview


This is the 6th video in a series on using large language models (LLMs) in practice. Here, I review the key aspects of developing a foundation LLM, drawing on the development of models such as GPT-3, Llama, and Falcon.

More Resources:

[4] arXiv:2005.14165 [cs.CL]
[6] arXiv:2101.00027 [cs.CL]
[8] arXiv:2303.18223 [cs.CL]
[9] arXiv:2112.11446 [cs.CL]
[10] arXiv:1508.07909 [cs.CL]
[13] arXiv:1706.03762 [cs.CL]
[16] arXiv:1810.04805 [cs.CL]
[17] arXiv:1910.13461 [cs.CL]
[18] arXiv:1603.05027 [cs.CV]
[19] arXiv:1607.06450 [stat.ML]
[20] arXiv:1803.02155 [cs.CL]
[21] arXiv:2203.15556 [cs.CL]
[26] arXiv:2001.08361 [cs.LG]
[27] arXiv:1803.05457 [cs.AI]
[28] arXiv:1905.07830 [cs.CL]
[29] arXiv:2009.03300 [cs.CY]
[30] arXiv:2109.07958 [cs.CL]

--

Socials

The Data Entrepreneurs

Support ❤️

Intro - 0:00
How much does it cost? - 1:30
4 Key Steps - 3:55
Step 1: Data Curation - 4:19
1.1: Data Sources - 5:31
1.2: Data Diversity - 7:45
1.3: Data Preparation - 9:06
Step 2: Model Architecture (Transformers) - 13:17
2.1: 3 Types of Transformers - 15:13
2.2: Other Design Choices - 18:27
2.3: How big do I make it? - 22:45
Step 3: Training at Scale - 24:20
3.1: Training Stability - 26:52
3.2: Hyperparameters - 28:06
Step 4: Evaluation - 29:14
4.1: Multiple-choice Tasks - 30:22
4.2: Open-ended Tasks - 32:59
What's next? - 34:31
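The cost question (1:30) and the sizing question (22:45) both come down to the same back-of-the-envelope arithmetic: total training compute is roughly C ≈ 6·N·D FLOPs for N parameters and D training tokens, and the Chinchilla paper [21] suggests D ≈ 20·N for compute-optimal training. A toy sketch of that estimate (all hardware and price numbers below are illustrative assumptions, not figures from the video):

```python
# Back-of-the-envelope training-cost estimate using C ~ 6 * N * D FLOPs
# (N = parameters, D = training tokens) and Chinchilla-style D ~ 20 * N.
# Default hardware numbers are assumptions: ~312 TFLOPS peak per GPU,
# ~30% utilization, $2 per GPU-hour.

def training_cost(n_params, tokens_per_param=20,
                  gpu_flops=312e12, utilization=0.3,
                  dollars_per_gpu_hour=2.0):
    d = tokens_per_param * n_params           # training tokens
    flops = 6 * n_params * d                  # total training FLOPs
    gpu_seconds = flops / (gpu_flops * utilization)
    gpu_hours = gpu_seconds / 3600
    return gpu_hours, gpu_hours * dollars_per_gpu_hour

hours, dollars = training_cost(10e9)  # a hypothetical 10B-parameter model
print(f"~{hours:,.0f} GPU-hours, ~${dollars:,.0f}")
```

Under these assumptions, a 10B-parameter model lands in the tens of thousands of GPU-hours, which is why the video frames from-scratch training as a major cost decision.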
Comments

[Correction at 15:00]: the words on the vertical axis are backward. They should read "I hit ball with baseball bat" from top to bottom, not bottom to top.
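For readers following the correction, the decoder-style causal mask can be sketched in a few lines. This is a toy illustration (token labels taken from the slide; the standard lower-triangular masking convention is assumed):

```python
import numpy as np

tokens = ["I", "hit", "ball", "with", "baseball", "bat"]
n = len(tokens)

# Lower-triangular causal mask: row i is the query token, and it may
# attend only to columns j <= i (itself and earlier tokens).
mask = np.tril(np.ones((n, n), dtype=int))

# Reading the rows top to bottom gives the corrected order: "I" at the
# top attends only to itself; "bat" at the bottom sees every token.
for tok, row in zip(tokens, mask):
    print(f"{tok:>8} {row}")
```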



ShawhinTalebi

This is about as perfect a coverage of this topic as I could imagine. I'm a researcher with a PhD in NLP who trains LLMs from scratch for a living, and I often need to communicate the process in a way that's digestible to a broad audience without back-and-forth Q&A, so I'm thrilled to have found your piece!

As an aside, I think the token order on the y-axis of the attention mask for decoders on slide 10 is reversed.

seanwilner

"Garbage in, garbage out" is also applicable to our brain. Your videos are certainly high quality inputs.

LudovicCarceles

This whole series on using large language models (LLMs) is really helpful. This 6th video really helped me understand the transformer architecture in a nutshell. Thank you. 👏

racunars

Hey Shaw - thank you for coming up with this extensive video on building an LLM from scratch. It certainly gives a fair idea of how some of the existing LLMs were created!

shilpyjain

Pretty rare that I actually sit through an entire 30+ minute video on YouTube. Well done.

barclayiversen

This is literally the perfect explanation for this topic. Thank you so much.

dauntlessRx

Thank you so much for putting these videos together, and this one in particular. This is such a broad and complex topic, and you have managed to cover it as thoroughly as possible in a roughly 30-minute timeframe 😮, which I thought was almost impossible.

GBangalore

Your voice is relaxing. I love that you don't speak super fast like most tech bros, and you seem relaxed about the content rather than having this "in a rush" energy. I'd definitely watch you explain most things LLM and AI! Thanks for the content.

Hello_kitty_

I became interested in creating an LLM, and this is the first video I opened. I am so grateful for it, because I see I will never be able to do it on my own; I don't have the money or resources. Thank you for the high-level overview.

mater

This is such a fantastic video on building LLMs from scratch. I'll watch it repeatedly to implement it for a time-series use case. Thank you so much!!

tehreemsyed

This was a very thorough introduction to LLMs and answered many questions I had. Thank you.

bradstudio

I am typing this after watching half of the video, as I am already amazed by the clarity of the explanation. Exceptional.

mujeebrahman

One of the best videos explaining the process and cost of building an LLM 🎉.

asha

Thanks for putting together this short video. I enjoy learning this subject from you.

ethanchong

This is excellent - thanks for putting this together and taking the time to explain things so clearly!

theunconventionalenglishman

This is the most comprehensive and well-rounded presentation I've ever seen in my life, topic aside. xD Bravo, good Sir.

goldholder

Clicked with low expectations, but wow, what a gem. Great clarity with just the right amount of depth for beginners and intermediate learners.

lihanou

I am not a programmer and don't know anything about programming or LLMs, but I find this topic fascinating. Thank you for your videos and for sharing your knowledge.

sinan

That was simply incredible; how the heck does it have under 5k views? Literal in-script citations, not even cards but vocal mentions!! Holy shit, I'm gonna share this channel with all my LLM-enamored buddies.

chrstfer