Create a Large Language Model from Scratch with Python – Tutorial

Learn how to build your own large language model from scratch. This course covers the data handling, math, and transformers behind large language models. You will use Python.

✏️ Course developed by @elliotarledge

⭐️ Contents ⭐️
(0:00:00) Intro
(0:03:25) Install Libraries
(0:06:24) Pylzma build tools
(0:08:58) Jupyter Notebook
(0:12:11) Download Wizard of Oz
(0:14:51) Experimenting with text file
(0:17:58) Character-level tokenizer
(0:19:44) Types of tokenizers
(0:20:58) Tensors instead of Arrays
(0:22:37) Linear Algebra heads up
(0:23:29) Train and validation splits
(0:25:30) Premise of Bigram Model
(0:26:41) Inputs and Targets
(0:29:29) Inputs and Targets Implementation
(0:30:10) Batch size hyperparameter
(0:32:13) Switching from CPU to CUDA
(0:33:28) PyTorch Overview
(0:42:49) CPU vs GPU performance in PyTorch
(0:47:49) More PyTorch Functions
(1:06:03) Embedding Vectors
(1:11:33) Embedding Implementation
(1:13:06) Dot Product and Matrix Multiplication
(1:25:42) Matmul Implementation
(1:26:56) Int vs Float
(1:29:52) Recap and get_batch
(1:35:07) nn.Module subclass
(1:37:05) Gradient Descent
(1:50:53) Logits and Reshaping
(1:59:28) Generate function and giving the model some context
(2:03:58) Logits Dimensionality
(2:05:17) Training loop + Optimizer + zero_grad explanation
(2:13:56) Optimizers Overview
(2:17:04) Applications of Optimizers
(2:18:11) Loss reporting + Train VS Eval mode
(2:32:54) Normalization Overview
(2:35:45) ReLU, Sigmoid, Tanh Activations
(2:45:15) Transformer and Self-Attention
(2:46:55) Transformer Architecture
(3:17:54) Building a GPT, not a full Transformer model
(3:19:46) Self-Attention Deep Dive
(3:25:05) GPT architecture
(3:27:07) Switching to MacBook
(3:31:42) Implementing Positional Encoding
(3:36:57) GPTLanguageModel initialization
(3:40:52) GPTLanguageModel forward pass
(3:46:56) Standard Deviation for model parameters
(4:00:50) Transformer Blocks
(4:04:54) FeedForward network
(4:07:53) Multi-head Attention
(4:12:49) Dot product attention
(4:19:43) Why we scale by 1/sqrt(dk) (see the sketch below the contents)
(4:26:45) Sequential VS ModuleList Processing
(4:30:47) Hyperparameters Overview
(4:32:14) Fixing errors, refining
(4:34:01) Begin training
(4:35:46) OpenWebText download and Survey of LLMs paper
(4:37:56) How the dataloader/batch getter will have to change
(4:41:20) Extract corpus with WinRAR
(4:43:44) Python data extractor
(4:49:23) Adjusting for train and val splits
(4:57:55) Adding dataloader
(4:59:04) Training on OpenWebText
(5:02:22) Training works well, model loading/saving
(5:04:18) Pickling
(5:05:32) Fixing errors + GPU Memory in task manager
(5:14:05) Command line argument parsing
(5:18:11) Porting code to script
(5:22:04) Prompt: Completion feature + more errors
(5:24:23) nn.Module inheritance + generation cropping
(5:27:54) Pretraining vs Finetuning
(5:33:07) R&D pointers
(5:44:38) Outro
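
A minimal sketch of the scaled dot-product attention covered in the (4:12:49) and (4:19:43) chapters, assuming standard PyTorch; the tensor sizes and variable names here are illustrative, not necessarily the ones used in the course:

```python
import torch
import torch.nn.functional as F

# Toy dimensions: batch of 4 sequences, 8 tokens each, 32-dim embeddings, one 16-dim head.
B, T, C, head_size = 4, 8, 32, 16
x = torch.randn(B, T, C)

# Linear projections produce queries, keys, and values for a single attention head.
query = torch.nn.Linear(C, head_size, bias=False)
key = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)
q, k, v = query(x), key(x), value(x)                    # each (B, T, head_size)

# Attention scores, scaled by 1/sqrt(d_k) so the softmax doesn't saturate
# for large head sizes -- the point of the (4:19:43) chapter.
scores = q @ k.transpose(-2, -1) * head_size ** -0.5    # (B, T, T)

# Causal mask: each position may only attend to itself and earlier tokens.
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
scores = scores.masked_fill(~mask, float('-inf'))

weights = F.softmax(scores, dim=-1)                     # attention weights, rows sum to 1
out = weights @ v                                       # (B, T, head_size) weighted sum of values
print(out.shape)                                        # torch.Size([4, 8, 16])
```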

🎉 Thanks to our Champion and Sponsor supporters:
👾 davthecoder
👾 jedi-or-sith
👾 南宮千影
👾 Agustín Kussrow
👾 Nattira Maneerat
👾 Heather Wcislo
👾 Serhiy Kalinets
👾 Justin Hual
👾 Otis Morgan

--

Comments

There goes my weekend. Thank you! This is absolutely amazing material! I’m 5 minutes in and already hooked.

JoeD

Highly appreciate that you guys make things like this for free!

codewithdevhindi

Anyone can learn hard concepts from this guy, because he turns hard concepts into easily understandable ones! Congrats on being such an amazing teacher!! Teaching is all about facilitating the process of learning, but somehow people tend to overcomplicate things so they can sound smarter than the majority. This guy does exactly the opposite. Thank you and God bless!

andreramalho

Great content for free.
Wishing that this hits 1 million views soon.
Keep up the great work.

DataPulse

Great work as usual ❤ Can't wait to dive deep into it.

AhmedKhaliet

Your patience and ability to explain in basic terms make learning easy. Thank you; your effort and willingness to share information are much appreciated.

adammonson

Yesterday I was searching for this, and today you dropped it! Great job people 🙌

ayushagrawal

Overall a good course. It's a bit rough around the edges, but if you are persistent or already somewhat knowledgeable you can get through it and come out a bit smarter on the other end. Worth the almost 6 hours.

besomewheredosomething

"in this course, you're going to learn a LOT of crazy stuff!" I knew this was going to be a good one!

jostafro

Awesome stuff! Congrats on being such a good teacher!

erfanshayegani

Just finished this course. One of the best courses out there for understanding the basic concepts of LLMs. Can't believe this goldmine exists for free on YT.

Karthik_Beatzzz

Thanks FCC, I have been waiting for this kind of course. At last! ❤❤❤❤

samuraipiang

Finally someone explained transformers properly! Great job!

WasimAbdul_

I don't know how you did it... but you somehow explained Transformers in a way this absolute Python newbie could parse and understand. Great work, and I'm glad I spent the time to follow along with this tutorial.

jonathanrebelo

Damn, this looks like a jewel. Will definitely look into it. Thank you for sharing this!

eck

Just started this tutorial and I'm sure it's going to be a great course. But a REQUEST -- for future tutorials, please use larger fonts / zoomed-in windows & terminals (just a wee bit larger would help tremendously). After 30 minutes of eye strain, I started to get a headache. A dark theme (as available in JupyterLab and VS Code) would also help.

golmatol

At what point in the tutorial do you ask the LLM a question and it returns text like the Wizard of Oz data you trained the model on? I can't find it. I just want to see how well it worked. I've listened to the first 20 minutes and will finish the whole thing, but I'm curious if anyone knows. None of the chapters say something like "Final Testing to Show It Works."

JohnLauerGplus
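
The closest chapters are (1:59:28) "Generate function and giving the model some context" and (5:22:04) "Prompt: Completion feature". As a rough, self-contained illustration of that prompt-then-complete flow, a toy character-level bigram model (not the code from the video, and trained here on a single sentence rather than the full book) looks something like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy corpus and character-level tokenizer (stand-ins for the Wizard of Oz text).
text = "dorothy lived in the midst of the great kansas prairies"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: ''.join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)
vocab_size = len(chars)

class BigramLanguageModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # Each token reads off logits for the next token from a lookup table.
        self.token_embedding = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx):
        return self.token_embedding(idx)            # (B, T, vocab_size) logits

    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            logits = self(idx)[:, -1, :]            # logits for the last position only
            probs = F.softmax(logits, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)
            idx = torch.cat([idx, next_id], dim=1)  # append the sampled token
        return idx

model = BigramLanguageModel(vocab_size)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)

# Train on consecutive (current char -> next char) pairs.
x, y = data[:-1].unsqueeze(0), data[1:].unsqueeze(0)
for step in range(300):
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Prompt the model with some context and let it complete it.
context = torch.tensor([encode("the ")], dtype=torch.long)
print(decode(model.generate(context, max_new_tokens=50)[0].tolist()))
```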

46:00 np.multiply is element-wise multiplication, which isn't comparable to dot-product / matrix multiplication. To compare the GPU to the CPU you can use torch's @ (matmul) operator for both. Since the second two tensors were not loaded onto the GPU, they are computed by the CPU.

biddlea
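
A rough sketch of the apples-to-apples comparison the comment above describes, using torch's @ on both devices (the matrix sizes and timing approach are illustrative, not the exact code from the video):

```python
import time
import torch

# Same matrix multiplication (@) on both devices, so the comparison is fair.
a_cpu = torch.rand(2000, 2000)
b_cpu = torch.rand(2000, 2000)

start = time.time()
c_cpu = a_cpu @ b_cpu
print(f"CPU matmul: {time.time() - start:.4f}s")

if torch.cuda.is_available():
    # Tensors must actually be moved to the GPU; otherwise the work stays on the CPU.
    a_gpu = a_cpu.to('cuda')
    b_gpu = b_cpu.to('cuda')
    torch.cuda.synchronize()          # make sure the copies are done before timing
    start = time.time()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()          # wait for the kernel to finish before stopping the clock
    print(f"GPU matmul: {time.time() - start:.4f}s")
```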

Thanks FCC, thanks for your effort Elia!! Good job!

dantefrias

Thank you so much for sharing this knowledge with us.

RealRex