Developing an LLM: Building, Training, Finetuning

DESCRIPTION:
This video provides an overview of the three stages of developing an LLM: Building, Training, and Finetuning. The focus is on explaining how LLMs work by describing what happens at each stage.

OUTLINE:

00:00 – Using LLMs
02:50 – The stages of developing an LLM
05:26 – The dataset
10:15 – Generating multi-word outputs
12:30 – Tokenization
15:35 – Pretraining datasets
21:53 – LLM architecture
27:20 – Pretraining
35:21 – Classification finetuning
39:48 – Instruction finetuning
43:06 – Preference finetuning
46:04 – Evaluating LLMs
53:59 – Pretraining & finetuning rules of thumb
COMMENTS:

Your articles and videos have been extremely helpful in understanding how LLMs are built. Build a Large Language Model (From Scratch) and Machine Learning Q and AI are resources that I am presently reading, and they provide a hands-on discourse on the conceptual understanding of LLMs. You, Andrej Karpathy, and Jay Alammar are shining examples of how learning should be enabled. Thank you!

tusharganguli

You are the best! Thanks a lot for sharing your knowledge with the world.

adityasamalla

Thank you, Sebastian, for your awesome contributions. You're a big inspiration.

chineduezeofor

One of the best 60 minutes of my time. Really thankful for this.

kyokushinfighter

You are a true educator. Honored to be a contributor to one of your libraries.

admercs

I know you don't do many tutorials, but personally I love them, especially from you!

JR-gylh

Thank you, Sir. Your lessons are beneficial for the community. Appreciate your hard work! 😊

haribhauhud

I am your fan and I have most of your books; thanks for this excellent video! Another evaluation metric that I found interesting on another channel was to make LLMs play chess against each other 10 times.

guis

Very nice video, I liked it so much that I preordered your new book directly after watching it (to be fair I have read your blog for some time now).

tomhense

You are a legend, love your work, thanks a ton for sharing!

bjugdbjk

What wonderful tech minds {Sebastian Raschka, Yann LeCun, Andrej Karpathy, ...} who share their work and beautiful ideas with mere mortals like me... Sebastian's teachings are so fundamental that they take the fear off my clogged mind... 🙏
Although I am struggling to build LLMs for specific and niche areas, I am confident of cracking them with great resources like Build a Large Language Model (From Scratch)!

ZavierBanerjea

00:02 Three common ways of using large language models
02:39 Developing an LLM involves building, pre-training, and fine-tuning.
07:11 The LLM predicts the next token in the text
09:30 Training an LLM involves sliding fixed-size inputs over the text data to create batches (see the sketch after this comment)
14:22 Byte pair encoding and SentencePiece variants allow LLMs to handle unknown words
16:42 Training sets are increasing in size
21:09 Developing an LLM involves architecture, pre-training, model evaluation, and fine-tuning.
23:14 The Transformer block is repeated multiple times in the architecture.
27:22 Pre-training creates the foundation model for fine-tuning
29:28 Training LLMs is typically done for one to two epochs
33:44 Pre-training is not usually necessary for adapting an LLM to a specific task
35:51 Replace the output layer for efficient classification (see the sketch after this comment).
39:54 Classification fine-tuning is key for practical business tasks.
42:01 LLM instruction datasets and preference tuning
45:58 Evaluating LLMs is crucial, with MMLU being a popular metric.
48:07 Multiple-choice questions are not sufficient to measure an LLM's performance
52:34 Comparing LLM models for performance evaluation
54:32 Continued pre-training is effective for instilling new knowledge in LLMs
58:28 Access slides on the website for more details

nithinma
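
To make the sliding-window idea from the summary above concrete, here is a minimal sketch (an illustration, not the video's exact code) that BPE-tokenizes a text with tiktoken and pairs each fixed-size input chunk with the same chunk shifted by one token, which is the next-token prediction target used during pretraining:

```python
# Minimal sketch (illustrative, not the video's code): build next-token training
# pairs by BPE-tokenizing the text and sliding a fixed-size window over the IDs.
import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader

class NextTokenDataset(Dataset):
    def __init__(self, text, context_length=8, stride=4):
        tokenizer = tiktoken.get_encoding("gpt2")   # GPT-2 byte pair encoding
        ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        # The target chunk is the input chunk shifted one position to the right,
        # so the model learns to predict the next token at every position.
        for i in range(0, len(ids) - context_length, stride):
            self.inputs.append(torch.tensor(ids[i:i + context_length]))
            self.targets.append(torch.tensor(ids[i + 1:i + context_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

text = "LLMs are pretrained to predict the next token in the text. " * 8
loader = DataLoader(NextTokenDataset(text), batch_size=2, shuffle=True)
x, y = next(iter(loader))
print(x.shape, y.shape)  # e.g. torch.Size([2, 8]) torch.Size([2, 8])
```

With batches like these, pretraining minimizes the cross-entropy between the model's next-token predictions and the shifted targets.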
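
The classification-finetuning step at 35:51 boils down to keeping the pretrained backbone and swapping the vocabulary-sized output head for a small head with one output per class. A minimal sketch with an illustrative stand-in backbone (the names and sizes here are assumptions, not the GPT implementation from the video):

```python
# Minimal sketch (illustrative stand-in, not the GPT implementation from the video):
# classification finetuning keeps the pretrained backbone and only swaps the
# vocabulary-sized output head for a small head with one output per class.
import torch
import torch.nn as nn

emb_dim, vocab_size, num_classes = 768, 50257, 2   # e.g. spam vs. not-spam

class TinyBackboneLM(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in backbone; a real GPT-style model would use causal (masked)
        # self-attention blocks repeated many times.
        layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=12, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.out_head = nn.Linear(emb_dim, vocab_size)  # next-token head from pretraining

    def forward(self, x):                 # x: (batch, seq_len, emb_dim)
        return self.out_head(self.backbone(x))

model = TinyBackboneLM()                  # imagine this is loaded with pretrained weights
model.out_head = nn.Linear(emb_dim, num_classes)   # replace the output layer

x = torch.randn(4, 16, emb_dim)           # 4 sequences of 16 token embeddings
logits = model(x)[:, -1, :]               # use the last token's output as the class logits
print(logits.shape)                       # torch.Size([4, 2])
```

Because only the new head (and optionally the last few transformer blocks) has to be updated, this kind of finetuning is far cheaper than pretraining.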

Thanks for the detailed videos and articles. I want to ask whether it's possible to create a customized tokenizer as an extension of existing ones for a custom dataset. Also, how do decoder-only models handle other tasks like summarization and classification after fine-tuning without forgetting their pretrained causal next-token prediction task?

moshoodolawale

Thanks for the great knowledge you are sharing <3

rachadlakis

Oh, my lord, my favourite machine learning author is a Liverpool fan.😎

haqiufreedeal

Hi, nice videos! One question for my understanding: when talking about embedding dimensions such as 1280 in "gpt2-large", do you mean the size of the number vector encoding the context of a single token, or the number of input tokens? When comparing gpt2-large and Llama 2, the number is the same for the ".. embeddings with 1280 tokens".

RobinSunCruiser

@16:37 when you say Llama was trained on 1T tokens, do you still mean there were 32K unique tokens? Because in your blog post you have "They also have a surprisingly large 151,642 token vocabulary (for reference, Llama 2 uses a 32k vocabulary, and Llama 3.1 uses a 128k token vocabulary); as a rule of thumb, increasing the vocab size by 2x reduces the number of input tokens by 2x so the LLM can fit more tokens into the same input. Also it especially helps with multilingual data and coding to cover words outside the standard English vocabulary."

Xnaarkhoo
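
As a small illustration of the vocabulary-size rule of thumb quoted in the comment above (using tiktoken's "gpt2" and "cl100k_base" encodings as stand-ins for a roughly 50k and a roughly 100k vocabulary; this is not code from the video), the larger vocabulary generally needs fewer tokens for the same text, especially for non-English input:

```python
# Small illustration (not from the video): a tokenizer with a larger vocabulary
# usually needs fewer tokens to encode the same text, which matters most for
# multilingual text and code.
import tiktoken

text = "Die Tokenisierung mehrsprachiger Texte profitiert besonders von großen Vokabularen."

for name in ("gpt2", "cl100k_base"):     # ~50k vs. ~100k vocabulary
    enc = tiktoken.get_encoding(name)
    print(f"{name}: vocab size {enc.n_vocab}, tokens for the sentence: {len(enc.encode(text))}")
```

The exact counts depend on the tokenizer, but the direction of the effect matches the rule of thumb in the quoted passage.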

When is your whole book coming out? Eagerly waiting 😅

sahilsharma

Great video. Now that LLMs are so powerful, will regular machine learning and deep learning slowly vanish?

KumR

I strongly assume you speak German :). Where can one find your book in Kindle (mobi or fb2) format? Thanks & best regards.

andreyc.