Understanding PyTorch Buffers


This video explains what PyTorch buffers are, a concept that is particularly useful for GPU computations and for implementing large models like LLMs.

---

Comments
Author

00:03 PyTorch buffers are essential for implementing large models
01:39 Instantiating a causal attention module without buffers
03:12 Transferring data to the GPU with PyTorch CUDA
04:56 Optimizing memory usage during the forward pass
06:36 Creating the mask efficiently with PyTorch buffers
08:07 Parameters are transferred to the GPU automatically, but plain torch tensors must be registered as parameters (or buffers) to be transferred
10:05 The mask is made a buffer so it is not updated by the optimizer
11:50 PyTorch buffers make it easy to transfer tensors between GPU and CPU
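The behavior summarized in the timestamps above can be sketched in a few lines. This is a minimal, hypothetical module (not the exact code from the video) showing how `register_buffer` keeps a causal mask out of the optimizer's reach while still letting it travel with the model:

```python
import torch
import torch.nn as nn

class CausalSelfAttentionMask(nn.Module):
    """Minimal illustration of register_buffer (hypothetical module,
    not the exact code from the video)."""
    def __init__(self, context_length=4):
        super().__init__()
        mask = torch.triu(torch.ones(context_length, context_length), diagonal=1)
        # Registered as a buffer: it moves with .to(device) and .cuda(),
        # is saved in state_dict(), but is never returned by
        # parameters(), so the optimizer leaves it alone.
        self.register_buffer("mask", mask)

model = CausalSelfAttentionMask()
print(dict(model.named_buffers()).keys())   # dict_keys(['mask'])
print(list(model.parameters()))             # [] -- nothing to optimize
# model.to("cuda") would move the buffer along with the module
```

Had `mask` been assigned as a plain tensor attribute (`self.mask = mask`), calling `model.to("cuda")` would leave it on the CPU, producing a device-mismatch error in the forward pass.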

nithinma

Thanks Sebastian! Now I understand what buffers are for. Great lecture.

baburamchaudhary

I always see this register_buffer code in transformer networks and never thought the reason would be so simple. Thanks for explaining such an overlooked concept of PyTorch.

ashishgoyal

Thank you very much for the great videos, Sebastian! I'm already looking forward to your book becoming available in the German-speaking market.

Natasha_Databricks

Your video was incredibly clear and engaging! Thank you for the awesome explanation!

sjl-sc

Great Work! I like your LLM notebooks as well!

orrimoch

I recently purchased LLM from Scratch from Manning. It has been an amazing learning experience so far.

CRTagadiya

Actually learned something new. Thanks Sebastian!

andrei_aksionau

Thank you very much for this explanation.

SHAMIKII

Hi Sebastian,
I really respect what you are doing. I like your GitHub repository; it has a lot of helpful tutorials.
I'm going to buy your next book, Build a Large Language Model (From Scratch).
I have one question: what minimal GPU do you recommend for exploring and running all the examples from your next book?

raiszakirdzhanov

Another advantage is that the buffer gets saved in the state_dict when saving the model
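This point is easy to verify; a quick sketch (the module name is made up for illustration):

```python
import torch
import torch.nn as nn

class WithMask(nn.Module):
    # Hypothetical minimal module with a single registered buffer
    def __init__(self):
        super().__init__()
        self.register_buffer("mask", torch.triu(torch.ones(3, 3), diagonal=1))

state = WithMask().state_dict()
print("mask" in state)  # True -- the buffer is serialized with the model
# To exclude a buffer from state_dict, pass persistent=False
# to register_buffer.
```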

kevindelnoye

It's indeed a clean way to do things, but can't we achieve the same thing by adding them as a parameter and setting .requires_grad = False?
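That mostly works, but there is a practical difference worth noting (a minimal sketch; the class names are made up): a frozen parameter still shows up in `model.parameters()`, so it gets handed to the optimizer and counted in parameter totals, whereas a buffer does not appear there at all.

```python
import torch
import torch.nn as nn

class FrozenParam(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen parameter: excluded from gradient updates, but still
        # listed by parameters(), so it is passed to the optimizer.
        self.mask = nn.Parameter(torch.ones(3, 3), requires_grad=False)

class Buffered(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffer: moves with the module and is saved in state_dict,
        # but never appears in parameters().
        self.register_buffer("mask", torch.ones(3, 3))

print(len(list(FrozenParam().parameters())))  # 1
print(len(list(Buffered().parameters())))     # 0
```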

anishbhanushali

Cheers, great video. I'd only suggest being slightly more concise.

putskan