Understanding PyTorch Buffers


This video explains what PyTorch buffers are, a concept that is particularly useful for GPU computations and for implementing large models like LLMs.

---

Comments
Author

00:03 PyTorch buffers are essential for implementing large models
01:39 Instantiating a causal attention module without buffers
03:12 Transferring data to the GPU with PyTorch CUDA
04:56 Optimizing memory usage during the forward pass
06:36 Creating the mask efficiently with PyTorch buffers
08:07 Parameters are transferred to the GPU automatically, but plain torch tensors must be registered as parameters (or buffers) to be transferred
10:05 The mask is made a buffer so it is not updated by the optimizer
11:50 PyTorch buffers make it easy to transfer tensors between GPU and CPU
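The behavior summarized in the timestamps above can be sketched in a few lines. This is a minimal, hypothetical module (not the exact code from the video) showing how `register_buffer` keeps a causal mask out of the optimizer's reach while still letting it travel with the model:

```python
import torch
import torch.nn as nn

class CausalSelfAttentionMask(nn.Module):
    """Minimal illustration of register_buffer (hypothetical module,
    not the exact code from the video)."""
    def __init__(self, context_length=4):
        super().__init__()
        mask = torch.triu(torch.ones(context_length, context_length), diagonal=1)
        # Registered as a buffer: it moves with .to(device) and .cuda(),
        # is saved in state_dict(), but is never returned by
        # parameters(), so the optimizer leaves it alone.
        self.register_buffer("mask", mask)

model = CausalSelfAttentionMask()
print(dict(model.named_buffers()).keys())   # dict_keys(['mask'])
print(list(model.parameters()))             # [] -- nothing to optimize
# model.to("cuda") would move the buffer along with the module
```

Had `mask` been assigned as a plain tensor attribute (`self.mask = mask`), calling `model.to("cuda")` would leave it on the CPU, producing a device-mismatch error in the forward pass.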

nithinma

Thanks Sebastian! Now I understand what buffers are for. Great lecture.

baburamchaudhary

I always see this register_buffer code in transformer networks and never thought the reason would be so simple. Thanks for explaining such an overlooked concept of PyTorch.

ashishgoyal

Thank you very much for the great videos, Sebastian! I'm already looking forward to your book becoming available in the German-speaking market.

Natasha_Databricks

Your video was incredibly clear and engaging! Thank you for the awesome explanation!

sjl-sc

Great Work! I like your LLM notebooks as well!

orrimoch

I recently purchased LLM from Scratch from Manning. It has been an amazing learning experience so far.

CRTagadiya

Actually learned something new. Thanks Sebastian!

andrei_aksionau

Thank you very much for this explanation.

SHAMIKII

Hi Sebastian,
I really respect what you are doing. I like your GitHub repository; it has a lot of helpful tutorials.
I'm going to buy your next book, Build a Large Language Model (From Scratch).
I have one question: what minimal GPU do you recommend for exploring and running all the examples from your next book?

raiszakirdzhanov

Another advantage is that the buffer gets saved in the state_dict when saving the model
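This point is easy to verify; a quick sketch (the module name is made up for illustration):

```python
import torch
import torch.nn as nn

class WithMask(nn.Module):
    # Hypothetical minimal module with a single registered buffer
    def __init__(self):
        super().__init__()
        self.register_buffer("mask", torch.triu(torch.ones(3, 3), diagonal=1))

state = WithMask().state_dict()
print("mask" in state)  # True -- the buffer is serialized with the model
# To exclude a buffer from state_dict, pass persistent=False
# to register_buffer.
```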

kevindelnoye

It's indeed a clean way to do things, but can't we achieve the same thing by adding them as a parameter and setting .requires_grad = False?
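That mostly works, but there is a practical difference worth noting (a minimal sketch; the class names are made up): a frozen parameter still shows up in `model.parameters()`, so it gets handed to the optimizer and counted in parameter totals, whereas a buffer does not appear there at all.

```python
import torch
import torch.nn as nn

class FrozenParam(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen parameter: excluded from gradient updates, but still
        # listed by parameters(), so it is passed to the optimizer.
        self.mask = nn.Parameter(torch.ones(3, 3), requires_grad=False)

class Buffered(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffer: moves with the module and is saved in state_dict,
        # but never appears in parameters().
        self.register_buffer("mask", torch.ones(3, 3))

print(len(list(FrozenParam().parameters())))  # 1
print(len(list(Buffered().parameters())))     # 0
```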

anishbhanushali

Cheers, great video. I'd only suggest being slightly more concise.

putskan