Episode 3: From PyTorch to PyTorch Lightning


This video covers the magic of PyTorch Lightning! We convert the pure PyTorch classification model we created in the previous episode to PyTorch Lightning, which makes the latest AI best practices trivial to adopt. We go over training on single and multiple GPUs, logging and saving models, and much more!

William Falcon is an AI Ph.D. researcher at NYU and the creator and founder of PyTorch Lightning.

Chapters:
00:00 Introduction to PyTorch Lightning
00:38 Install PyTorch Lightning
01:03 5 main components of a Lightning Module
01:47 Defining a model
04:05 Optimizer
05:20 The Training Loop
07:26 Loading and preparing data
09:10 Running training experiments
16:04 Training on a GPU
17:35 Logging and saving models
23:02 Validation loop
31:48 Multi GPU training
Comments

Great tutorial!

One thing I noticed for anyone working through this in 2022: the accuracy won't show up on the progress bar using the method in the tutorial.

To get it to work, you need to remove the progress bar pbar variable from the return statement and instead call "self.log("accuracy", acc, prog_bar=True)" inside the training_step function.

paulmathew
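
For reference, a minimal sketch of what that fix looks like in a post-1.0 training_step; the model, loss, and metric name here are placeholders, not the exact code from the video:

    import pytorch_lightning as pl
    import torch.nn.functional as F

    class LitClassifier(pl.LightningModule):
        # __init__, forward, configure_optimizers as in the episode

        def training_step(self, batch, batch_idx):
            x, y = batch
            logits = self(x)
            loss = F.cross_entropy(logits, y)
            acc = (logits.argmax(dim=1) == y).float().mean()
            # self.log replaces the old {"progress_bar": pbar} return dict
            self.log("accuracy", acc, prog_bar=True)
            return loss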

My vote for the order of feature coolness: #1- trivial multi-GPU, #2- flexible TensorBoard logging (I'm logging a bunch of metrics), #3- accumulate_grad_batches, #4- resume_from_checkpoint, #5- hparams logging in TensorBoard (especially useful when I keep tweaking parameters in the middle of a day-long run, then resume), #6- warmup learning rate with optimizer_step.

johngrabner
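
For anyone mapping that list onto code, a rough sketch of those Trainer flags as they looked in 1.x-era Lightning; flag names have moved around between versions (e.g. resume_from_checkpoint later became trainer.fit(ckpt_path=...)), and the checkpoint path is a placeholder:

    from pytorch_lightning import Trainer

    trainer = Trainer(
        gpus=2,                              # 1: trivial multi-GPU
        accumulate_grad_batches=4,           # 3: simulate a 4x larger batch
        resume_from_checkpoint="last.ckpt",  # 4: pick a day-long run back up
    )
    trainer.fit(model)   # model: the LightningModule from the episode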

@20:15 is my favorite part of the video. Alfredo is so freaking honest at that moment, love it.

adityassrana

This was very helpful for reorganising my pytorch-lightning 0.7/0.8 code into the latest version. Thanks guys, waiting for more.

timothydell

Awesome, it's so easy to implement distributed training across nodes along with custom hooks!! 😉

mayankbhaskar
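
For reference, multi-node distributed training in current Lightning is roughly the sketch below; argument names assume a recent release (older versions used gpus= and accelerator="ddp" instead):

    from pytorch_lightning import Trainer

    trainer = Trainer(
        accelerator="gpu",
        devices=4,       # GPUs per node
        num_nodes=2,
        strategy="ddp",  # one process per GPU across both nodes
    )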

When will you publish the next video? This is amazing

asiffaisal

Both of these guys have something confusing going on with their back wall, amazing.

michelangelo

Thanks for this video. If you can cover callbacks, that will be interesting learning. The progress bar always overwrites the previous metrics; if you can cover printing the metrics for each epoch separately, it will be of great help.

KSK
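
Callbacks aren't covered in this episode, but a minimal sketch of one that prints the logged metrics once per epoch (so the progress bar can't overwrite them) might look like this; hook signatures have shifted slightly between Lightning versions:

    from pytorch_lightning.callbacks import Callback

    class PrintMetricsCallback(Callback):
        def on_train_epoch_end(self, trainer, pl_module):
            # trainer.callback_metrics holds everything passed to self.log
            metrics = {k: float(v) for k, v in trainer.callback_metrics.items()}
            print(f"epoch {trainer.current_epoch}: {metrics}")

    # usage: Trainer(callbacks=[PrintMetricsCallback()])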

The return dict of the training_step [21:00]: unfortunately, the docs don't provide a lot of info about this point.

osamansr
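
For anyone cross-referencing the docs, the dict return shown around 21:00 is the pre-1.0 API; from memory it looked roughly like this before self.log replaced it:

    import torch.nn.functional as F

    # inside the LightningModule; pre-1.0 Lightning routed the "log" and
    # "progress_bar" keys to the logger and the progress bar, respectively
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.cross_entropy(logits, y)
        acc = (logits.argmax(dim=1) == y).float().mean()
        return {"loss": loss,
                "log": {"train_loss": loss},
                "progress_bar": {"train_acc": acc}}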

Great job, you both!

Your 'setup' method has a typo; it should be:
"train_data = datasets.MNIST('da ..."
instead of:
"datasets = datasets.MNIST('da ..."

But it gives me an error:
'TypeError: setup() takes 1 positional argument but 2 were given'

ramisketcher
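
That TypeError comes from Lightning calling the hook as setup(stage); a sketch of a signature that accepts it, with "data" as a placeholder root directory since the original path is truncated above:

    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    # inside the LightningModule
    def setup(self, stage=None):
        # Lightning passes a stage argument ("fit", "test", ...), hence
        # the "takes 1 positional argument but 2 were given" error
        train_data = datasets.MNIST("data", train=True, download=True,
                                    transform=transforms.ToTensor())
        self.train_set, self.val_set = random_split(train_data, [55000, 5000])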

Wonderful video, and I will start using it. Will the next episode do a VAE, CycleGAN, and hooks at least? 😃

Feel free to ignore this part if you think it is too much.
I hope we will do world models, pixel-level classifiers, CycleGAN, Transformers, LSTMs, heatmaps, hooks, upsampling, GPT, BERT, music generation, and more, because these are the basics today.
Colab doesn't run the world model (the truck-backing one) in anime. I am not sure we have something that makes Colab run it.
We should do more on self-supervised learning and energy-based models.

jonathansum

Hey guys, this is a great video, and I am really looking forward to simplifying my PyTorch pipeline with some of this code. There are just two issues I am running into:
1. When using acc = accuracy(logits, y), Lightning complains about non-normalized predictions. What would you propose for this specific task? A lot of people just use a softmax layer at the end and add a log-likelihood loss.
2. When I define my train and val dataset split in my train_dataloader function by assigning self.train and self.val, and then just use a DataLoader on self.val in my val_dataloader, I receive an error saying that my object has no attribute val, so I assume the call order is different?

Great introduction apart from these minor things though, keep up the good work.
Cheers, Nico

nicolasmandel
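
On both points, a hedged sketch: softmax the logits before the metric, and move the split into setup(), since the validation sanity check calls val_dataloader() before train_dataloader() ever runs, which would explain the missing attribute. Here accuracy is the functional metric used in the video:

    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader
    from pytorch_lightning.metrics.functional import accuracy  # torchmetrics.functional in newer versions

    # inside the LightningModule
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.cross_entropy(logits, y)
        preds = torch.softmax(logits, dim=1)  # normalized, so accuracy() accepts them
        self.log("acc", accuracy(preds, y), prog_bar=True)
        return loss

    def val_dataloader(self):
        # self.val_set should be created in setup(), which runs before
        # any of the dataloader hooks are called
        return DataLoader(self.val_set, batch_size=32)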

Accuracy as demonstrated in the video is deprecated as of now. I think you now have to use `torchmetrics` and `self.log(prog_bar=True)` to obtain the effect demonstrated in the vid. Correct me if I'm wrong?

MateuszModrzejewski
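
That matches my reading of the changelog; a compact sketch with torchmetrics (10 classes as a placeholder for MNIST; recent torchmetrics versions also require the task argument):

    import pytorch_lightning as pl
    import torch.nn.functional as F
    import torchmetrics

    class LitClassifier(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.train_acc = torchmetrics.Accuracy(task="multiclass", num_classes=10)

        def training_step(self, batch, batch_idx):
            x, y = batch
            logits = self(x)          # forward() omitted for brevity
            loss = F.cross_entropy(logits, y)
            self.train_acc(logits, y)  # update the running metric
            self.log("train_acc", self.train_acc, prog_bar=True)
            return loss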

I struggled a bit to get this working on my current setup. Looking through the API, I figured out that setup() also requires you to pass in a stage argument. Might be good to add an overlay or something to the video pointing that out? Really looking forward to trying this out on a multi-GPU setup once I get my cooling situation under control.

xOoOverflw

I got this error:

MisconfigurationException: No `train_dataloader()` method defined. Lightning `Trainer` expects as minimum a `training_step()`, `train_dataloader()` and `configure_optimizers()` to be defined.

Any idea why?

rameshprakash
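
That exception means the Trainer could not find the three required hooks on the model it was given; often the cause is a typo in a hook name or the method living outside the LightningModule. A minimal sketch that satisfies it ("data" and the hyperparameters are placeholders):

    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    import pytorch_lightning as pl

    class MinimalModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(28 * 28, 10)

        def training_step(self, batch, batch_idx):
            x, y = batch
            logits = self.layer(x.view(x.size(0), -1))
            return nn.functional.cross_entropy(logits, y)

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=1e-2)

        def train_dataloader(self):
            ds = datasets.MNIST("data", train=True, download=True,
                                transform=transforms.ToTensor())
            return DataLoader(ds, batch_size=32)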

Hi William, Alfredo, thank you for this introductory tutorial! I just wanted to point out something. I followed along with my own version of the code and noticed that calling the training portion of the data "train" may cause some issues (you instantiate self.train and self.val in the setup hook): the LightningModule invokes self.train() at a certain point, but in your example that had become a Subset :)

ga
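
A tiny illustration of that clash, outside Lightning entirely: assigning any attribute named train shadows nn.Module.train():

    import torch.nn as nn

    module = nn.Linear(4, 2)
    print(module.train)    # <bound method Module.train of Linear(...)>
    module.train = "oops"  # instance attribute now shadows the method
    # module.train()       # would raise TypeError: 'str' object is not callable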

I would like to see more explanations of why certain functions inside the model are chosen and the implications of the numbers chosen for them, i.e. why use Linear vs Conv2d. Also, I don't quite understand the second linear transformation, which goes from 64 to 64; in most tutorials the output is usually greater than the input? Thanks for making these videos. I'm new to machine learning and trying to apply these concepts to unstructured binary data using PyTorch.

xOoOverflw

I wish you guys had finished that "train_loss/val_loss" array setup for plotting later. Love the videos!

ulugbekdjuraev
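
A minimal sketch of one way to finish that idea: keep plain Python lists on the module, append in each step, and plot them after trainer.fit() returns (forward and the rest of the module are omitted):

    import pytorch_lightning as pl
    import torch.nn.functional as F

    class LitClassifier(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.train_losses = []   # one entry per training batch
            self.val_losses = []

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = F.cross_entropy(self(x), y)
            self.train_losses.append(loss.item())
            return loss

        def validation_step(self, batch, batch_idx):
            x, y = batch
            self.val_losses.append(F.cross_entropy(self(x), y).item())

    # after trainer.fit(model), plot model.train_losses / model.val_losses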

Nice video! Just in case I misunderstood: when using multi-GPU, do I still need to specify the number of GPUs and nodes in the code after specifying them in the SLURM script? Which specification will PL choose when the two are different?

anniezhi
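
My understanding, hedged: the two should match, since Lightning reads the SLURM environment to wire up DDP and the Trainer arguments tell it what to expect. A sketch for a 1.x-era setup (resource counts are placeholders):

    # In the SLURM script:
    #   #SBATCH --nodes=2
    #   #SBATCH --ntasks-per-node=4   # one task per GPU
    #   #SBATCH --gres=gpu:4
    #
    # The Trainer should request the same resources:
    from pytorch_lightning import Trainer

    trainer = Trainer(gpus=4, num_nodes=2, accelerator="ddp")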

You should do more advanced tutorials to really show off the features

rahuldeora