Tips Tricks 16 - How much memory to train a DL model on large images

A rough calculation to estimate the memory (especially GPU memory) required to train a deep learning model.

Code generated in the video can be downloaded from here:
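As a rough reconstruction of the idea (the video's own get_model_memory_usage may differ in detail), here is a minimal sketch for a built tf.keras model, assuming TF 2.x and float32 activations and weights; the function name and the VGG16 example are illustrative, not taken from the video:

```python
import numpy as np
import tensorflow as tf

def estimate_model_memory_gb(model, batch_size, bytes_per_value=4):
    """Rough training-memory estimate (GB) for a Keras model, float32 by default."""
    features_mem = 0  # bytes used by one sample's feature maps (layer outputs)
    for layer in model.layers:
        out_shape = layer.output_shape
        if isinstance(out_shape, list):        # some layers report a list of shapes
            out_shape = out_shape[0]
        dims = [d for d in out_shape if d is not None]   # drop the batch dimension
        features_mem += bytes_per_value * np.prod(dims, dtype=np.int64)

    parameter_mem = bytes_per_value * model.count_params()  # all weights

    total_bytes = batch_size * features_mem + parameter_mem
    return total_bytes / (1024 ** 3)

# Example: a VGG16 backbone on large 1024x1024 RGB tiles
model = tf.keras.applications.VGG16(include_top=False, weights=None,
                                    input_shape=(1024, 1024, 3))
print(f"~{estimate_model_memory_gb(model, batch_size=8):.2f} GB "
      "for activations + weights (forward pass only)")
```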
Comments

Great explanation. I used to manage image and batch size on an ad-hoc basis, which resulted in OOM errors. This tool now gives me a better way to anticipate memory requirements.

KarthikArumugham

Great explanation! Incredible to see how large a single image's memory footprint can get once it has gone through the whole network.

jacobusstrydom

This is correct if you want to keep all output data in memory. In many cases, the output data is temporary (used once) and there is no need to keep it. In such cases, you can create a buffer to swap data in and out, which saves a significant amount of memory...
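A standard technique in the same spirit is activation recomputation (gradient checkpointing), which recomputes a block's outputs during the backward pass instead of keeping them; a minimal sketch using tf.recompute_grad, assuming TF 2.x eager execution (behaviour with Keras layers has varied across versions, so treat this as illustrative):

```python
import tensorflow as tf

# A memory-hungry block whose intermediate activations we don't want to keep around
block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
])
block.build((None, 512, 512, 3))          # create the variables up front

# Recompute the block's forward pass during backprop instead of storing activations
checkpointed_block = tf.recompute_grad(block)

x = tf.random.normal((2, 512, 512, 3))
with tf.GradientTape() as tape:
    y = checkpointed_block(x)             # activations inside the block are not kept
    loss = tf.reduce_mean(tf.square(y))
grads = tape.gradient(loss, block.trainable_variables)
```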

AsaadFSaid

This is an important video: when you decide to buy a GPU, you usually face the challenge of selecting which one.
Thanks, dear Sreeni

behnoudshafizadeh

Great video as always! I learn more from you than from most of my teachers in graduate school 😅

laohu

Thanks, Mr. Sreeni, for this video. Very useful.

ganapathyshankar

Such useful information in a relatively short video! Amazing!

diogosilva

Best channel for information on DL and AI.
Thank you

abubakrshafique

Thank you very much for this video. I have a question, and maybe you have time to answer it:
On the slide at 8:38, you say that the backward pass consumes a similar amount of memory to the feature maps of the layers. Consequently, shouldn't a factor of two be applied to "features_mem" in your function that approximates the total memory needed for the neural net?
The memory required for the parameter gradients is also neglected, isn't it? Independent of the optimizer, a factor of two applied to "parameter_mem_MB" seems reasonable, since the gradients must always be calculated. Please correct me if I am wrong! In addition, the memory required for momentum, as used by some optimizers, would call for a factor of three applied to "parameter_mem_MB", since information about previous iterations is needed for the momentum calculation.
Most of this is mentioned on the slide at 8:38 but not accounted for in the "get_model_memory_usage" function. Maybe you can give some feedback on this. That would be really nice.
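To make the suggested correction concrete, here is a hedged sketch of how those factors could be layered on top of the video's numbers; the multipliers are rough rules of thumb and the optimizer-state counts are assumptions, not exact accounting:

```python
def adjusted_memory_estimate_mb(features_mem_mb, parameter_mem_mb,
                                batch_size, optimizer="adam"):
    """Apply the extra factors discussed in the comment above (rough rules of thumb)."""
    # forward activations, plus roughly the same again for the backward pass
    activation_mb = 2.0 * batch_size * features_mem_mb

    # weights + gradients always; momentum/Adam keep additional per-parameter state
    optimizer_state_copies = {"sgd": 0, "sgd_momentum": 1, "rmsprop": 1, "adam": 2}
    copies = 2 + optimizer_state_copies.get(optimizer.lower(), 2)
    parameter_total_mb = copies * parameter_mem_mb

    return activation_mb + parameter_total_mb
```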

schwatzgelber

Absolutely amazing video 👌🥰 Is it possible to do a similar example for a GAN model, for instance StackGAN?

aaronabelachelseafc

I have recently learned a lot from your YouTube tutorials, sir! Anyway, I'm working on some R&D on human segmentation (black and white) and am using your tutorials as my guide.

Just want to ask: what makes a good segmentation dataset, sir? Is it the image quality? The variety of human bodies?

haqkiemdaim

You can use mixed precision with float16; this will bring the size down by about half.
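For reference, in tf.keras mixed precision is a one-line global policy (sketch assuming TF 2.4+); variables stay in float32 while activations are float16, and the usual caveats are loss scaling (handled by compile() under this policy) and keeping the final layer in float32:

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Compute in float16 while keeping variables in float32: activations take half the space
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    layers.Conv2D(32, 3, padding="same", activation="relu",
                  input_shape=(1024, 1024, 3)),
    layers.GlobalAveragePooling2D(),
    # keep the final softmax in float32 for numerical stability
    layers.Dense(10, activation="softmax", dtype="float32"),
])

# Under the mixed_float16 policy, compile() wraps the optimizer with loss scaling
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```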

dimitrisspiridonidis

Thanks for this video. Super helpful. Can you please benchmark the 3080 Ti vs the 3090 for deep learning?

sumitbali

Thank you very much for sharing your knowledge, sir...
Sir, can I use the NVIDIA Jetson developer kit for training deep learning Python code?

ashwinig

Hi, nice video. However, have you ever considered or tried gradient accumulation? Even if your batch does not fit into your GPU's memory, in TensorFlow there are techniques to cut a big batch into mini-batches. Maybe it would be worth trying it and making a video about it?
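A minimal sketch of gradient accumulation with a custom tf.GradientTape loop (model, optimizer, loss_fn, and accumulate_steps are placeholders; no built-in Keras option is assumed):

```python
import tensorflow as tf

def train_with_accumulation(model, optimizer, loss_fn, dataset, accumulate_steps=4):
    """Accumulate gradients over several small batches before applying them,
    mimicking a larger effective batch size without the memory cost."""
    accum = [tf.zeros_like(v) for v in model.trainable_variables]

    for step, (x, y) in enumerate(dataset):
        with tf.GradientTape() as tape:
            # scale the loss so the summed gradients match one big-batch step
            loss = loss_fn(y, model(x, training=True)) / accumulate_steps
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + g for a, g in zip(accum, grads)]

        if (step + 1) % accumulate_steps == 0:
            optimizer.apply_gradients(zip(accum, model.trainable_variables))
            accum = [tf.zeros_like(v) for v in model.trainable_variables]
```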

kamilchodzynski

Thanks, Sreeni. I see a lot of machine learning learners recommending Google Colab Pro; what do you think about it?

abdulla

Very informative video. I have a query: I am trying to train a standard U-Net model on a 512x512x3 dataset with batch_size=4 on an RTX 3090 with 24 GB of GPU memory, and I am getting an out-of-memory error. Please help me resolve this issue.

vimalshrivastava

Thank you! But I feel the real situation is more complex than this. I do GAN research in my PhD program. I use the same model, same code, same image dataset, and same batch size on a Colab Tesla V100 (16 GB VRAM) and an RTX A6000 (48 GB). On the Colab Tesla V100 it takes about 16 GB of VRAM, but on the RTX A6000 it takes more than 30 GB. So I think the number of CUDA cores also affects memory usage, since more CUDA cores do more parallel computing. The RTX A6000 has about 10,000 CUDA cores and the Tesla V100 has about 5,000.

yifeipei

Respected sir, for my cardiac MRI segmentation work I used the U-Net architecture. For training I took a split in the ratio (80, 100). The total number of images is nearly 900, but in the output only 224 images are displayed. Can you please explain this to me?

gomathig

Very interesting stuff, thanks a lot, but as usual the code isn't available in your amazing GitHub repo.

falahfakhri