The Wrong Batch Size Will Ruin Your Model

How do different batch sizes influence the training process of neural networks using gradient descent?

📚 My 3 favorite Machine Learning books:

Disclaimer: Some of the links included in this description are affiliate links where I'll earn a small commission if you purchase something. There's no cost to you.
Comments

If you scale the learning rate with the batch size (i.e. lr = (batch_size / 32.) * 0.01), then stochastic gradient descent looks sort of okay here.

ErlendDavidson
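
A minimal sketch of that linear-scaling idea, assuming a PyTorch-style setup; the base batch size of 32 and base learning rate of 0.01 come from the comment above, while the model and the batch size of 128 are placeholders:

```python
import torch
from torch import nn

# Linear scaling rule: keep lr / batch_size roughly constant.
BASE_BATCH_SIZE = 32
BASE_LR = 0.01

def scaled_lr(batch_size: int) -> float:
    """Scale the learning rate in proportion to the batch size."""
    return (batch_size / BASE_BATCH_SIZE) * BASE_LR

batch_size = 128
model = nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr(batch_size))
```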

Hi! Maybe you can help me with this one: if I want to test an already pre-trained image classifier, how do I choose the number of images to feed it at once? The test set has 100k images, and I guess it wouldn't make sense to load them all at once, so how should I proceed? Thanks!

Metryk
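
For the question above, a common pattern is to stream the test set in batches rather than loading all 100k images at once; the evaluation batch size only affects speed and memory, not the accuracy you measure. A minimal sketch, assuming PyTorch, with a dummy dataset and a linear model standing in for the real data and the pre-trained classifier:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the real 100k-image test set and the pre-trained classifier.
test_dataset = TensorDataset(torch.randn(1000, 3 * 32 * 32),
                             torch.randint(0, 10, (1000,)))
model = nn.Linear(3 * 32 * 32, 10)

# Stream the test set in batches so it never has to fit in memory at once.
test_loader = DataLoader(test_dataset, batch_size=256, shuffle=False)

model.eval()
correct = total = 0
with torch.no_grad():  # no gradients needed at test time
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

print(f"Test accuracy: {correct / total:.4f}")
```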

Good work, hope your channel gets more attention

johnmoustakas

What do you think of (artificially) adding noise to the learning rate? I feel like it used to be more popular to do that, but I almost never see it these days.

ErlendDavidson
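
For reference, one simple way to do that (a sketch, assuming plain SGD in PyTorch; the Gaussian noise scale and the dummy data are arbitrary choices, not something from the video) is to perturb the optimizer's learning rate at every step:

```python
import random
import torch
from torch import nn

model = nn.Linear(10, 1)  # placeholder model
base_lr = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)

for step in range(100):
    # Perturb the learning rate with small Gaussian noise each step.
    noisy_lr = max(base_lr + random.gauss(0.0, 0.1 * base_lr), 1e-6)
    for group in optimizer.param_groups:
        group["lr"] = noisy_lr

    x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```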

Hey, love this video! I was losing touch with the basics!

Agrover

Your videos are amazing. Thank you so much for this great knowledge and beautiful videos.

muhammadtalmeez

I haven't seen a video as helpful as this one anywhere on the internet, thank you ♥

zrmgwci

The question is whether, if you train with batches and reach the global minimum, your model is functionally equivalent to one trained without batching. Are the weights identical? No, they aren't. If your model is generative, you don't get equivalence between the batched and non-batched versions.

sarahpeterson

If you generate a dummy dataset and set a static learning rate, then smaller batch sizes work better? wtf?

axelanderson
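
A small self-contained experiment along those lines (a sketch with an arbitrary dummy regression task, not the exact setup from the video): with a fixed learning rate, smaller batches take many more update steps per epoch, which is one reason they can come out ahead in this kind of comparison.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy linear-regression data and a static learning rate shared by every run.
torch.manual_seed(0)
true_w = torch.randn(10, 1)
X = torch.randn(1024, 10)
y = X @ true_w + 0.1 * torch.randn(1024, 1)
dataset = TensorDataset(X, y)
LR = 0.01

for batch_size in (8, 32, 128, 1024):
    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=LR)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for epoch in range(20):
        for xb, yb in loader:
            loss = nn.functional.mse_loss(model(xb), yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    # Smaller batches mean more (noisier) updates per epoch, so with the same
    # learning rate and epoch budget they often end up at a lower loss here.
    final_loss = nn.functional.mse_loss(model(X), y).item()
    print(f"batch_size={batch_size:5d}  final loss={final_loss:.4f}")
```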

Really like the videos. However, I want to warn against the general statement that a batch size of one is not recommended. It really depends on the problem/data. So don't simply dismiss stochastic gradient descent, try it!

OliverHennhoefer

I love your presentation style! Very energetic :)

ziquaftynny

This is the kind of thing that I hate about deep learning. A single parameter in the optimization method can completely change the results. Batches should be small but not too small. How small? That's down to heuristics, and it will change from data set to data set.

edmundfreeman

Amazing!!

Do you know why the batch size is always 32, 64, or 128?

Levy

Good content. Try improving your way of teaching. Learning should be in a relaxed tone.

akshay