CS231n Winter 2016: Lecture 6: Neural Networks Part 3 / Intro to ConvNets

Stanford Winter Quarter 2016 class: CS231n: Convolutional Neural Networks for Visual Recognition. Lecture 6.

Get in touch on Twitter @cs231n, or on Reddit /r/cs231n.
Comments

1:04:50 "This might be useful in self-driving cars." One year later: head of AI at Tesla.

citiblocsMaster

Did cs231n in 2015. Great to see the videos released to the public now.
Good job, Stanford!

irtazaa

The project report mentioned at 27:13 was accepted to one of the ICLR workshops the following year and has over 500 citations so far. Impressive stuff.

caner

For anyone curious like me @ 43:34:
(someone's Siri was triggered by mistake and said this...)
Siri: "I am not sure what you said"

vivekloganathan

*My takeaways:*
1. Parameter updates: optimizers such as momentum, Nesterov momentum, AdaGrad, RMSProp, Adam 3:53
2. Learning rate 28:20
3. Second-order optimizers 30:53
4. Evaluation: model ensembles 36:19
5. Regularization: dropout 38:25
6. Gradient checking 56:55 (see the sketch after this list)
7. Convolutional Neural Networks 57:35
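
For item 6, gradient checking amounts to comparing the analytic gradient against a centered-difference numerical estimate and looking at the relative error. Here is a minimal NumPy sketch of that idea (the toy loss and the error threshold are assumptions for illustration, not the lecture's code):

```python
import numpy as np

def eval_numerical_gradient(f, x, h=1e-5):
    # centered difference (f(x+h) - f(x-h)) / (2h), one coordinate at a time
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)
        x[ix] = old - h
        fxmh = f(x)
        x[ix] = old                           # restore the original value
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

# toy check: the loss sum(W**2) has the known analytic gradient 2*W
W = np.random.randn(3, 4)
num = eval_numerical_gradient(lambda W_: np.sum(W_ ** 2), W)
ana = 2 * W
rel_err = np.abs(num - ana) / np.maximum(1e-8, np.abs(num) + np.abs(ana))
print(rel_err.max())                          # should be tiny, e.g. below 1e-6
```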

leixun

@15:10, Andrej says that according to recent work, local minima are not a problem for large networks. Could anyone point me to those papers? I'm interested in reading these results.

champnaman

Some guy playing Dota vs. a neural network: over a million views.
A genius explaining how to build a neural network: 40k views.

boringmanager

Just found that the slides on the Stanford website have been updated with the 2017 slides + videos. Is there any way to get the original 2016 slides? These lectures are as classic as Andrew Ng's.

tthtlc

Summary: start from simple gradient descent: x += - learning_rate * dx. On a big dataset this is slow, because every single update requires computing the gradient over all of the training data. So we use Stochastic Gradient Descent (SGD), which estimates the gradient from a randomly sampled mini-batch of examples instead of the whole dataset, making each update much cheaper.
SGD is still slow to converge because the random sampling makes it jitter on its way to the minimum. Other methods help it converge faster: SGD + momentum, AdaGrad, RMSProp, Adam. Each of them has a learning rate, and we should find a good learning rate for each dataset (for example, the default learning rate of Adam in Keras is 0.001).
We can use dropout to prevent overfitting.
More generally, to prevent overfitting we have three options: increase the dataset, simplify the network (dropout, fewer layers), and preprocess the data (data augmentation).
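
To make those update rules concrete, here is a minimal NumPy sketch of the vanilla, RMSProp, and Adam updates as written on the slides, run on a made-up ill-conditioned 2-D quadratic loss (the loss, learning rates, and iteration count are assumptions for illustration, not code from the lecture or from Keras):

```python
import numpy as np

# Hypothetical quadratic loss L(x) = 0.5 * x^T A x, so the gradient is dx = A x.
A = np.diag([1.0, 10.0])
def grad(x):
    return A @ x

lr, eps = 0.01, 1e-8
x_sgd = np.array([1.0, 1.0])
x_rms = np.array([1.0, 1.0]); cache = np.zeros(2)               # RMSProp running average of dx**2
x_adam = np.array([1.0, 1.0]); m, v = np.zeros(2), np.zeros(2)  # Adam moment estimates
decay, beta1, beta2 = 0.99, 0.9, 0.999

for t in range(1, 1001):
    x_sgd += -lr * grad(x_sgd)                        # vanilla: x += -learning_rate * dx

    dx = grad(x_rms)
    cache = decay * cache + (1 - decay) * dx ** 2
    x_rms += -lr * dx / (np.sqrt(cache) + eps)        # RMSProp: per-parameter step sizes

    dx = grad(x_adam)
    m = beta1 * m + (1 - beta1) * dx
    v = beta2 * v + (1 - beta2) * dx ** 2
    mhat, vhat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)   # bias correction
    x_adam += -0.001 * mhat / (np.sqrt(vhat) + eps)   # Adam with the common default lr = 1e-3

print(x_sgd, x_rms, x_adam)   # all three should end up close to the minimum at (0, 0)
```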

ThienPham-hvkx

This is helping me quite a lot, thanks!!!

twentyeightO

The intro to ConvNets is all history; skip to the next lecture for the actual ConvNets.

mostinho

At 9:12, does SGD refer to literal SGD (training with only 1 randomly chosen example per update) or to mini-batch gradient descent? Because the web lecture notes state that the term SGD is often used to mean "mini-batch" gradient descent, not literally SGD with only 1 example.
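
For reference, here is how the two readings differ in code: literal SGD uses one randomly chosen example per update, while what most people (and the notes) call SGD samples a random mini-batch. A small sketch on a made-up linear-regression problem (all names, sizes, and constants are hypothetical):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(1000, 5)                 # fake dataset
true_w = np.random.randn(5)
y = X @ true_w + 0.1 * np.random.randn(1000)
w = np.zeros(5)
lr = 0.05

def grad(w, Xb, yb):
    # gradient of the mean squared error over the batch (Xb, yb)
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

for step in range(500):
    # "literal" SGD would use a single random example:
    #   i = np.random.randint(len(X)); g = grad(w, X[i:i+1], y[i:i+1])
    # what is usually meant by SGD: a random mini-batch (here 32 examples)
    idx = np.random.choice(len(X), 32, replace=False)
    w += -lr * grad(w, X[idx], y[idx])

print(np.abs(w - true_w).max())              # should be small; the mini-batch just reduces gradient noise
```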

ArdianUmam

At 44:06, is he saying that dropout is applied to different neurons in each epoch? Say I have an x-10-10-y network (x: input, y: output, 10s: hidden layers). In one epoch (forward prop + backprop) dropout is applied to, say, the 3rd, 5th, and 9th neurons of the first hidden layer and the 2nd, 5th, and 8th neurons of the second hidden layer; in the second epoch it is applied to the 5th, 6th, and 10th neurons of the first layer and the 1st, 7th, and 10th neurons of the second hidden layer. Does that mean we effectively have as many models as epochs? Can someone clear this up for me?
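
For what it's worth, in the lecture notes the dropout mask is resampled on every forward pass (i.e. every mini-batch), not once per epoch, so every pass trains a different randomly thinned sub-network and test time approximately averages over all of them. A minimal sketch of inverted dropout along those lines (the x-10-10-y sizes are borrowed from the comment; weights and data are made up):

```python
import numpy as np

p = 0.5                                    # probability of keeping a unit
W1 = 0.01 * np.random.randn(10, 4)         # hypothetical x(4) -> 10 -> 10 -> y(1) network
W2 = 0.01 * np.random.randn(10, 10)
W3 = 0.01 * np.random.randn(1, 10)

def train_forward(x):
    # a fresh random mask is drawn on every forward pass
    h1 = np.maximum(0, W1 @ x)
    u1 = (np.random.rand(*h1.shape) < p) / p   # inverted dropout: drop AND rescale at train time
    h1 *= u1
    h2 = np.maximum(0, W2 @ h1)
    u2 = (np.random.rand(*h2.shape) < p) / p
    h2 *= u2
    return W3 @ h2

def predict(x):
    # at test time nothing is dropped and no extra scaling is needed
    h1 = np.maximum(0, W1 @ x)
    h2 = np.maximum(0, W2 @ h1)
    return W3 @ h2

print(train_forward(np.random.randn(4)), predict(np.random.randn(4)))
```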

sokhibtukhtaev

There is one issue: we don't usually normalize the data when it is an image, but when we use batch normalization, we do normalize. Is that a problem?
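
As a sketch of what batch normalization actually does: it normalizes each feature of the intermediate activations over the current mini-batch and then applies a learned scale and shift (gamma, beta), so the network can undo the normalization if that turns out to be better. A minimal NumPy version of the forward pass (shapes and values are made up):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) mini-batch of activations; normalize each feature over the batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta               # learned scale and shift per feature

x = 50 + 10 * np.random.randn(32, 8)          # activations with an arbitrary scale and offset
out = batchnorm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # roughly 0 and 1 per feature
```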

omeryalcn

It's getting exponentially difficult

qwerty-tfzn

8:50 noisy signal: how about using a Kalman filter?

randywelt

The best AI course I have ever taken. Thank you, Andrej!!!

nguyenthanhdat

What happened at 43:38? What's so funny?

jiexc

33:12 L-BFGS, not to be confused with LGBT

WahranRai

The big idea behind the momentum update is smart! But it seems obvious that this update rule is just an interim method.
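
For anyone who wants to see the idea in isolation, here is a small sketch of the classic momentum and Nesterov momentum updates from the slides, run on a made-up 2-D quadratic with one steep and one shallow direction (the loss, learning rate, and mu are assumptions for illustration):

```python
import numpy as np

# Hypothetical quadratic loss with gradient dx = A x
A = np.diag([1.0, 50.0])
def grad(x):
    return A @ x

lr, mu = 0.01, 0.9
x_mom, v_mom = np.array([1.0, 1.0]), np.zeros(2)
x_nag, v_nag = np.array([1.0, 1.0]), np.zeros(2)

for _ in range(200):
    # classic momentum: the velocity integrates the gradient, the position follows the velocity
    v_mom = mu * v_mom - lr * grad(x_mom)
    x_mom += v_mom

    # Nesterov momentum: evaluate the gradient at the looked-ahead position x + mu*v
    v_nag = mu * v_nag - lr * grad(x_nag + mu * v_nag)
    x_nag += v_nag

print(x_mom, x_nag)   # both should end up near the minimum at (0, 0)
```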

nikolahuang