CS231n Winter 2016: Lecture 8: Localization and Detection

Stanford Winter Quarter 2016 class: CS231n: Convolutional Neural Networks for Visual Recognition. Lecture 8.

Get in touch on Twitter @cs231n, or on Reddit /r/cs231n.
Comments

Thank you so much for creating this class and posting these videos, Andrej.
Your work has been very inspiring to me, and has helped me tremendously in shifting my own career.
Keep up the good work.

AlanMelling

*My takeaways:*
1. Classification and localization (3:19): OverFeat
2. Object detection (24:10): R-CNN, Fast R-CNN, Faster R-CNN, and YOLO
- Mean average precision (mAP) (38:45)

leixun

This was more of a paper presentation than a lecture, which is also evident from how few questions the students asked during it. Details like filter sizes, depth per block in the pipeline, and RoI pooling for Fast R-CNN and Faster R-CNN were not clear to me. I hope a better version from the 2017 class will be uploaded.

rohitsaxena

Thanks for posting this, Andrej! Really helpful for learning about or reviewing these topics.
One small tip for Justin Johnson: it would be nice to repeat the audience questions (they are hard to understand in the recording otherwise).

AhmedKachkach

The explanation of the OverFeat sliding-window efficiency at 16:10 is pretty poor; the paper is much clearer. The point isn't really "reimagining" the FC layer as a convolution step. Instead, it lets you take advantage of efficiencies built into convolution implementations that aren't present in FC implementations.

Imagine a convolution operation in one dimension, and say your kernel is 5 numbers. In step 0, I add A+B+C+D+E = A + (B + C + D + E), which costs me 4 add ops. In step 1, I want B+C+D+E+F. I can reuse the cached value and compute (cached_value) + F, which costs only 1 add op. Efficiencies like this can be built into implementations of the convolution operator. FC layers, however, operate over the whole input and have no logical place for such caching.

In Overfeat, we're running these operations on "windows" of the input image. Each window is a lot like a patch of input to a convolutional layer. By transforming the last FC layers into convolution operations, we can treat the whole network as a series of convolution operations and then take advantage of the inherent efficiencies (described above) of convolution operations.
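To make this concrete, here is a minimal PyTorch sketch (the sizes and names are made up for illustration, not taken from the lecture): an FC layer trained on fixed-size feature patches is reshaped into an equivalent convolution, and a single pass over a larger feature map then scores every sliding window at once, sharing the overlapping computation.

```python
import torch
import torch.nn as nn

# Toy sizes (hypothetical): C channels, KxK patches, NCLS classes.
C, K, NCLS = 8, 5, 3
fc = nn.Linear(C * K * K, NCLS)           # "classifier" FC layer for one KxK patch

# Reinterpret the same weights as a KxK convolution.
conv = nn.Conv2d(C, NCLS, kernel_size=K)
conv.weight.data = fc.weight.data.view(NCLS, C, K, K)
conv.bias.data = fc.bias.data

fmap = torch.randn(1, C, 12, 12)          # a larger feature map at test time

# One convolution pass scores all 8x8 sliding windows in one shot.
all_scores = conv(fmap)                   # shape (1, NCLS, 8, 8)

# Sanity check: one window matches the explicit per-patch FC computation.
patch = fmap[:, :, 2:2 + K, 3:3 + K].reshape(1, -1)
assert torch.allclose(fc(patch), all_scores[:, :, 2, 3], atol=1e-5)
```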

robcrane

51:26
The answer I have been looking for. There are also some other questions I'd like to ask (a rough sketch of the RPN head follows below):
1. I still can't really imagine what the 3x3 kernel in the RPN is trying to represent. In a vanilla CNN I can say a filter is responsible for detecting a particular feature in the image (color, pattern, line, edge, and so on). But for the 3x3 sliding window/kernel in the RPN, I can't see what it's trying to capture.
2. Why does the depth of the conv layer in the RPN have(?) to be the same as the depth of the feature map? In the paper they use 256-d, which means 256 channels produced by 256 different 3x3 sliding windows/kernels. Is it because the feature map itself has 256 channels (depth), assuming the base CNN is ZFNet, and they're just trying to preserve the w, h, d of the feature map after the convolution (as in question 1)?
3. Following on from question 1, what exactly do the 1x1 kernels for the cls layer and the reg layer do?
4. How is each anchor box represented in the RPN? How is the 3x3 convolution related to "generating" anchor boxes?

*The questions above are copied from the same lecture video uploaded by another channel.
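For anyone stuck on the same questions, here is a minimal sketch of the RPN head (a rough illustration assuming a ZF-style 256-channel feature map and k = 9 anchors, not the paper's actual code). The 3x3 conv is an ordinary learned convolution producing a 256-d descriptor of each sliding-window position; the 1x1 cls/reg layers act as tiny per-location fully connected layers on that descriptor; and the anchors are a fixed geometric grid that the outputs are interpreted relative to, rather than something the convolution generates.

```python
import torch
import torch.nn as nn

k = 9                                          # anchors per location (3 scales x 3 ratios)
backbone_feat = torch.randn(1, 256, 40, 60)    # hypothetical ZF-style feature map

rpn_conv = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # the 3x3 "sliding window"
cls_head = nn.Conv2d(256, 2 * k, kernel_size=1)  # object/not-object score per anchor
reg_head = nn.Conv2d(256, 4 * k, kernel_size=1)  # 4 box deltas per anchor

h = torch.relu(rpn_conv(backbone_feat))
scores = cls_head(h)   # (1, 18, 40, 60): 2 scores for each of the 9 anchors per location
deltas = reg_head(h)   # (1, 36, 40, 60): offsets relative to the fixed anchor grid
```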

tiasm

Thank you Andrej, this is helping me so much, with my final-year project too!!!

irtazaa

At 17:28, I'm still not quite seeing why the step from the feature map to the first FC layer becomes a 5x5 convolution (and why the following layers become 1x1 convolutions, for both the regression and classification heads). Does anyone have any pointers, or links to additional resources? Thanks!
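In case a shape walk-through helps, here is a small PyTorch sketch (the 5x5x1024 / 4096 / 1000 sizes follow the slide; the rest is illustrative). The first FC layer consumes the entire 5x5 spatial extent of the feature map, so expressed as a convolution its kernel must be 5x5; every later FC layer consumes a vector of spatial size 1x1, so expressed as a convolution it is 1x1.

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 1024, 5, 5)            # pool5-style feature map from the slide

fc6 = nn.Conv2d(1024, 4096, kernel_size=5)   # FC over the whole 5x5 patch -> 5x5 kernel
fc7 = nn.Conv2d(4096, 4096, kernel_size=1)   # FC on a 4096-vector -> 1x1 kernel
cls = nn.Conv2d(4096, 1000, kernel_size=1)   # classification head, also 1x1

out = cls(torch.relu(fc7(torch.relu(fc6(feat)))))
print(out.shape)  # torch.Size([1, 1000, 1, 1]); becomes a score grid on larger inputs
```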

havenwang

I don't understand how to train my model for localization (in what form do I send the images for training?) or how I actually get the bounding boxes. Can you explain these things clearly?

kamalisrinivasan

So basically we know the answers but not the questions the students asked.
Let's reconstruct the questions!!!

ShahidulIslam-xfoz

Very good lecture. Would it be possible to repeat the questions asked by students (or add subtitles for them)?

rajeev

When you regress box deltas and positions, do you do it as a direct numerical regression of real values, or do you regress a one-hot vector?
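If it helps, this is a sketch of the usual parameterization (following the R-CNN-family convention; the numbers below are made up): the targets are real-valued deltas relative to the proposal/anchor box, trained with an L2 or smooth-L1 regression loss, not a one-hot vector.

```python
import math

# R-CNN-style box regression targets: real-valued deltas relative to a
# proposal box, given as (center x, center y, width, height).
def box_deltas(proposal, gt):
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    tx = (gx - px) / pw      # horizontal shift, in units of proposal width
    ty = (gy - py) / ph      # vertical shift, in units of proposal height
    tw = math.log(gw / pw)   # width scaling, in log space
    th = math.log(gh / ph)   # height scaling, in log space
    return tx, ty, tw, th

# Made-up example: a proposal slightly off from the ground-truth box.
print(box_deltas((50, 50, 100, 100), (55, 48, 120, 90)))
# -> (0.05, -0.02, 0.1823..., -0.1053...)
```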

Chrnalis

This lecture got me confused a hell of a lot. Not clear at all. Nowhere close to Andrej's level of teaching.

rishabhrao

While converting a fully connected layer to a convolution layer, what we are doing (in this case, 17:16) is using 4096 filters of size 5*5*1024. But if you count the number of parameters in this layer, it is 5*5*1024*4096, which is much greater than the number of input features (5*5*1024). If you used one parameter per feature, it would only take 5*5*1024 parameters, so how is using 4096 filters of size 5*5*1024 justified?
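For concreteness, a quick count using the comment's own numbers: the conversion does not add parameters. A fully connected layer is not one-parameter-per-input-feature; it has one weight per (input, output) pair, so the FC layer mapping a flattened 5*5*1024 input to 4096 outputs already holds 5*5*1024*4096 weights, exactly matching 4096 conv filters of size 5x5x1024.

```python
# FC layer: every one of the 5*5*1024 inputs connects to each of 4096 outputs.
fc_weights = (5 * 5 * 1024) * 4096

# Equivalent conv layer: 4096 filters, each spanning the full 5x5x1024 volume.
conv_weights = 4096 * (5 * 5 * 1024)

assert fc_weights == conv_weights   # 104,857,600 weights either way (plus 4096 biases)
```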

arjunkrishna

This lecture feels like the lecturer is just reading the slides aloud. The explanations are unclear, and he hastily skipped through many parts without providing any clear explanation.

ashrafibrahim