All Convolution Animations Are Wrong (Neural Networks)


All the neural network 2D convolution animations you've seen are wrong.

Comments

Premise 1: All convolution animations are wrong
Premise 2: This is a convolution animation

Conclusion: this is wrong

rezhaadriantanuharja

You should've started with the typical 3-channel RGB input image and animated convolutions on that; that's where most people start to get lost as to how the weights match up with the inputs when translating from the 2D mental model to 3D.

randyekrer
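
For what it's worth, a minimal PyTorch sketch of the case this comment asks for (the layer sizes are my own illustrative choices, not from the video): each filter spans all three input channels, and each filter produces one output channel.

import torch
import torch.nn as nn

# Illustrative example: a 2D conv layer applied to a 3-channel RGB image.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

x = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
y = conv(x)

print(conv.weight.shape)  # torch.Size([8, 3, 3, 3]): each of the 8 filters spans all 3 input channels
print(y.shape)            # torch.Size([1, 8, 32, 32]): one output channel per filter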

The example is just a concept. I don't agree with this sensational title.

kuanarxiv

A major thing that feels missing to me in the animations is clear textual labeling. It's fine that you label things out loud, but on-screen labels would also make the video more accessible for folks with hearing or cognitive challenges. My crit aside, this animation is lovely, and I'm very impressed with what you've done. You've earned yourself a new subscriber :)

avidrucker

They are not wrong. They are just displaying a different case than what you are interested in. Maybe they are misplaced in the material you were looking at, but if they were animations for different things, like convolution filters in image processing, they wouldn't be wrong. Have some humility.

allNicksAlreadyTaken

All these wrong illustrations and animations have been among the many problems that make you think, "why the hell have we been doing this wrong, all the time, everywhere?" Finally, someone came along and did the obvious. Thank you!

thomasprimidis

The animation is just meant as an abstraction of the spatial convolution operation itself. A spatial CNN layer consists of spatial convolution operations across multiple input and output channels (which is what you are referring to).

peabrane
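
As a hedged sketch of that decomposition (shapes are illustrative, not from the video): one output channel of a multi-channel conv2d is just the sum of per-input-channel 2D convolutions, which can be checked numerically.

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)   # 3 input channels
w = torch.randn(4, 3, 3, 3)   # 4 filters, each spanning the 3 input channels

full = F.conv2d(x, w)         # shape (1, 4, 6, 6)

# Rebuild output channel 0 from three single-channel 2D convolutions.
rebuilt = sum(F.conv2d(x[:, c:c+1], w[0:1, c:c+1]) for c in range(3))

print(torch.allclose(full[:, 0:1], rebuilt, atol=1e-5))  # True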

Forget the animation itself (even though it's great). I just appreciate a non-moving camera. It bothers me so much when people spin the camera in a circle around a nice animation. Makes me feel like I'm on a carnival ride.

logon

Oh man, I'm so glad someone took a direct approach to this problem. When I was learning, I was so confused by all these 2D animations and explanations, and then seeing the resulting tensor shapes got me even more confused: where did the depth go, and where did it appear? Thanks for bringing this video to the world!

spider

The first animation you say is wrong shows the contribution of one filter operation, which is quite accurate. If you consider the number of input channels to be one and the number of output channels to be one, that is the right figure for the whole operation. The conv2d operation is just element-wise matrix multiplications with shifting windows. The 3D animation you made looks great but lacks that notion. That is my opinion; I'll stick with the 2D.

bediosoro
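
A minimal NumPy sketch of the single-input-channel, single-output-channel case described here, where the classic 2D picture is accurate: every output pixel is an element-wise multiply of the kernel with a shifting window, then a sum.

import numpy as np

def conv2d_single_channel(image, kernel):
    # Valid-mode 2D "convolution" (really cross-correlation): slide the
    # kernel over the image, multiply element-wise, and sum each window.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0                      # simple box blur
print(conv2d_single_channel(image, kernel).shape)   # (3, 3)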

Instead of spending 95% of the video ranting about how other animations are bad, I would have appreciated it more if you had spent that time explaining how this animation works. I don't think I learned anything from this video. How do you go from an input RGB image of size W * H * 3 to some cube of size 5 * 5 * 5 (plus padding)? You lost me at step 1.

tomo
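
On the shape question, a small hedged sketch (the 5 * 5 * 5 cube in the video is presumably just an illustration size; the numbers below are my own): the input depth of 3 is consumed by the filters, and the number of filters creates the output depth.

import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)   # a W x H x 3 RGB image in PyTorch's (batch, C, H, W) layout

conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3, padding=1)
y = conv(x)

print(y.shape)  # torch.Size([1, 5, 28, 28]): depth 3 is consumed, depth 5 (one per filter) is created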

Well, not to speak for the existing animations/figures: I won't say they are wrong. They have some issues, but essentially they are correct. When talking about 2D convolution, we should know that the input and output are 3D, as the input is a picture and the output is also a picture/feature map.

pew_pew_pew

So in the case of a feature-map input, does a 2D conv just replicate each 2D filter along the feature dimension and multiply element-wise? In the video, are the filters really 2D, just replicated to fill the number of features, or is each "2D" filter in reality a 3D tensor that matches the feature dimension?

grjesus
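
If it helps, the second reading is the standard one: nothing is replicated. Each "filter" is a full 3D tensor whose depth matches the number of input features, with independently learned weights in every channel slice. A quick PyTorch check (sizes are illustrative):

import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

print(conv.weight.shape)     # torch.Size([32, 16, 3, 3]): 32 filters, each 16 x 3 x 3
print(conv.weight[0].shape)  # torch.Size([16, 3, 3]): one filter is a genuine 3D tensor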

Unfortunately, this is only half right. What about when we need to understand a 4D or 5D convolution? Humans understand 2D most intuitively, and I think that is why those 2D-based animations were made. (And 2D convolution can be extended to larger dimensions.)

And deep learning convolution is unfortunately not mathematically tidy. It is derived from the "filter" of image processing, and "filter" in turn derives from "cross-correlation" from long before.

Your animation has multiple kernels; it just depicts an argument called "channels" that is only used by neural network frameworks.

devjeonghwan
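
The cross-correlation point can be checked directly: what deep learning frameworks call "convolution" skips the kernel flip of the mathematical definition. A hedged PyTorch sketch:

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5)   # (batch, channels, H, W)
w = torch.randn(1, 1, 3, 3)   # (out_channels, in_channels, kH, kW)

cross_corr = F.conv2d(x, w)                  # what frameworks call "convolution"
true_conv = F.conv2d(x, w.flip((-2, -1)))    # flipping the kernel gives true convolution

# The two differ unless the kernel happens to be symmetric.
print(torch.allclose(cross_corr, true_conv))  # almost surely False for a random kernel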

Thanks for that. It was really confusing before your animation came up!

felipelourenco

Not wrong bro. They are just incomplete.

shuninc

They are not wrong. They are a simplification that helps to understand the concept. Like any simplification, they are incomplete. But not wrong. It's sad that you use clickbait titles.

alexeychernyavskiy

"a 2D convolution actually takes in a 3D tensor as input and has a 3D convolution as output", well, it depends right? If you have a single channel/grayscale image then the input is in fact a 2D tensor, and each feature outputs a 2D tensor that is joined with all others in the feature map. So if you have a grayscale image with a single feature, the animations would in fact be correct.

I think the animations are perfectly fine, as they simplify a concept to it's most basic form for easy understanding. But it is true that after you understand the basic concept, a 3D - 3D representation is also nice to understand more common and complex examples.

Disclaimer that I could be wrong as I am by no means an expert, but this is my take from my current understanding of convolutions :)

pere_gin
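
A small sketch of the degenerate case described above, where the classic 2D animation is literally accurate (shapes are my own illustration):

import torch
import torch.nn as nn

# Grayscale image, one filter: in_channels = out_channels = 1, so the weight
# is effectively a single 2D kernel and the 2D-in / 2D-out picture matches.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

x = torch.randn(1, 1, 10, 10)   # one single-channel image
y = conv(x)

print(conv.weight.shape)  # torch.Size([1, 1, 3, 3]): really just one 3x3 kernel
print(y.shape)            # torch.Size([1, 1, 8, 8]): a single 2D feature map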

Amazing. You have cleared all my doubts in a single shot.

kartikpodugu

The use of all these misleading animations is the primary cause of misconceptions about convolutional neural networks; you have finally provided a good visualization. I am happy to share this content with my colleagues.

PeppeMarino