Do Neural Networks Need To Think Like Humans?


📝 The paper "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness" is available here:

Neural network visualization footage source:

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
313V, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Claudio Fernandes, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Lorin Atzberger, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Morten Punnerud Engelstad, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Richard Reis, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Thomas Krcmar, Torsten Reil, Zach Boldyga, Zach Doty.

Károly Zsolnai-Fehér's links:
Comments

"hold on to your papers" always makes me chuckle :) This is a very interesting report.

arodic

Neural networks thinking in terms of textures makes sense if you think about it.

Humans see in 3D, which makes it very easy to find the shape of an object, since depth gives you extra information to separate it from its surroundings.

On the other hand, NNs are almost always trained on 2D images, in which the object information is primarily contained in the "fill" pixels, which are far more abundant than the edge pixels.

I think it's really interesting how the network adapted when presented with the right incentives (more information stored in unique shapes than in unique textures), and I really wonder what the result would be if we trained an NN on stereoscopic image pairs instead of single 2D images.
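For what it's worth, here is a minimal sketch of what such a stereo-input classifier might look like in PyTorch; the architecture, layer sizes, and names are illustrative assumptions, not anything from the paper:

```python
import torch
import torch.nn as nn

class StereoClassifier(nn.Module):
    """Toy two-stream network: one shared encoder is applied to the left
    and right views, and the features are concatenated before the head."""
    def __init__(self, num_classes=10):
        super().__init__()
        # Shared convolutional encoder (same weights for both views).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # The head sees both views, so it can exploit disparity cues.
        self.head = nn.Linear(64 * 2, num_classes)

    def forward(self, left, right):
        f_left = self.encoder(left).flatten(1)    # (B, 64)
        f_right = self.encoder(right).flatten(1)  # (B, 64)
        return self.head(torch.cat([f_left, f_right], dim=1))

# Example: a batch of four 224x224 stereo pairs.
model = StereoClassifier()
left, right = torch.rand(4, 3, 224, 224), torch.rand(4, 3, 224, 224)
logits = model(left, right)  # shape (4, 10)
```

Sharing the encoder weights keeps the parameter count down; whether the head actually learns to use the disparity between the two feature vectors is exactly the open question the comment raises.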

Thinnestmeteor

Love all the interesting research you can do just by combining existing ideas in new ways. The fact that the stylised data improved overall accuracy is just mind-blowing to me; it really shows how far ahead the brain is and how much more inspiration we can still draw from it.

MobyMotion

Who is that one guy that failed to identify a cat? xD

Laezar

Very cool. They are basically forcing the neural network to learn to recognize shapes. Older networks focused on textures mainly because textures are much cheaper to learn, meaning that overfitting through texture recognition would normally set in before the network learned to distinguish shapes (the cue-conflict sketch below makes this bias measurable).

The next logical step in image and video classification will be learning to understand objects and shapes in 3D space. This will open a wealth of new opportunities and it will be the next big thing in AI.
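The texture-shortcut claim above is measurable with the paper's cue-conflict test: show the model images whose shape and texture come from different classes and count which cue wins the prediction. A rough sketch follows, where the directory, the filename convention, and the class indices are my own illustrative assumptions:

```python
import torch
from pathlib import Path
from PIL import Image
from torchvision import models, transforms

# Assumed layout: cue-conflict images named "<shape>__<texture>.png",
# e.g. "cat__elephant.png" for a cat shape with elephant skin texture.
class_to_idx = {"cat": 281, "elephant": 385}  # illustrative ImageNet indices

model = models.resnet50(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

shape_wins = texture_wins = 0
for path in Path("cue_conflict").glob("*.png"):
    shape_cls, texture_cls = path.stem.split("__")
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        pred = model(x).argmax(1).item()
    if pred == class_to_idx[shape_cls]:
        shape_wins += 1
    elif pred == class_to_idx[texture_cls]:
        texture_wins += 1

# Shape bias: fraction of shape decisions among shape-or-texture decisions.
print(shape_wins / max(shape_wins + texture_wins, 1))
```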

thomasjarvis

Whoever mistook that cat is a disgrace to mankind and should be immediately replaced with artificial intelligence.

digitalsoultech

Human children learn through experience that shapes are more significant than textures. If the neural networks were embedded in bodies that had to learn how to act in the world, they would also see shape as more important and wouldn't need to see a cat with the texture of other things in order to care more about the cat's shape. Shape is more important when there is poor lighting, or when you have to physically interact with the object. For an AI to really think like a human, you have to give it the same experiences as a human. Either that, or embed every abstraction into training data.

Asking what property of objects correctly identifies them is the same as asking which abstractions are useful, which is the same as asking how you want to interact with the world. Generating training data that already tells AI what abstractions matter is a way of getting around the task of learning what actually matters. Implicitly, the goals and values of the data scientist are transferred to the neural network. That introduces possibilities of bias, but from a research perspective, I think it could be viewed more as cheating, or skipping a step. From an engineering perspective, it doesn't really matter, because you still get the neural network that knows things.

I think the best AIs will be trained in simulations of our world, and will end up behaving surprisingly similarly to humans.

mfpears

Amazing paper. Also, you explained it really well! Well done, and thanks for the video.

edeneden

Will this technique harden neural networks against adversarial attacks?

It seems like, if they were looking at textures, this might be a reason why changing a few pixels can change the classification.
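That is a testable question: one could probe both the ordinary and the shape-biased model with a standard attack such as FGSM and compare how often a tiny perturbation flips the prediction. A minimal sketch (not the paper's experiment), using torchvision's pretrained ResNet-50 on an unnormalized stand-in image:

```python
import torch
import torch.nn.functional as F
from torchvision import models

def fgsm(model, x, label, eps=0.01):
    """Fast Gradient Sign Method: nudge every pixel slightly in the
    direction that increases the classification loss."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

model = models.resnet50(pretrained=True).eval()
x = torch.rand(1, 3, 224, 224)     # stand-in for a preprocessed image
label = model(x).argmax(1)         # the model's own clean prediction
x_adv = fgsm(model, x, label)
# If the two printed class indices differ, the attack succeeded.
print(model(x).argmax(1).item(), model(x_adv).argmax(1).item())
```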

TheReferrer

This got me thinking: the cat with elephant texture is not something you would see naturally in the real world, or it is very unlikely at least. So if you were to see the shape of a cat with elephant texture, it is more likely an elephant than a cat; perhaps the image is just zoomed in. Should it really be recognized as a cat? Are we correct in assuming it is a cat, or does the shape merely look like a cat on the elephant's skin? Humans are good at recognizing patterns where there are none; maybe the AI made the better guess?

atakminer

Ordering myself an elephant texture mask for the AI Apocalypse.

chetank

Actually, it looks like a cat tattoo on an Indian elephant.

Enigma

This is one of the best videos, simple and deep at the same time :)

Question: I was reading about NEAT (NeuroEvolution of Augmenting Topologies) and I assumed you would have covered the subject, but I did not find any mention of it in your videos. Could you please tell me if there is any video of yours you can suggest on it?

Leibniz_

This is very important for self-driving cars. If the recognition covers everything a human can perceive, then there is no danger that a human could recognize something the neural network does not.
Not only does this result in fewer car crashes, it greatly improves the trust humans put in these recognition algorithms, knowing that their lives are in the neural net's "hands".
Wonder why? Because then a human cannot say: "I would have done better. I would have seen it. If I were driving, this crash wouldn't have happened."

Tondadrd

Not surprising at all. Humans reconstruct the 3D world they see. They identify materials (even reflective ones), light sources, geometry, and objects based on their context and 3D appearance. Researchers should work more on representing 3D shapes in neural networks instead of throwing millions of pictures at the net to memorize. I didn't have to see millions of cats to learn to identify one; I noticed the differences to similar animals and stored that. It wasn't training.

One key step for a neural net would be to learn to predict different appearances of objects from different perspectives, and also possible deformations. Google tried a similar thing: they trained a neural renderer that could reconstruct simple 3D game levels it had observed before, using just a 2D image. That's the better direction to go for object recognition. I suspected this already 8 years ago: not so much that it depends on textures, but that combining lots and lots of 2D feature detectors in a kind of fuzzy logic network for direct object recognition is very limited, and neither natural, clever, nor efficient.

Think about it: if you have only seen brown cats in your life and suddenly there is a blue cat, you would recognize it and categorize it as a blue cat, while the neural network wouldn't even see a cat, let alone be able to say it's a funny blue cat. Why? Because it is only able to match against the training set and output a "how similar is it to X?" value. It needs to find and understand the parameters that "construct" or "render" the blue cat to actually understand it. That's a much harder problem, but a much more natural and efficient take on it.

How to do that? Easy: use a normal 3D renderer, procedurally generate objects, use an autoencoder with perspective and lighting as additional input, and have it reconstruct (learn to generate) a different perspective. After that has run for a while, throw real pictures at it and have it learn to reconstruct those as well. Use this autoencoder as a training set generator, and train a new network to learn the reverse mapping, from picture to the low-dimensional (geometry-based) autoencoder input. Then run and train a classifier on this reversed output. Thanks, and you're welcome, humanity :) I wish I had the time to actually do this myself.

EDIT: I just had the idea not to use a static autoencoder, but instead one that encodes into a (vector) sequence as the "bottleneck" and then decodes it. This way it should be much more capable of encoding information. When I imagine a scene in my head, I usually scan through the objects to construct the "overview", and then I can freely move around in the scene. Interesting observation.
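The core of that proposal, an autoencoder whose decoder is conditioned on a target perspective and lighting vector, could look roughly like the sketch below; every size and name is made up for illustration:

```python
import torch
import torch.nn as nn

class PoseConditionedAutoencoder(nn.Module):
    """Encode one view of an object into a latent code, then decode the
    code together with a target camera/lighting vector into a new view."""
    def __init__(self, latent_dim=128, pose_dim=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + pose_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, image, target_pose):
        z = self.encoder(image)
        return self.decoder(torch.cat([z, target_pose], dim=1))

# Training signal: render the same procedural object from two viewpoints
# and ask the network to predict view B from view A plus B's pose.
model = PoseConditionedAutoencoder()
view_a = torch.rand(8, 3, 64, 64)
pose_b = torch.randn(8, 6)          # e.g. camera angles + light direction
view_b = torch.rand(8, 3, 64, 64)   # ground truth from the renderer
loss = nn.functional.mse_loss(model(view_a, pose_b), view_b)
```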

ecicce

I like how other neural networks were used to create the training set.
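Indeed: the stylized training set is itself produced by a style-transfer network (AdaIN, in the paper's case). A generation loop might look roughly like this sketch, where `adain_stylize` and all the paths are hypothetical stand-ins for whatever model and data layout you actually have:

```python
import random
from pathlib import Path
from PIL import Image

# Hypothetical helper wrapping a pretrained style-transfer model; this is
# NOT a real library function, just a placeholder for one.
from my_style_transfer import adain_stylize

content_dir = Path("imagenet/train")
style_dir = Path("paintings")
out_dir = Path("stylized_imagenet/train")

styles = list(style_dir.glob("*.jpg"))
for content_path in content_dir.rglob("*.JPEG"):
    content = Image.open(content_path).convert("RGB")
    style = Image.open(random.choice(styles)).convert("RGB")
    # Swap the natural texture for a random painting's texture while the
    # global shape survives, so texture stops being a reliable label cue.
    stylized = adain_stylize(content, style)
    target = out_dir / content_path.relative_to(content_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    stylized.save(target.with_suffix(".png"))
```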

john_hunter_

Yeah, the high texture affinity of neural networks was something that struck me too, but I had no idea what to do about it. This is almost hilarious in how elegant it is.

SianaGearz

It is fascinating that this network can overcome the style image's influence by such a large margin for some categories. It's certainly encouraging that it doesn't perform more poorly in any categories than previous networks.

simoncarlile

Oh! Here's another idea!
The Why Can't We Have Both approach:
Basically, take two arbitrary images, blend them via style transfer, and then train a single network on such combined images to classify and reconstruct *both* the texture image *and* the object image.
Obviously the style transfer process is very much irreversible, but in trying to achieve this, the network ought to get really good at both kinds of thinking.
Like, take that image of the elephant-skinned cat. Of course I see that as a cat. But I *also* recognize the texture as that of elephant skin. On the AI side, it's basically misunderstanding which exact question it's supposed to answer. But by making it answer *both*, it ought to become really good at either question.

I bet this kind of approach is also going to make AIs better at, say, recognizing cartoons or similarly stylized representations as what they are supposed to be.
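The "answer both questions" part of this idea maps naturally onto a shared backbone with two classification heads. A minimal sketch, with the architecture and class counts as illustrative assumptions (the reconstruction task is left out for brevity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class DualHeadClassifier(nn.Module):
    """One shared trunk, two heads: one predicts the object (shape) class,
    the other predicts the texture/style class of the same image."""
    def __init__(self, num_shape_classes, num_texture_classes):
        super().__init__()
        backbone = models.resnet18(pretrained=False)
        backbone.fc = nn.Identity()   # keep the trunk, drop the final head
        self.backbone = backbone
        self.shape_head = nn.Linear(512, num_shape_classes)
        self.texture_head = nn.Linear(512, num_texture_classes)

    def forward(self, x):
        feats = self.backbone(x)
        return self.shape_head(feats), self.texture_head(feats)

model = DualHeadClassifier(num_shape_classes=1000, num_texture_classes=100)
x = torch.rand(4, 3, 224, 224)               # stylized cue-conflict batch
shape_target = torch.randint(0, 1000, (4,))  # e.g. "cat"
texture_target = torch.randint(0, 100, (4,)) # e.g. "elephant skin"
shape_logits, texture_logits = model(x)
# Answering *both* questions forces the features to retain both cues.
loss = (F.cross_entropy(shape_logits, shape_target)
        + F.cross_entropy(texture_logits, texture_target))
```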

Kram

I would argue that this is not a texture problem but a question of analyzing local versus global structure, and style transfer is just another patch over this gap.
