ConvNeXt: A ConvNet for the 2020s | Paper Explained

❤️ Become The AI Epiphany Patreon ❤️

👨‍👩‍👧‍👦 Join our Discord community 👨‍👩‍👧‍👦

In this video I cover the recently published "A ConvNet for the 2020s" paper. They show that ConvNets are still in the game: by adding new design ideas and modern training procedures, they outperform vision transformers even in the big-data regime, without any attention layers.

The convolutional prior continues to stand the test of time in the field of computer vision.

Note: I also partially cover the Swin transformer paper in case you missed out on that one. :)

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

⌚️ Timetable:
00:00 Intro - convergence of transformers and CNNs
05:05 Main diagram explained
07:40 Main diagram corrections
10:10 Swin transformer recap
20:20 Modernizing ResNets
24:10 Diving deeper: stage ratio
27:20 Diving deeper: misc (inverted bottleneck, depthwise conv...) - see the code sketch below the timetable
34:45 Results (classification, object detection, segmentation)
37:35 RIP DanNet
38:40 Summary and outro
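
To make the block-level changes from the timetable above concrete (depthwise convolutions, inverted bottleneck, LayerNorm, GELU), here is a minimal PyTorch sketch of a ConvNeXt-style block. Treat it as an illustrative sketch rather than the official implementation: it follows the block structure described in the paper but omits layer scale and stochastic depth, and the class name and defaults are my own.

import torch
import torch.nn as nn


class ConvNeXtBlock(nn.Module):
    """Illustrative ConvNeXt-style block: 7x7 depthwise conv -> LayerNorm ->
    1x1 expand (4x) -> GELU -> 1x1 project, with a residual connection.
    Layer scale and stochastic depth from the paper are omitted for brevity."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        # Depthwise 7x7 convolution: groups == channels (discussed around 27:20).
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)                   # LayerNorm instead of BatchNorm
        self.pwconv1 = nn.Linear(dim, expansion * dim)  # inverted bottleneck: expand
        self.act = nn.GELU()                            # GELU instead of ReLU
        self.pwconv2 = nn.Linear(expansion * dim, dim)  # project back down

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)          # (N, C, H, W) -> (N, H, W, C)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)          # back to (N, C, H, W)
        return shortcut + x


if __name__ == "__main__":
    block = ConvNeXtBlock(dim=96)
    out = block(torch.randn(1, 96, 56, 56))
    print(out.shape)  # expected: torch.Size([1, 96, 56, 56])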

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 BECOME A PATREON OF THE AI EPIPHANY ❤️

If these videos, GitHub projects, and blogs help you,
consider helping me out by supporting me on Patreon!

Huge thank you to these AI Epiphany patrons:
Eli Mahler
Petar Veličković

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#convnext #visiontransformers #computervision
Comments

Another great video. It is really helpful that you provide context for the design choices. Would love to see more videos where you explain how theory transfers to code.

Sntification

I like how in your videos you not only explain the details within the paper but also the more "meta" stuff that is harder for people to grasp without reading through a lot of papers. Reading and understanding one paper is easy. Developing an intuitive understanding of a whole research subfield and its general directions is the hard part.

chankhavu

Thank you! The video is excellent. I like that you mix code and paper in the explanation, and that you provide context and highlight the most essential parts.

yevhendiachenko

Thanks for the amazing explanation. Yes, mixing the code and the paper boosts implementation speed manyfold. I love your work, you are awesome!

gauravlochab

Thank you for such an in-depth explanation. Your plan of explaining the history and convergence and then going through the paper and code is a great way for learners to understand the concepts deeply. It's very important to select the key portions of the paper for further exposition and to leave out unnecessary boilerplate. I liked that you didn't say "go and read the paper yourself"!

lalitmrinki

We need to start working on reasoning - perception is converging, we're out of ideas lol.

Bad jokes aside - at this point, it seems that CNN priors are quite adequate (in the case of natural images) - a hybrid approach (initial stages CNN-like and later stages transformer-like) seems to be the way to go, but the game is still on.

TheAIEpiphany

This was a great video. The best I've seen about explaining a research paper. 👏

luna

Nice explanation. By the way, could I ask which software you are using to show multiple things in one view?

sushantgautam

Thank you so much for the brilliant explanation

manub.n

Very nice content!
I didn't even notice they use the old ResNet top-1 accuracy instead of Wightman's.
That makes this model less comparable to the SOTAs.

enip

Great video as always. What software are you using to present and annotate the paper?

marvlousdasta

Was the pre-training they did on ImageNet-22k supervised, or unsupervised like in the transformer papers?

mritunjaymusale

Can this be used for video classification?

JapiSandhu

Hi, first of all I would like to thank you for your excellent and wonderful videos on artificial intelligence.
I am a PhD student working on fast video captioning, and I hope to reach real-time captioning,
but I am confused by the many articles, techniques, and algorithms in this field.
I need your help in choosing the right path among the existing methods
(traditional CNNs, Transformers, YOLO, self-attention only, some combination, or others),
while maintaining a trade-off between speed and accuracy.

eng_ajy

What tool do you use to read research papers on Ubuntu? Thank you!

mahmoodkashmiri

3, 3, 9, s3. What does the s3 mean?

jonathansum