DETR: End-to-End Object Detection with Transformers (Paper Explained)

Object detection in images is a notoriously hard task! Objects can be of a wide variety of classes, can be numerous or absent, and can occlude each other or be out of frame. All of this makes it even more surprising that the architecture in this paper is so simple. Thanks to a clever loss function, a single Transformer stacked on a CNN is enough to handle the entire task!
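For a rough sense of how simple the pipeline is, here is a minimal sketch in PyTorch, loosely in the spirit of the demo code in the paper's appendix. The class name, layer sizes, and number of queries are illustrative assumptions, and positional encodings and other details are omitted, so this is not the authors' exact implementation.

```python
import torch
from torch import nn
from torchvision.models import resnet50

class MinimalDETR(nn.Module):
    """Illustrative sketch: CNN backbone -> transformer -> fixed-size set of (class, box) predictions."""
    def __init__(self, num_classes, hidden_dim=256, num_queries=100):
        super().__init__()
        backbone = resnet50()
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # drop pooling + fc
        self.proj = nn.Conv2d(2048, hidden_dim, kernel_size=1)          # project features to model width
        self.transformer = nn.Transformer(hidden_dim, nhead=8,
                                          num_encoder_layers=6, num_decoder_layers=6)
        self.query_embed = nn.Parameter(torch.rand(num_queries, hidden_dim))  # learned object queries
        self.class_head = nn.Linear(hidden_dim, num_classes + 1)  # +1 for the "no object" class
        self.bbox_head = nn.Linear(hidden_dim, 4)                  # (cx, cy, w, h), normalized

    def forward(self, images):
        feat = self.proj(self.backbone(images))               # (B, D, H, W) feature map
        B, D, H, W = feat.shape
        src = feat.flatten(2).permute(2, 0, 1)                 # (H*W, B, D): one token per pixel
        tgt = self.query_embed.unsqueeze(1).repeat(1, B, 1)    # (num_queries, B, D)
        hs = self.transformer(src, tgt)                        # decoder output, one vector per query
        return self.class_head(hs), self.bbox_head(hs).sigmoid()
```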

OUTLINE:
0:00 - Intro & High-Level Overview
0:50 - Problem Formulation
2:30 - Architecture Overview
6:20 - Bipartite Match Loss Function
15:55 - Architecture in Detail
25:00 - Object Queries
31:00 - Transformer Properties
35:40 - Results

ERRATA:
When I introduce bounding boxes, I say they consist of x and y, but you also need the width and height.
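In other words, a box needs four numbers, not two. DETR predicts them as normalized (center x, center y, width, height); a tiny illustrative helper (the function name is just for this example) shows the conversion to corner coordinates:

```python
def cxcywh_to_xyxy(cx, cy, w, h):
    """Convert a (center_x, center_y, width, height) box to (x_min, y_min, x_max, y_max)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# A box centered in the image and covering half of it in each direction:
print(cxcywh_to_xyxy(0.5, 0.5, 0.5, 0.5))  # (0.25, 0.25, 0.75, 0.75)
```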

Abstract:
We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines. Training code and pretrained models are available at this https URL.
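To make the "set-based global loss ... via bipartite matching" concrete, here is a rough sketch of the matching step using the Hungarian algorithm from SciPy. The function name and cost weights are illustrative assumptions, and the real matching cost also includes a generalized-IoU term; this is not the authors' code.

```python
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes, w_class=1.0, w_l1=5.0):
    """Match each ground-truth object to exactly one of the fixed set of predictions."""
    # pred_logits: (num_queries, num_classes + 1), pred_boxes: (num_queries, 4)
    # gt_labels:   (num_targets,),                 gt_boxes:   (num_targets, 4)
    prob = pred_logits.softmax(-1)
    cost_class = -prob[:, gt_labels]                     # high class probability -> low cost
    cost_bbox = torch.cdist(pred_boxes, gt_boxes, p=1)   # L1 distance between boxes
    cost = w_class * cost_class + w_l1 * cost_bbox       # (the paper also adds a GIoU term)
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().numpy())
    return pred_idx, gt_idx  # indices of matched (prediction, ground-truth) pairs
```

Predictions left unmatched are trained to output the "no object" class, which is what removes the need for post-processing such as non-maximum suppression.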

Authors: Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko

Comments

This is a gift. The clarity of the explanation, the speed at which it comes out. Thank you for all of your work.

slackstation

Yup. Subscribed with notifications. I love that you enjoy the content of the papers. It really shows! Thank you for these videos.

aashishghosh

Really appreciate the effort you are putting into this. Your paper explanations make my day, every day!

rishabpal

Greatest find on YouTube for me to date!! Thank you for the great videos!

sahandsesoot

I had seen your Attention Is All You Need video, and now, watching this, I am astounded by the clarity of your videos. Subscribed!

ankitbhardwaj

The attention visualizations are practically instance segmentations; very impressive results, and great job untangling it all.

Phobos

A great paper and a great review of the paper! As always nice work!

michaelcarlon

Wow, the way you've explained and broken down this paper is spectacular.
Thanks, mate!

chaouidhuzgen

Great!!! Absolutely great! Fast, to the point, and extremely clear. Thanks!!

opiido

This video was absolutely amazing. You explained this concept really well, and I loved the bit at 33:00 about flattening the image twice and using the rows and columns to create an attention matrix where every pixel can relate to every other pixel. Also loved the bit at the beginning where you explained the loss in detail; a lot of other videos just gloss over that part. Have liked and subscribed.

hackercop
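The flattening mentioned in the comment above is easy to see with toy shapes: the H x W feature map becomes a sequence of H*W tokens, and self-attention then produces an (H*W) x (H*W) matrix in which every pixel can attend to every other pixel. The feature-map size below is just an assumed example, not taken from the paper.

```python
import torch

B, D, H, W = 1, 256, 25, 34                  # assumed backbone output size
feat = torch.randn(B, D, H, W)
tokens = feat.flatten(2).transpose(1, 2)     # (B, H*W, D): one token per pixel
scores = tokens @ tokens.transpose(1, 2) / D ** 0.5
attn = scores.softmax(dim=-1)                # (B, H*W, H*W) attention matrix
print(attn.shape)                            # torch.Size([1, 850, 850])
```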

Awesome video. I highly recommend reading the paper first and then watching this to solidify your understanding. This definitely helped me understand the DETR model better.

adisingh

Thank you for your wonderful video. When I first read this paper, I couldn't understand what the input to the decoder (the object queries) was, but after watching your video I finally got it: they are learned vectors!

zeljnrp

Thank you for this content! I have recommended this channel to my colleagues.

renehaas

Thanks so much for making it so easy to understand these papers.

AishaUroojKhan

Fantastic explanation 👌 Looking forward to more videos ❤️

pranabsarkar

Was waiting for this. Thanks a lot! Also dude, how many papers do you read every day?!!!

ramandutt

"Maximal benefit of the doubt" - love it!

edwarddixon

Very well done and understandable. Thank you!

Gotrek

34:08 GOAT explanation of the bounding boxes in the attention feature map.

oldcoolbroqiuqiu

You are a godsend! Please keep up the good work!

biswadeepchakraborty