DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

This video covers DINO, the first DETR-like, transformer-based model to reach state-of-the-art performance.
The model builds on concepts introduced in DETR, Deformable DETR, DAB-DETR and DN-DETR, improving and remixing them to achieve superior quality under the same conditions (training time, parameter count, pretraining data size). One of the model variants also uses a huge backbone, Swin-L, together with pretraining on the Objects365 dataset, to achieve SOTA accuracy on the COCO dataset.
Important links:

00:00 - Intro
02:30 - Previous DETR models overview
20:54 - Contrastive Denoising Loss
24:32 - Mixed Query Selection
26:59 - Look Forward Twice
30:20 - Objects365 Dataset
32:54 - Results
37:22 - Next Up
Comments
Author

Here goes the giant creature! Thanks for the video, high quality as always.

anhduy
Author

There is a very confusing issue with the model name:
* Facebook has another model called DINO, which is a self-supervised ViT model
* The DETR lineage has a Semantic-SAM model, again sharing a name with Facebook's segmentation model
* And to make it even more confusing, the original DETR was developed by Facebook

From what I see, all these models are very capable and interesting.

eliaweiss
Author

Love the video and the series in general! I would love to see something similar on other topics such as NLP or maybe super-resolution. Is anything like that planned? Also, keep up the great work 🔥

matejsirovatka
Author

Hey Mak! Thanks for such a great video. Since I'm working with DINO or something similar for my thesis project, I was wondering whether I could use the model with only the denoising queries and deformable attention, excluding the dynamic anchor boxes, since they may not significantly improve performance for my use case?

subramanyabhat
Author

1 week of your time = -3 weeks of research time * number of subscribers

davidro