Depth Anything - Generating Depth Maps from a Single Image with Neural Networks

This week we cover the "Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data" paper from TikTok, The University of Hong Kong, Zhejiang Lab, and Zhejiang University. In this paper, they create a large dataset of labeled and unlabeled imagery to train a neural network for depth estimation from a single image, without any extra hardware or algorithmic complexity.


Chapters
0:00 Intro to Depth Anything
2:00 Use Cases
3:10 Real World Example
5:12 What is a Depth Map?
7:00 Crash Course in Traditional Techniques
9:42 Enter Depth Anything
16:00 Learning from the Teacher Model
18:35 DINOv2 Model
19:18 Depth Anything Architecture
21:29 Evaluation
25:55 Ablation Studies
28:22 Data, Perturbations, Feature Loss
31:15 Qualitative Results
33:00 Limitations
Comments

Regarding the question at the end of the video: they normalize both the prediction and the ground truth before computing MAE. As the paper says, they do the same thing as MiDaS (subtract the median, divide by the scale).
With this normalization, it doesn't matter whether the GT is disparity or metric depth; they just convert the GT to a common format, inverse depth. If the GT is metric depth, they take 1/depth before normalizing.
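A minimal sketch of that MiDaS-style alignment, assuming NumPy (the function names are mine, not from the paper; the median/scale step mirrors what the comment describes):

```python
import numpy as np

def midas_normalize(d):
    """Make an inverse-depth map scale- and shift-invariant:
    subtract the median, then divide by the mean absolute deviation."""
    t = np.median(d)              # shift (median)
    s = np.mean(np.abs(d - t))    # scale (mean absolute deviation)
    return (d - t) / s

def affine_invariant_mae(pred_inv_depth, gt, gt_is_metric_depth=True):
    """MAE between normalized prediction and normalized GT.
    Metric-depth GT is first converted to inverse depth (1/depth)."""
    gt_inv = 1.0 / gt if gt_is_metric_depth else gt
    return np.mean(np.abs(midas_normalize(pred_inv_depth) - midas_normalize(gt_inv)))
```

Because the normalization cancels any positive scale and shift, a prediction that is an affine transform of the GT scores a near-zero error, which is exactly why the GT's units (disparity vs. depth) stop mattering.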

陳紀融-lg

I do stereophotography, and I once purchased a custom app to generate depth maps from stereo pairs. Later I used an open-source Google Workspace solution for that as well. Neither worked well enough for my needs, but that was 5+ years ago. I'm curious whether this approach has improved since AI took off, and whether using a stereo pair could produce better depth maps than the lens depth-of-field blur used in smartphones. Maybe using twin cameras in smartphones could result in better portrait photos? I know it won't work for distant objects because the stereo base would be too narrow, but for regular portraits, twin cameras a few cm apart could generate a stereo pair, so the smartphone could combine lens blur with an AI-generated depth map from that pair. That should improve the final depth map used to blur out the background. Any thoughts on that idea?
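For what it's worth, the narrow-baseline concern above can be quantified: triangulated stereo depth follows Z = f * B / d, so a fixed disparity error costs far more accuracy at distance. A rough sketch (the focal length and baseline are made-up illustrative values, not real smartphone specs):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulated depth in meters from stereo disparity in pixels:
    Z = f * B / d. Depth error grows roughly quadratically with distance."""
    return focal_px * baseline_m / disparity_px

f = 1500.0   # focal length in pixels (hypothetical)
B = 0.02     # 2 cm baseline (hypothetical twin-camera spacing)

# A subject at 1 m gives f*B/Z = 30 px of disparity; at 10 m, only 3 px.
# The same 1 px matching error means ~3 cm of depth error up close,
# but ~5 m of error at 10 m, which is why a few-cm baseline suits portraits
# and fails for far-away scenes.
err_near = depth_from_disparity(29.0, f, B) - 1.0
err_far = depth_from_disparity(2.0, f, B) - 10.0
```

So the idea of fusing a twin-camera stereo depth map with a learned monocular one is plausible at portrait range, where the disparity signal is still strong.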

mwysocz

Great to see someone working on this.

I have an application for Depth Anything. Would it be possible to talk about it with you on a video call or in a meeting? 😃

keshav

What's the point of a depth map if the values are relative/normalized and you can't recover the actual estimated distance to each pixel? I'd love to know if I'm missing something and it is possible to get distances from a relative depth map, but I haven't been able to.
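One common answer: a relative depth map can be lifted to metric depth if you have even a couple of known distances in the scene (sparse LiDAR points, a known object size, etc.) to fit a scale and shift by least squares. This is a generic recipe, not something from the paper; the helper name is mine, and note that Depth Anything predicts inverse depth, so you may need to invert the network output first:

```python
import numpy as np

def align_to_metric(rel_depth, anchor_idx, anchor_metric):
    """Fit scale s and shift t so that s * rel_depth + t best matches a few
    known metric depths (least squares), then apply to the whole map.
    rel_depth: relative depth map from the network (invert first if it is
               inverse depth). anchor_idx: flat indices of pixels with known
               metric depth. anchor_metric: those depths in meters.
    Needs at least two anchors at distinct relative-depth values."""
    x = rel_depth.ravel()[anchor_idx]
    A = np.stack([x, np.ones_like(x)], axis=1)        # [x, 1] design matrix
    (s, t), *_ = np.linalg.lstsq(A, anchor_metric, rcond=None)
    return s * rel_depth + t
```

The relative map carries all the per-pixel structure; the two global unknowns (scale, shift) are exactly what the normalization discarded, and two or more anchor measurements are enough to put them back.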

entrepreneerit