Segment Anything Paper Explained: New Foundation Model From Meta AI Is Impressive!

Meta AI just released the Segment Anything Model (SAM), an important step toward the first foundation model for image segmentation. I have read the paper and played with the code over the past few days, and I would like to share some insights about this model. I've aimed to be concise and informative, providing you with a brief but comprehensive overview.

⭐ SUPPORT ⭐ ──────────────────
- Subscribe!

CHAPTERS
____________________
00:00 - Intro
00:22 - Demo
01:20 - Foundation Model
01:53 - Data Engine
03:39 - Promptable Architecture
05:10 - Zero-Shot Evaluation
05:35 - Discussion

PAPERS
____________________

USEFUL LINKS
____________________

TRY SAM LOCALLY
____________________
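A minimal sketch of what running SAM locally can look like with Meta's segment-anything package and a single point prompt; the image path, checkpoint filename, and click coordinates are placeholders to swap for your own:

# Minimal sketch: segment one object with a point prompt.
# Assumes the package is installed (pip install git+https://github.com/facebookresearch/segment-anything.git)
# and a ViT-H checkpoint such as sam_vit_h_4b8939.pth has been downloaded from the repo.
import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to(device=device)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)  # SAM expects RGB

predictor = SamPredictor(sam)
predictor.set_image(image)  # computes the image embedding once, reused for every prompt

# One foreground click (label 1) at a placeholder pixel location.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # returns three candidate masks with quality scores
)
best_mask = masks[np.argmax(scores)]  # boolean HxW array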

********************
#ai #meta #computervision #airesearch
COMMENTS

Thank you for watching! Feel free to ask any questions about SAM, the paper, or how to run it locally.
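On the "run it locally" point: besides the prompted mode shown in the description snippet, the repo also exposes an automatic "segment everything" generator; a minimal sketch, reusing the sam model and image loaded there:

# Automatic ("segment everything") mode, reusing `sam` and `image`
# from the snippet in the description above.
from segment_anything import SamAutomaticMaskGenerator

mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(image)  # list of dicts, one per mask

# Each entry carries the binary mask plus quality metadata.
for m in masks[:3]:
    print(m["bbox"], m["area"], m["predicted_iou"], m["stability_score"])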

botsknowbest

An academic paper in the thumbnail always lets me know that the video is likely well researched, nice

yoavsnake

As someone in technology, I know this channel will gain followers; you really go into detail.

PA-eofs

Thanks for the wonderful vid! I am interested in *labeled* masks; have you seen the work on the hybrid mode of Grounded-DINO + SAM? I'm curious how I can use a labeled dataset I have (of sea objects) to train the model to detect not only a boat/ship but also to identify the name of the marine vessel.
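For context on that Grounded-DINO + SAM combination: the usual pattern is to let a text-prompted detector produce labeled boxes and hand those boxes to SAM as box prompts, since SAM itself only returns class-agnostic masks. A rough sketch; detect_boxes is a hypothetical placeholder for Grounding DINO or any detector fine-tuned on your sea-object labels:

# Rough sketch of the detector-plus-SAM pattern for labeled masks.
# detect_boxes is a hypothetical placeholder: a text-prompted detector
# (e.g. Grounding DINO) would return (xyxy_box, label, score) tuples here.
import numpy as np

def detect_boxes(image, text_prompt):
    raise NotImplementedError  # placeholder, not a real API

labeled_masks = []
for box, label, score in detect_boxes(image, "boat . ship . marine vessel"):
    masks, _, _ = predictor.predict(   # SamPredictor from the snippet in the description
        box=np.array(box),             # XYXY box in pixel coordinates
        multimask_output=False,
    )
    labeled_masks.append({"label": label, "mask": masks[0], "det_score": float(score)})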

kobic

Great explanation. What I can't get my head around is how the training data for SAM is generated by a model itself. Wouldn't you get a transfer of bias (e.g. the bias of the model that generates the training set is reflected in what SAM learns)?

I mean, if that bias is low, it can work, but conceptually that's a fairly odd thing to do in the field, right?

ColorfullHD

Very good explanation... can SAM work for medical images?

ashwiniyadav

Can these models key/roto video hair strands as well as a human compositor?

Take your video as an example. It is more or less acceptable for YT, but it is unacceptable even for a short film. You can see the despilled edges. You probably kept those because you wanted to preserve edge details; if you wanted to get rid of them, you would lose detail. To do both at the same time you need the more advanced keying techniques that pro VFX artists use, not just picking a color and playing with balance and blur.

If an AI model isn't as good as that, it can be used on social media and for people to have fun. But if you want to use it in movies to actually make it believable, to allow more people to make movies more easily, to really take advantage of it, and to save a ton of time and money, that will require some precision.

In films you can't really tell if a scene used a green/blue screen even if you zoomed in 400x. The edge transitions are so clean that even shown side by side, you really can't tell. I would love to see an example where this is achieved with AI.

Now, all of this is chroma keying (green/blue screen). Rotoscoping, which doesn't rely on keying a single color, depends entirely on precision. VFX artists can also do that perfectly, but it is a much harder task. And doing it at 24 frames a second seamlessly, without any flickering or changing edges, is even harder.

I would love to see an example where this is achieved.

berkertaskiran