Explaining the Segment Anything Model - Network Architecture, Dataset, Training

Segment Anything 2:

In this video, I dive deep into the technical details and architecture behind the Segment Anything Model, also known as SAM. SAM is the world's first foundation model for image segmentation: an amazing tool that can segment any image provided to it at multiple nested levels of granularity, at interactive latency.

#deeplearning #computervision #machinelearning

To support the channel and access the Word documents/slides used in this video, consider JOINING the channel on YouTube or Patreon. Members get access to scripts, slides, animations, and illustrations for most of the videos on my channel!

0:00 - Intro
1:29 - Architecture
4:50 - Interactive Training
6:30 - Dataset
7:27 - Model Architecture
12:30 - Outro

Other papers cited:

Songs:
Sunny Days - Anno Domini Beats
Wellington Coffee Shop - Dyalla
No 3 Morning Folk Song - Esther Abrami
Comments

Here's me from the future posting a detailed analysis of Neural Attention:

avb_fj

Some more information at 10:25 - In the token-to-image attention, the queries come from the prompt + output tokens and the keys/values come from the image. In the image-to-token attention, the queries come from the image embedding and the keys/values come from the prompt + output tokens.
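To make that concrete, here is a minimal PyTorch sketch of those two attention directions. Class and tensor names are illustrative, and the real decoder block also interleaves self-attention, MLPs, and layer norms, which are omitted here:

```python
import torch
import torch.nn as nn

class TwoWayAttention(nn.Module):
    """Minimal sketch of the two attention directions described above."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.token_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_token = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, image: torch.Tensor):
        # tokens: (B, N, dim)   -- prompt + output tokens
        # image:  (B, H*W, dim) -- flattened image embedding
        # Token-to-image attention: queries from tokens, keys/values from image.
        out, _ = self.token_to_image(tokens, image, image)
        tokens = tokens + out
        # Image-to-token attention: queries from image, keys/values from tokens.
        out, _ = self.image_to_token(image, tokens, tokens)
        image = image + out
        return tokens, image
```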

SlashDL

At 10:03, 4 new tokens are added to the sparse embeddings: 1 representing the IoU score and the remaining 3 representing the masks. Just a minor correction.
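A rough sketch of that concatenation, with illustrative names and sizes (for reference, the released SAM code actually learns 4 mask tokens, 3 multimask outputs plus 1 single-mask output, alongside the IoU token):

```python
import torch
import torch.nn as nn

dim = 256                                  # transformer embedding size
iou_token = nn.Embedding(1, dim)           # learned token for the IoU prediction
mask_tokens = nn.Embedding(3, dim)         # one learned token per candidate mask

sparse_prompts = torch.randn(1, 2, dim)    # e.g. embeddings for two clicked points
output_tokens = torch.cat([iou_token.weight, mask_tokens.weight], dim=0)
output_tokens = output_tokens.unsqueeze(0).expand(sparse_prompts.shape[0], -1, -1)

# The decoder consumes the output tokens prepended to the sparse prompt embeddings.
tokens = torch.cat([output_tokens, sparse_prompts], dim=1)  # (1, 4 + 2, dim)
```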

SlashDL

I am flabbergasted by the quality of this content. Thank you for the effort. I just subscribed to your channel. Keep up the good work, brother! Looking forward to more :)

manmj

Your videos just keep getting better and better! Editing is on point with this one. Also great topic and really valuable to have you break things down like this.

DatuxGames

This video will have tens of thousands of views in the coming days.

jorgeabraham

The best video on the subject. Thank you! I'll keep watching your videos.

anacaznok

So happy I got recommended this video. Great quality content!

gingerderidder

I really like this explanation. Thanks a lot!

Sciencehub-oqgo

Good quality video. You got a subscriber.

victorbjorklund

Excellent video, thank you very much! After watching this, there's no doubt in my mind that transformer-based architectures will take over AI for computer vision.

ItalianPizza

Wonderful, I really like the way you present complex topics!

willikappler

Hello. Great, dense video. A suggestion: it's a bit too fast for me; I have to pause on every slide to read it. Usually I watch videos at 2x speed, but yours is the only one on YouTube where I do the opposite! Maybe you could describe each slide in more detail to give us time to understand it? Just an idea.

Grenoble

Amazing video! Could you please explain what exactly the "output tokens" are and how they are obtained?

turboxxx

Nice video explaining the interactive training. I have one question: during interactive training, is the loss calculated at each step or only at the end?
To be more clear:
Step 1: Sample a point at the middle of the ground-truth mask
Step 2: Feed the point as a prompt into the model
Step 3: Get the best mask from the model
Step 4: From the best mask, compute the error region and sample another positive OR negative point in that region
Step 5: Loop from Step 2 until the maximum number of iterations is reached
Do I calculate the loss between Step 3 and Step 4 and update the model before moving on to Step 4, or do I calculate the loss once at the end, after Step 5?
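For reference, here is one plausible way to structure that loop in PyTorch: losses from every click round are summed and backpropagated once at the end. This is an illustrative sketch, not SAM's confirmed recipe (per-step updates are exactly the alternative the question raises), and `model`, `sample_point`, and `mask_loss` are placeholder callables:

```python
import torch

def interactive_training_step(model, optimizer, image, gt_mask,
                              sample_point, mask_loss, num_iters: int = 8):
    """One possible interactive-training loop: sum the loss over all
    click rounds, then update the model once at the end."""
    optimizer.zero_grad()
    total_loss = torch.tensor(0.0)
    points = [sample_point(gt_mask)]               # Step 1: click inside the GT mask
    for _ in range(num_iters):                     # Step 5: loop until max iterations
        pred_mask = model(image, points)           # Steps 2-3: prompt, take best mask
        total_loss = total_loss + mask_loss(pred_mask, gt_mask)
        # Step 4: the error region is where prediction and ground truth disagree.
        error_region = pred_mask.detach().round() != gt_mask
        points.append(sample_point(error_region))
    total_loss.backward()                          # single update at the end
    optimizer.step()
    return total_loss.item()
```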

SofieSimp

May I ask you a question: one type of prompt is a segmentation mask. If we already have a binary segmentation mask to use as a prompt, why should we use SAM?

timanb

Very good. I am using SAM and want to understand it better so I can tune the parameters, so here I am, struggling through your video (one of the few that actually tries to explain the concepts...). What is p_t in the focal loss definition?
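For context, p_t comes from the focal loss paper (Lin et al., 2017), not from anything SAM-specific: it is the model's predicted probability of the ground-truth class. A minimal sketch:

```python
import torch

def focal_loss(p: torch.Tensor, y: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t), from Lin et al. (2017).

    p: predicted foreground probability per pixel; y: binary ground truth.
    p_t is p where the true label is 1 and (1 - p) where it is 0, i.e. the
    probability the model assigned to the correct class."""
    p_t = torch.where(y == 1, p, 1 - p)
    return -((1 - p_t) ** gamma) * torch.log(p_t.clamp(min=1e-8))
```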

miyutube

Hey man, congrats on the great video. I'm currently doing my thesis on SAM, so this was a big help. May I ask which camera you used?

VictorVelazquezEspitia

How does SAM guess the IoU for new images when there is no ground truth available?

Alice-yqyy

- What could be the intuition for having an MLP for the IoU scores, with an MSE loss on top?
- I don't see any interface for text prompt usage in their repository. Are any examples available?
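On the first bullet, per the SAM paper: the decoder's output embedding at the IoU token position is fed through a small MLP that regresses one quality score per candidate mask, and during training that score is supervised with an MSE loss against the actual IoU between each predicted mask and the ground truth. At inference there is no ground truth, so the score is simply this learned estimate. A minimal sketch with illustrative names and layer sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IoUHead(nn.Module):
    """Illustrative MLP head regressing one quality score per candidate mask."""

    def __init__(self, dim: int = 256, num_masks: int = 3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, num_masks),
        )

    def forward(self, iou_token_out: torch.Tensor) -> torch.Tensor:
        # iou_token_out: (B, dim) -- decoder output at the IoU token position
        return self.mlp(iou_token_out)  # (B, num_masks) predicted IoUs

def iou_loss(pred_iou: torch.Tensor, pred_masks: torch.Tensor,
             gt_mask: torch.Tensor) -> torch.Tensor:
    # Training target: actual IoU of each binarized predicted mask
    # (B, num_masks, H, W) with the ground truth (B, 1, H, W).
    inter = (pred_masks * gt_mask).sum(dim=(-2, -1))
    union = ((pred_masks + gt_mask) > 0).float().sum(dim=(-2, -1))
    actual_iou = inter / union.clamp(min=1)
    return F.mse_loss(pred_iou, actual_iou)
```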

nitinsurya