Explaining the Segment Anything Model - Network Architecture, Dataset, Training

Segment Anything 2:

In this video, I dive deep into the technical details and architecture behind the Segment Anything Model, also known as SAM. SAM is the world's first foundation model for image segmentation: an amazing tool that can segment any image provided to it at multiple nested levels of granularity, at interactive latency.

#deeplearning #computervision #machinelearning

To support the channel and access the Word documents/slides used in this video, consider JOINING the channel on YouTube or Patreon. Members get access to scripts, slides, animations, and illustrations for most of the videos on my channel!

0:00 - Intro
1:29 - Architecture
4:50 - Interactive Training
6:30 - Dataset
7:27 - Model Architecture
12:30 - Outro

Other papers cited:

Songs:
Sunny Days - Anno Domini Beats
Wellington Coffee Shop - Dyalla
No 3 Morning Folk Song - Esther Abrami
Comments

Here's me from the future posting a detailed analysis of Neural Attention:

avb_fj

Some more information at 10:25 - In the token-to-image attention, the queries come from the prompt + output tokens and the keys/values come from the image. In the image-to-token attention, the queries come from the image embedding and the keys/values come from the prompt + output tokens.
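To make that concrete, here is a minimal PyTorch sketch of those two attention directions. Class and tensor names are illustrative, and the real decoder block also interleaves self-attention, MLPs, and layer norms, which are omitted here:

```python
import torch
import torch.nn as nn

class TwoWayAttention(nn.Module):
    """Minimal sketch of the two attention directions described above."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.token_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_token = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, image: torch.Tensor):
        # tokens: (B, N, dim)   -- prompt + output tokens
        # image:  (B, H*W, dim) -- flattened image embedding
        # Token-to-image attention: queries from tokens, keys/values from image.
        out, _ = self.token_to_image(tokens, image, image)
        tokens = tokens + out
        # Image-to-token attention: queries from image, keys/values from tokens.
        out, _ = self.image_to_token(image, tokens, tokens)
        image = image + out
        return tokens, image
```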

SlashDL

At 10:03, 4 new tokens are added to the sparse embeddings: 1 representing the IoU score and the remaining 3 representing the masks. Just a minor correction.
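A rough sketch of that concatenation, with illustrative names and sizes (for reference, the released SAM code actually learns 4 mask tokens, 3 multimask outputs plus 1 single-mask output, alongside the IoU token):

```python
import torch
import torch.nn as nn

dim = 256                                  # transformer embedding size
iou_token = nn.Embedding(1, dim)           # learned token for the IoU prediction
mask_tokens = nn.Embedding(3, dim)         # one learned token per candidate mask

sparse_prompts = torch.randn(1, 2, dim)    # e.g. embeddings for two clicked points
output_tokens = torch.cat([iou_token.weight, mask_tokens.weight], dim=0)
output_tokens = output_tokens.unsqueeze(0).expand(sparse_prompts.shape[0], -1, -1)

# The decoder consumes the output tokens prepended to the sparse prompt embeddings.
tokens = torch.cat([output_tokens, sparse_prompts], dim=1)  # (1, 4 + 2, dim)
```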

SlashDL

I am flabbergasted by the quality of this content. Thank you for the effort. I just subscribed to your channel. Keep up the good work, brother! Looking forward to more :)

manmj

Your videos just keep getting better and better! Editing is on point with this one. Also great topic and really valuable to have you break things down like this.

DatuxGames

This video will have tens of thousands of views in the coming days.

jorgeabraham

The best video on the subject. Thank you! I'll keep watching your videos.

anacaznok

So happy I got recommended this video. Great quality content!

gingerderidder

I really like this explanation. Thanks a lot!

Sciencehub-oqgo

Good quality video. You got a subscriber.

victorbjorklund

Excellent video, thank you very much! After watching this, there's no doubt in my mind that transformer-based architectures will take over AI for computer vision.

ItalianPizza

Wonderful, I really like the way you present complex topics!

willikappler

Hello. Great, dense video. A suggestion: it's a bit too fast for me; I have to pause on every slide to read it. Usually I watch videos at 2x speed, but yours is the only one on YouTube where I do the opposite! Maybe you could describe each slide in more detail to give us time to understand it? Just an idea.

Grenoble

Amazing video! Could you please explain what exactly the "output tokens" are and how they are obtained?

turboxxx

Nice video explaining the interactive training. I have one question: during interactive training, is the loss calculated at each step or only at the end?
To be more clear:
Step 1: Sample a point at the middle of the ground-truth mask
Step 2: Feed the point as a prompt into the model
Step 3: Get the best mask from the model
Step 4: From the best mask, compute the error region and sample another positive OR negative point in that region
Step 5: Loop from Step 2 until the maximum number of iterations is reached
Do I calculate the loss between Step 3 and Step 4 and update the model before moving on to Step 4, or do I calculate the loss once at the end, after Step 5?
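For reference, here is one plausible way to structure that loop in PyTorch: losses from every click round are summed and backpropagated once at the end. This is an illustrative sketch, not SAM's confirmed recipe (per-step updates are exactly the alternative the question raises), and `model`, `sample_point`, and `mask_loss` are placeholder callables:

```python
import torch

def interactive_training_step(model, optimizer, image, gt_mask,
                              sample_point, mask_loss, num_iters: int = 8):
    """One possible interactive-training loop: sum the loss over all
    click rounds, then update the model once at the end."""
    optimizer.zero_grad()
    total_loss = torch.tensor(0.0)
    points = [sample_point(gt_mask)]               # Step 1: click inside the GT mask
    for _ in range(num_iters):                     # Step 5: loop until max iterations
        pred_mask = model(image, points)           # Steps 2-3: prompt, take best mask
        total_loss = total_loss + mask_loss(pred_mask, gt_mask)
        # Step 4: the error region is where prediction and ground truth disagree.
        error_region = pred_mask.detach().round() != gt_mask
        points.append(sample_point(error_region))
    total_loss.backward()                          # single update at the end
    optimizer.step()
    return total_loss.item()
```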

SofieSimp

May I ask you a question: one type of prompt is a segmentation mask. If we already have a binary segmentation mask to use as a prompt, why should we use SAM?

timanb

Very good. I am using SAM and want to understand it better so I can tune the parameters, so here I am, struggling through your video (one of the few that actually tries to explain the concepts...). What is p_t in the focal loss definition?
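For context, p_t comes from the focal loss paper (Lin et al., 2017), not from anything SAM-specific: it is the model's predicted probability of the ground-truth class. A minimal sketch:

```python
import torch

def focal_loss(p: torch.Tensor, y: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t), from Lin et al. (2017).

    p: predicted foreground probability per pixel; y: binary ground truth.
    p_t is p where the true label is 1 and (1 - p) where it is 0, i.e. the
    probability the model assigned to the correct class."""
    p_t = torch.where(y == 1, p, 1 - p)
    return -((1 - p_t) ** gamma) * torch.log(p_t.clamp(min=1e-8))
```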

miyutube

Hey man, congrats on the great video. I'm currently doing my thesis on SAM, so this was a big help. May I ask which camera you used?

VictorVelazquezEspitia

How does SAM guess the IoU for new images when there is no ground truth available?

Alice-yqyy

- What could be the intuition for having an MLP for the IoU scores, with an MSE loss on top?
- I don't see any interface for text prompt usage in their repository. Are any examples available?
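On the first bullet, per the SAM paper: the decoder's output embedding at the IoU token position is fed through a small MLP that regresses one quality score per candidate mask, and during training that score is supervised with an MSE loss against the actual IoU between each predicted mask and the ground truth. At inference there is no ground truth, so the score is simply this learned estimate. A minimal sketch with illustrative names and layer sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IoUHead(nn.Module):
    """Illustrative MLP head regressing one quality score per candidate mask."""

    def __init__(self, dim: int = 256, num_masks: int = 3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, num_masks),
        )

    def forward(self, iou_token_out: torch.Tensor) -> torch.Tensor:
        # iou_token_out: (B, dim) -- decoder output at the IoU token position
        return self.mlp(iou_token_out)  # (B, num_masks) predicted IoUs

def iou_loss(pred_iou: torch.Tensor, pred_masks: torch.Tensor,
             gt_mask: torch.Tensor) -> torch.Tensor:
    # Training target: actual IoU of each binarized predicted mask
    # (B, num_masks, H, W) with the ground truth (B, 1, H, W).
    inter = (pred_masks * gt_mask).sum(dim=(-2, -1))
    union = ((pred_masks + gt_mask) > 0).float().sum(dim=(-2, -1))
    actual_iou = inter / union.clamp(min=1)
    return F.mse_loss(pred_iou, actual_iou)
```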

nitinsurya