331 - Fine-tune Segment Anything Model (SAM) using custom data

This tutorial walks you through the process of fine-tuning a Segment Anything Model (SAM) using custom data.

What is SAM?
SAM is an image segmentation model developed by Meta AI. It was trained on more than 1 billion segmentation masks from roughly 11 million images. It is designed to take human prompts in the form of points, bounding boxes, or even a text description of what should be segmented.
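
The comments below suggest the accompanying code is built on the Hugging Face transformers implementation of SAM; assuming that setup, loading a publicly released checkpoint and its processor looks roughly like this:

```python
# Minimal sketch: load a released SAM checkpoint and its matching processor
# (assuming the Hugging Face transformers implementation).
from transformers import SamModel, SamProcessor

model = SamModel.from_pretrained("facebook/sam-vit-base")
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
print(sum(p.numel() for p in model.parameters()))  # rough parameter count check
```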

What are the key features of SAM?
Zero-shot generalization: SAM can be used to segment objects that it has never seen before, without the need for additional training.

Flexible prompting: SAM can be prompted with a variety of inputs, including points, boxes, and text descriptions.

Real-time mask computation: once the image embedding has been computed, SAM can generate a mask for each new prompt in real time. This makes SAM well suited to applications where objects must be segmented quickly or interactively, such as annotation tools, autonomous driving, and robotics.

Ambiguity awareness: SAM is aware that a prompt can be ambiguous. A single point, for example, might refer to a whole object or to one of its parts, so SAM can return several candidate masks (each with a quality score) instead of a single guess, as illustrated in the sketch below.
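
To make the flexible-prompting and ambiguity-awareness points concrete, here is a hedged sketch of prompting a pretrained SAM with a single point and asking for multiple candidate masks (Hugging Face transformers API; the image path and point coordinates are placeholders):

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

image = Image.open("example.jpg").convert("RGB")  # placeholder image
input_points = [[[450, 600]]]                     # one (x, y) point prompt

inputs = processor(image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    # With multimask_output=True an ambiguous prompt yields several candidate masks.
    outputs = model(**inputs, multimask_output=True)

# Rescale the low-resolution mask logits back to the original image size.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape, outputs.iou_scores.shape)  # candidate masks and their quality scores
```

Each candidate mask comes with a predicted IoU score, so downstream code can pick the most plausible interpretation of an ambiguous prompt.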

How does SAM work?
SAM works by first encoding the image into a high-dimensional vector representation. The prompt is encoded into a separate vector representation. The two vector representations are then combined and passed to a mask decoder, which outputs a mask for the object specified by the prompt.

The image encoder is a large vision transformer (ViT-H in the original release, with smaller ViT-L and ViT-B variants also available) that has been pre-trained on a massive dataset of images. The prompt encoder converts the input prompts into embeddings: points and boxes are represented with positional encodings, dense mask prompts with convolutions, and free-form text with a CLIP text encoder. The mask decoder is a lightweight transformer that predicts the object mask from the combined image and prompt embeddings.
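
In practice this split means the heavy image encoder only has to run once per image, after which many prompts can be decoded cheaply. A sketch of that pattern, again assuming the transformers implementation and placeholder inputs:

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
image = Image.open("example.jpg").convert("RGB")   # placeholder image

# Run the heavy ViT image encoder once and cache the resulting embedding.
pre = processor(image, return_tensors="pt").to(device)
with torch.no_grad():
    image_embeddings = model.get_image_embeddings(pre["pixel_values"])

# Reuse the cached embedding with different prompts; only the lightweight
# prompt encoder and mask decoder run inside this loop.
for points in ([[[450, 600]]], [[[200, 300]]]):    # two placeholder point prompts
    inputs = processor(image, input_points=points, return_tensors="pt").to(device)
    inputs.pop("pixel_values")                      # the cached embedding replaces the pixels
    with torch.no_grad():
        outputs = model(**inputs, image_embeddings=image_embeddings)
    print(outputs.pred_masks.shape)                 # low-resolution mask logits per prompt
```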

Courtesy: EPFL

Comments

Great video! Now we are waiting for a SAM2 fine-tuning tutorial using custom data.

yourgo

Great video as always. I think the function to find bboxes might be improved to take care of the fact that you might have multiple objects in a patch (I guess you could do a simple watershed and then find min and max for each instance). Also I'm wondering if you could improve results by adding some heuristics to how you choose your grid points, for instance concentrating points in darker areas in this case?

NicolaRomano
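
Along the lines of the suggestion above, a hedged sketch of getting one bounding box per object in a patch by labelling connected components first (scipy is assumed; a watershed step could be added before labelling if objects touch):

```python
import numpy as np
from scipy import ndimage

def get_instance_bboxes(mask: np.ndarray) -> list:
    """Return one [x_min, y_min, x_max, y_max] box per connected component."""
    labeled, _ = ndimage.label(mask > 0)
    boxes = []
    for obj_slice in ndimage.find_objects(labeled):
        ys, xs = obj_slice
        boxes.append([xs.start, ys.start, xs.stop - 1, ys.stop - 1])
    return boxes

# Example: a patch with two separate objects yields two boxes.
patch = np.zeros((256, 256), dtype=np.uint8)
patch[20:60, 30:80] = 1
patch[150:200, 100:140] = 1
print(get_instance_bboxes(patch))
```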

Hey man, nice job, you are amazing. I have got a problem at 26:00 in the video: in that 'example' cell I get an error. If anyone can help me, I would really appreciate it. This is the last part of the error:

...raise ValueError(f"Unsupported number of image dimensions: {image.ndim}")
ValueError: Unsupported number of image dimensions: 2

mmd_punisher

Thank you for the video, your videos are always helpful! I'm facing this error and can't find a solution. In block 16, when accessing 'train_dataset[0]', I encounter the error: 'ValueError: Unsupported number of image dimensions: 2'.
Skipping the block doesn't help as the same error occurs during training. I've searched online but couldn't find anything useful.
I'm using Google Colab and these library versions: transformers 4.39.0.dev0, torch 2.1.0+cu121, datasets 2.18.0.
I would greatly appreciate it if you could help me solve this problem. Thanks in advance.

perpython
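
A hedged note on the "ValueError: Unsupported number of image dimensions: 2" reported in the comments above (and again further down): the SAM processor expects 3-channel images, so a likely cause is that a raw 2D grayscale patch is reaching it. One possible workaround is to convert each patch to RGB before it is handed to the processor; a sketch (the variable and function names here are illustrative, not from the tutorial):

```python
import numpy as np
from PIL import Image

def to_rgb(gray_patch: np.ndarray) -> Image.Image:
    """Convert a 2D grayscale patch (H, W) into a 3-channel PIL image."""
    if gray_patch.ndim == 2:
        gray_patch = np.stack([gray_patch] * 3, axis=-1)  # (H, W) -> (H, W, 3)
    return Image.fromarray(gray_patch.astype(np.uint8))

# The converted patch (rather than the raw 2D array) would then be passed to
# the SAM processor, e.g.:
# inputs = processor(to_rgb(patch), input_boxes=[[box]], return_tensors="pt")
```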

Excellent tutorial Sreeni!!! 👏👏Thank you so much!!!

kevian

Could you make a video on how to use the SAM image encoder only as a feature extractor and then use any other decoder to get the prediction mask?

mahmoudman
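
Not the tutorial's code, but a small sketch of the idea raised above: running only SAM's image encoder to get a feature map that a separate, custom decoder could consume (transformers API; the image path is a placeholder and the custom decoder itself is left out):

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

image = Image.open("example.jpg").convert("RGB")  # placeholder image
pixel_values = processor(image, return_tensors="pt")["pixel_values"].to(device)

with torch.no_grad():
    features = model.get_image_embeddings(pixel_values)

# 'features' is a spatial feature map (roughly (1, 256, 64, 64) for the base model)
# that could be fed into any custom decoder head, such as a small stack of
# transposed-convolution layers that upsamples back to the input resolution.
print(features.shape)
```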

Your videos are so good. Please post a video on deep image prior.

Thanks

AnusuyaT-gzzc

Thank you very much for such a wonderful tutorial!!!

dmitryutkin

Hello Sreeni, first off, I really enjoy your videos and they are really awesome. I was trying to re-run the code you have, but I am facing an issue on the line where you have example = train_dataset[0]. I get the following error: ValueError: Unsupported number of image dimensions: 2. Is there any package I am missing? Your help would be appreciated.

mehrdadpasha

Great video, and great instructor. However...
This get_bounding_box is not very good for multiple objects. Furthermore, I could not make it work for more than one bounding box as a prompt. Do you have an idea how to generalize it?

gabrielgcarvalho
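
For the multi-box question above, a hedged inference-time sketch with the transformers API (placeholder image and boxes); whether and how to use several boxes per sample during fine-tuning is a separate design decision:

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
image = Image.open("example.jpg").convert("RGB")     # placeholder image

# Several box prompts for one image; SAM predicts one mask per box.
boxes = [[100, 100, 200, 200], [300, 150, 400, 260]]  # illustrative [x0, y0, x1, y1] boxes
inputs = processor(image, input_boxes=[boxes], return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs, multimask_output=False)
print(outputs.pred_masks.shape)  # one low-resolution mask logit per box
```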

Thanks for this amazing share.
Is there any way for SAM to output the label associated with a predicted mask, so that we know the name of the instance that was segmented?
Thanks in advance

Azerty-vz

This is great, thanks a lot! However, since you deleted the images with empty masks, this can only work for images that contain mitochondria. Could this be extended so that the model returns an empty mask when there are no mitochondria (or whatever the target object is in other applications)?

juliannad

When changing patch_size from 256 to 512 and step size from 256 to 512 I get this error:

"Error: AssertionError: ground truth has different shape (torch.Size([2, 1, 512, 512])) from input (torch.Size([2, 1, 256, 256]))"

Why is this?

johanhaggle
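
A hedged note on the shape mismatch above: SAM's mask decoder outputs low-resolution logits at a fixed 256x256 size regardless of the patch size, so once the ground-truth patches become 512x512 the loss function complains. One possible workaround is to upsample the predicted masks to the ground-truth size before computing the loss (or, alternatively, to resize the ground-truth patches down to 256x256); a small self-contained sketch of the upsampling step:

```python
import torch
import torch.nn.functional as F

def match_mask_size(pred_logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Upsample SAM's fixed 256x256 mask logits to the ground-truth patch size."""
    return F.interpolate(pred_logits, size=target.shape[-2:],
                         mode="bilinear", align_corners=False)

# Dummy tensors standing in for a real training batch, just to check shapes.
pred = torch.randn(2, 1, 256, 256)      # mask-decoder logits
gt = torch.zeros(2, 1, 512, 512)        # 512x512 ground-truth patches
print(match_mask_size(pred, gt).shape)  # torch.Size([2, 1, 512, 512])
```

Inside the training loop, the upsampled logits (rather than the raw decoder output) would then go into the loss.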

Is there a way that we can use SAM for an image sequence? I'm trying to segment grains and pore area for small sand.

KennethSu-ey

May I know where the 12-image TIF is? The website only gives us two sets of TIF files, each with 165 images.

billlee

Hi, I have used your code to fine-tune SAM to segment aerial images, but when I use my finetunedsam.pth it doesn't even segment the images that it used to segment without fine-tuning. What do you think is the problem? Thank you in advance!

manalkamal

Hi, good content. How can we train the overlapping case? Do we train with one box and its segmentation mask at a time, or can we train with all boxes at once, utilising the three output channels?

jerinantony

How do I make a TIF file for images and masks if I have custom data to train on, or is there any workaround to train the model on custom data?

phoenix
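
Not an official answer to the question above, but one common way to build multi-page TIF stacks like the ones used in the video is the tifffile package; a sketch, assuming the custom images and masks are equally sized single-channel files in two folders (paths and extensions are placeholders):

```python
import glob
import numpy as np
import tifffile
from PIL import Image

# Stack individual image/mask files into two multi-page TIFs (placeholder paths).
image_files = sorted(glob.glob("images/*.png"))
mask_files = sorted(glob.glob("masks/*.png"))

images = np.stack([np.array(Image.open(f).convert("L")) for f in image_files])
masks = np.stack([np.array(Image.open(f).convert("L")) for f in mask_files])

tifffile.imwrite("training_images.tif", images)  # shape: (num_images, H, W)
tifffile.imwrite("training_masks.tif", masks)
```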

Great tutorial as always, Sreeni, thank you. There is a project called Medical SAM that already does custom training with thousands of medical images, in case you want to check it out. On social media you mentioned a tutorial on going from binary masks to polygon masks. Is there any resource I can base myself on for this process?

urzdvd
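
On the binary-mask-to-polygon question above, a hedged sketch using OpenCV contour extraction, which gives one polygon per connected component (this is one common approach, not necessarily the tutorial that was mentioned on social media):

```python
import cv2
import numpy as np

def mask_to_polygons(mask: np.ndarray) -> list:
    """Convert a binary mask (H, W) into a list of (N, 2) polygons, one per object."""
    binary = (mask > 0).astype(np.uint8)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for contour in contours:
        if len(contour) >= 3:                    # a polygon needs at least 3 vertices
            polygons.append(contour.squeeze(1))  # (N, 1, 2) -> (N, 2) array of (x, y) points
    return polygons

# Example with a tiny dummy mask containing one square object.
dummy = np.zeros((64, 64), dtype=np.uint8)
dummy[10:30, 10:30] = 1
print([p.shape for p in mask_to_polygons(dummy)])
```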

If we already have the prompt (mask) for a test image as an input, why do we use SAM to get the mask? I mean, we already have the answer; how does using SAM help us?

timanb