Link to this sectionSegment Anything Model (SAM)#

SAM Evolution

This is the original SAM model from Meta. For improved capabilities, see SAM 2 for video segmentation or SAM 3 for Promptable Concept Segmentation with text and image exemplar prompts.

Welcome to the frontier of image segmentation with the Segment Anything Model, or SAM. This revolutionary model has changed the game by introducing promptable image segmentation with real-time performance, setting new standards in the field.

Link to this sectionIntroduction to SAM: The Segment Anything Model#

The Segment Anything Model, or SAM, is a cutting-edge image segmentation model that allows for promptable segmentation, providing unparalleled versatility in image analysis tasks. SAM forms the heart of the Segment Anything initiative, a groundbreaking project that introduces a novel model, task, and dataset for image segmentation.

SAM's advanced design allows it to adapt to new image distributions and tasks without prior knowledge, a feature known as zero-shot transfer. Trained on the expansive SA-1B dataset, which contains more than 1 billion masks spread over 11 million carefully curated images, SAM has displayed impressive zero-shot performance, surpassing previous fully supervised results in many cases.

SA-1B dataset sample with automatic segmentation masks SA-1B Example images. Dataset images overlaid masks from the newly introduced SA-1B dataset. SA-1B contains 11M diverse, high-resolution, licensed, and privacy protecting images and 1.1B high-quality segmentation masks. These masks were annotated fully automatically by SAM, and as verified by human ratings and numerous experiments, are of high quality and diversity. Images are grouped by number of masks per image for visualization (there are ∼100 masks per image on average).

Link to this sectionKey Features of the Segment Anything Model (SAM)#

Promptable Segmentation Task: SAM was designed with a promptable segmentation task in mind, allowing it to generate valid segmentation masks from any given prompt, such as spatial or text clues identifying an object.
Advanced Architecture: The Segment Anything Model employs a powerful image encoder, a prompt encoder, and a lightweight mask decoder. This unique architecture enables flexible prompting, real-time mask computation, and ambiguity awareness in segmentation tasks.
The SA-1B Dataset: Introduced by the Segment Anything project, the SA-1B dataset features over 1 billion masks on 11 million images. As the largest segmentation dataset to date, it provides SAM with a diverse and large-scale training data source.
Zero-Shot Performance: SAM displays outstanding zero-shot performance across various segmentation tasks, making it a ready-to-use tool for diverse applications with minimal need for prompt engineering.

For an in-depth look at the Segment Anything Model and the SA-1B dataset, please visit the Segment Anything GitHub and check out the research paper Segment Anything.

SAM on Ultralytics Platform

SAM powers the smart annotation feature on Ultralytics Platform, enabling click-based intelligent masking for fast dataset labeling. See the annotation guide for details.

Link to this sectionAvailable Models, Supported Tasks, and Operating Modes#

This table presents the available models with their specific pretrained weights, the tasks they support, and their compatibility with different operating modes like Inference, Validation, Training, and Export, indicated by ✅ emojis for supported modes and ❌ emojis for unsupported modes.

Model Type	Pretrained Weights	Tasks Supported	Inference	Validation	Training	Export
SAM base	sam_b.pt	Instance Segmentation	✅	❌	❌	❌
SAM large	sam_l.pt	Instance Segmentation	✅	❌	❌	❌

Link to this sectionHow to Use SAM: Versatility and Power in Image Segmentation#

The Segment Anything Model can be employed for a multitude of downstream tasks that go beyond its training data. This includes edge detection, object proposal generation, instance segmentation, and preliminary text-to-mask prediction. With prompt engineering, SAM can swiftly adapt to new tasks and data distributions in a zero-shot manner, establishing it as a versatile and potent tool for all your image segmentation needs.

Link to this sectionSAM prediction example#

Segment with prompts

Segment image with given prompts.

from ultralytics import SAM

# Load a model
model = SAM("sam_b.pt")

# Display model information (optional)
model.info()

# Run inference with bboxes prompt
results = model("ultralytics/assets/zidane.jpg", bboxes=[439, 437, 524, 709])

# Run inference with single point
results = model(points=[900, 370], labels=[1])

# Run inference with multiple points
results = model(points=[[400, 370], [900, 370]], labels=[1, 1])

# Run inference with multiple points prompt per object
results = model(points=[[[400, 370], [900, 370]]], labels=[[1, 1]])

# Run inference with negative points prompt
results = model(points=[[[400, 370], [900, 370]]], labels=[[1, 0]])

Segment everything

Segment the whole image.

from ultralytics import SAM

# Load a model
model = SAM("sam_b.pt")

# Display model information (optional)
model.info()

# Run inference
model("path/to/image.jpg")

The logic here is to segment the whole image if you don't pass any prompts (bboxes/points/masks).

SAMPredictor example

This way you can set image once and run prompts inference multiple times without running image encoder multiple times.

import cv2

from ultralytics.models.sam import Predictor as SAMPredictor

# Create SAMPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", imgsz=1024, model="mobile_sam.pt")
predictor = SAMPredictor(overrides=overrides)

# Set image
predictor.set_image("ultralytics/assets/zidane.jpg")  # set with image file
predictor.set_image(cv2.imread("ultralytics/assets/zidane.jpg"))  # set with np.ndarray
results = predictor(bboxes=[439, 437, 524, 709])

# Run inference with single point prompt
results = predictor(points=[900, 370], labels=[1])

# Run inference with multiple points prompt
results = predictor(points=[[400, 370], [900, 370]], labels=[1, 1])

# Run inference with negative points prompt
results = predictor(points=[[[400, 370], [900, 370]]], labels=[[1, 0]])

# Reset image
predictor.reset_image()

Segment everything with additional args.

from ultralytics.models.sam import Predictor as SAMPredictor

# Create SAMPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", imgsz=1024, model="mobile_sam.pt")
predictor = SAMPredictor(overrides=overrides)

# Segment with additional args
results = predictor(source="ultralytics/assets/zidane.jpg", crop_n_layers=1, points_stride=64)

Note

All the returned results in the above examples are Results objects which allow access to predicted masks and source image easily.

More additional args for Segment everything see Predictor/generate Reference.

Link to this sectionSAM Comparison vs YOLO#

Here we compare Meta's SAM-b model with Ultralytics segmentation models including YOLO26n-seg:

Model	Size ^(MB)	Parameters ^(M)	Speed (CPU) ^(ms/im)
Meta SAM-b	375	93.7	41703
MobileSAM	40.7	10.1	23802
FastSAM-s with YOLOv8 backbone	23.9	11.8	58.0
Ultralytics YOLOv8n-seg	7.1 (52.8x smaller)	3.4 (27.6x less)	24.8 (1682x faster)
Ultralytics YOLO11n-seg	6.2 (60.5x smaller)	2.9 (32.3x less)	24.3 (1716x faster)
Ultralytics YOLO26n-seg	6.7 (56.0x smaller)	2.7 (34.7x less)	25.2 (1655x faster)

This comparison demonstrates the substantial differences in model sizes and speeds between SAM variants and YOLO segmentation models. While SAM provides unique automatic segmentation capabilities, YOLO models, particularly YOLOv8n-seg, YOLO11n-seg and YOLO26n-seg, are significantly smaller, faster, and more computationally efficient.

SAM speeds measured with PyTorch, YOLO speeds measured with ONNX Runtime. Tests run on a 2025 Apple M4 Air with 16GB of RAM using torch==2.10.0, ultralytics==8.4.31, and onnxruntime==1.24.4. To reproduce this test:

Example

from ultralytics import ASSETS, SAM, YOLO, FastSAM

# Profile SAM-b, MobileSAM
for file in ["sam_b.pt", "mobile_sam.pt"]:
    model = SAM(file)
    model.info()
    model(ASSETS)

# Profile FastSAM-s
model = FastSAM("FastSAM-s.pt")
model.info()
model(ASSETS)

# Profile YOLO models (ONNX)
for file_name in ["yolov8n-seg.pt", "yolo11n-seg.pt", "yolo26n-seg.pt"]:
    model = YOLO(file_name)
    model.info()
    onnx_path = model.export(format="onnx", dynamic=True)
    model = YOLO(onnx_path)
    model(ASSETS)

Link to this sectionAuto-Annotation: A Quick Path to Segmentation Datasets#

Auto-annotation is a key feature of SAM, allowing users to generate a segmentation dataset using a pretrained detection model. This feature enables rapid and accurate annotation of a large number of images, bypassing the need for time-consuming manual labeling.

Link to this sectionGenerate Your Segmentation Dataset Using a Detection Model#

To auto-annotate your dataset with the Ultralytics framework, use the auto_annotate function as shown below:

Example

from ultralytics.data.annotator import auto_annotate

auto_annotate(data="path/to/images", det_model="yolo26x.pt", sam_model="sam_b.pt")

Argument	Type	Default	Description
`data`	`str`	required	Path to directory containing target images for annotation or segmentation.
`det_model`	`str`	`'yolo26x.pt'`	YOLO detection model path for initial object detection.
`sam_model`	`str`	`'sam_b.pt'`	SAM model path for segmentation (supports SAM, SAM 2, MobileSAM, and SAM 3 weights).
`device`	`str`	`''`	Computation device (e.g., 'cuda:0', 'cpu', or '' for automatic device detection).
`conf`	`float`	`0.25`	YOLO detection confidence threshold for filtering weak detections.
`iou`	`float`	`0.45`	IoU threshold for Non-Maximum Suppression to filter overlapping boxes.
`imgsz`	`int`	`640`	Input size for resizing images (must be multiple of 32).
`max_det`	`int`	`300`	Maximum number of detections per image for memory efficiency.
`classes`	`list[int]`	`None`	List of class indices to detect (e.g., `[0, 1]` for person & bicycle).
`output_dir`	`str`	`None`	Save directory for annotations (default: `<data>_auto_annotate_labels` sibling).

The auto_annotate function takes the path to your images, with optional arguments for specifying the pretrained detection and SAM segmentation models, the device to run the models on, and the output directory for saving the annotated results.

Auto-annotation with pretrained models can dramatically cut down the time and effort required for creating high-quality segmentation datasets. This feature is especially beneficial for researchers and developers dealing with large image collections, as it allows them to focus on model development and evaluation rather than manual annotation.

Link to this sectionCitations and Acknowledgments#

If you find SAM useful in your research or development work, please consider citing our paper:

Quote

@misc{kirillov2023segment,
      title={Segment Anything},
      author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
      year={2023},
      eprint={2304.02643},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

We would like to express our gratitude to Meta AI for creating and maintaining this valuable resource for the computer vision community.

Link to this sectionFAQ#

Link to this sectionWhat is the Segment Anything Model (SAM) by Ultralytics?#

The Segment Anything Model (SAM) by Ultralytics is a revolutionary image segmentation model designed for promptable segmentation tasks. It leverages advanced architecture, including image and prompt encoders combined with a lightweight mask decoder, to generate high-quality segmentation masks from various prompts such as spatial or text cues. Trained on the expansive SA-1B dataset, SAM excels in zero-shot performance, adapting to new image distributions and tasks without prior knowledge.

Link to this sectionHow can I use the Segment Anything Model (SAM) for image segmentation?#

You can use the Segment Anything Model (SAM) for image segmentation by running inference with various prompts such as bounding boxes or points. Here's an example using Python:

from ultralytics import SAM

# Load a model
model = SAM("sam_b.pt")

# Segment with bounding box prompt
model("ultralytics/assets/zidane.jpg", bboxes=[439, 437, 524, 709])

# Segment with points prompt
model("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])

# Segment with multiple points prompt
model("ultralytics/assets/zidane.jpg", points=[[400, 370], [900, 370]], labels=[1, 1])

# Segment with multiple points prompt per object
model("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 1]])

# Segment with negative points prompt.
model("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 0]])

Alternatively, you can run inference with SAM in the command line interface (CLI):

yolo predict model=sam_b.pt source=path/to/image.jpg

For more detailed usage instructions, visit the Segmentation section.

Link to this sectionHow do SAM and YOLO models compare in terms of performance?#

Compared to YOLO models, SAM variants like SAM-b, MobileSAM, and FastSAM-s are typically larger and slower but offer unique zero-shot segmentation capabilities. For example, YOLO26n-seg is 56x smaller and over 1650x faster than Meta's original SAM-b model on CPU. This makes YOLO models ideal for applications requiring rapid, lightweight, and computationally efficient segmentation, while SAM models excel in flexible, promptable, and zero-shot segmentation tasks.

Link to this sectionHow can I auto-annotate my dataset using SAM?#

Ultralytics' SAM offers an auto-annotation feature that allows generating segmentation datasets using a pretrained detection model. Here's an example in Python:

from ultralytics.data.annotator import auto_annotate

auto_annotate(data="path/to/images", det_model="yolo26x.pt", sam_model="sam_b.pt")

This function takes the path to your images and optional arguments for pretrained detection and SAM segmentation models, along with device and output directory specifications. For a complete guide, see Auto-Annotation.

Link to this sectionWhat datasets are used to train the Segment Anything Model (SAM)?#

SAM is trained on the extensive SA-1B dataset which comprises over 1 billion masks across 11 million images. SA-1B is the largest segmentation dataset to date, providing high-quality and diverse training data, ensuring impressive zero-shot performance in varied segmentation tasks. For more details, visit the Dataset section.

Contributors

GLglenn-jocher²⁹ RIRizwanMunawar⁷ LALaughing-q⁵ RAraimbekovm³ HIhim2him2¹ PDpderrenger¹ ONonuralpszr¹ Y-Y-T-G¹ MAMatthewNoyce¹ BUBurhan-Q¹

Created Nov 12, 2023Updated 1 month ago