SAM 3: Segment Anything with Concepts

Coming Soon ⚠️

🚧 SAM 3 models have not yet been publicly released by Meta. The information below is based on the research paper submitted to ICLR 2026. Model downloads and final benchmarks will be available following Meta's official release.

SAM 3 Overview

SAM 3 (Segment Anything Model 3) represents Meta's next-generation foundation model for Promptable Concept Segmentation (PCS). Building upon SAM 2, SAM 3 introduces a fundamentally new capability: detecting, segmenting, and tracking all instances of a visual concept specified by text prompts, image exemplars, or both. Unlike previous SAM versions that segment single objects per prompt, SAM 3 can find and segment every occurrence of a concept appearing anywhere in images or videos, aligning with open-vocabulary goals in modern instance segmentation.

Overview

SAM 3 achieves a 2× performance gain over existing systems in Promptable Concept Segmentation while maintaining and improving SAM 2's capabilities for interactive visual segmentation. The model excels at open-vocabulary segmentation, allowing users to specify concepts using simple noun phrases (e.g., "yellow school bus", "striped cat") or by providing example images of the target object. These capabilities complement production-ready pipelines that rely on streamlined predict and track workflows.

SAM 3 Segmentation

What is Promptable Concept Segmentation (PCS)?

The PCS task takes a concept prompt as input and returns segmentation masks with unique identities for all matching object instances. Concept prompts can be:

  • Text: Simple noun phrases like "red apple" or "person wearing a hat", similar to zero-shot learning
  • Image exemplars: Bounding boxes around example objects (positive or negative) for fast generalization
  • Combined: Both text and image exemplars together for precise control

This differs from traditional visual prompts (points, boxes, masks) which segment only a single specific object instance, as popularized by the original SAM family.

Key Performance Metrics

Metric | SAM 3 Achievement
LVIS Zero-Shot Mask AP | 47.0 (vs previous best 38.5, +22% improvement)
SA-Co Benchmark | 2× better than existing systems
Inference Speed (H200 GPU) | 30 ms per image with 100+ detected objects
Video Performance | Near real-time for ~5 concurrent objects
MOSEv2 VOS Benchmark | 60.1 J&F (+25.5% over SAM 2.1, +17% over prior SOTA)
Interactive Refinement | +18.6 CGF1 improvement after 3 exemplar prompts
Human Performance Gap | Achieves 88% of estimated lower bound on SA-Co/Gold

For context on model metrics and trade-offs in production, see model evaluation insights and YOLO performance metrics.

Architecture

SAM 3 consists of a detector and tracker that share a Perception Encoder (PE) vision backbone. This decoupled design avoids task conflicts while enabling both image-level detection and video-level tracking, with an interface compatible with Ultralytics Python usage and CLI usage.

Core Components

  • Detector: DETR-based architecture for image-level concept detection

    • Text encoder for noun phrase prompts
    • Exemplar encoder for image-based prompts
    • Fusion encoder to condition image features on prompts
    • Novel presence head that decouples recognition ("what") from localization ("where")
    • Mask head for generating instance segmentation masks
  • Tracker: Memory-based video segmentation inherited from SAM 2

    • Prompt encoder, mask decoder, memory encoder
    • Memory bank for storing object appearance across frames
    • Temporal disambiguation aided by techniques like a Kalman filter in multi-object settings
  • Presence Token: A learned global token that predicts whether the target concept is present in the image/frame, improving detection by separating recognition from localization.
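
The decoupling described above can be illustrated with a small, self-contained sketch. This is not Meta's implementation: the dimensions, the shared attention block, and all layer names below are assumptions chosen only to show how a single global presence score can gate per-query localization scores.

import torch
import torch.nn as nn


class ToyConceptDetector(nn.Module):
    """Illustrative only: shows the presence/localization decoupling, not SAM 3 itself."""

    def __init__(self, dim=256, num_queries=100):
        super().__init__()
        self.presence_token = nn.Parameter(torch.zeros(1, 1, dim))     # learned global token
        self.queries = nn.Parameter(torch.zeros(1, num_queries, dim))  # proposal queries
        self.fusion = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.presence_head = nn.Linear(dim, 1)  # recognition: "is the concept present?"
        self.box_head = nn.Linear(dim, 4)       # localization: candidate boxes
        self.score_head = nn.Linear(dim, 1)     # per-query match confidence

    def forward(self, image_feats, prompt_feats):
        b = image_feats.shape[0]
        # Condition image features on the concept prompt (text and/or exemplar tokens).
        cond, _ = self.fusion(image_feats, prompt_feats, prompt_feats)
        # The presence token and proposal queries read from the conditioned features.
        tokens = torch.cat([self.presence_token, self.queries], dim=1).expand(b, -1, -1)
        decoded, _ = self.fusion(tokens, cond, cond)
        presence = self.presence_head(decoded[:, 0]).sigmoid()   # global "what"
        boxes = self.box_head(decoded[:, 1:]).sigmoid()          # per-query "where"
        query_scores = self.score_head(decoded[:, 1:]).sigmoid()
        # Key idea: final score factorizes into global presence x local match quality.
        return boxes, presence.unsqueeze(1) * query_scores


# Dummy tensors stand in for Perception Encoder features and encoded prompts.
detector = ToyConceptDetector()
boxes, scores = detector(torch.randn(1, 4096, 256), torch.randn(1, 4, 256))
print(boxes.shape, scores.shape)  # torch.Size([1, 100, 4]) torch.Size([1, 100, 1])

The point of the factorization is that proposal queries never decide whether the concept exists at all; that decision is made once, globally, by the presence token.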

SAM 3 Architecture

Key Innovations

  1. Decoupled Recognition and Localization: The presence head predicts concept presence globally, while proposal queries focus only on localization, avoiding conflicting objectives.
  2. Unified Concept and Visual Prompts: Supports both PCS (concept prompts) and PVS (visual prompts like SAM 2's clicks/boxes) in a single model.
  3. Interactive Exemplar Refinement: Users can add positive or negative image exemplars to iteratively refine results, with the model generalizing to similar objects rather than just correcting individual instances.
  4. Temporal Disambiguation: Uses masklet detection scores and periodic re-prompting to handle occlusions, crowded scenes, and tracking failures in video, aligning with instance segmentation and tracking best practices.

SA-Co Dataset

SAM 3 is trained on Segment Anything with Concepts (SA-Co), Meta's largest and most diverse segmentation dataset to date, expanding beyond common benchmarks like COCO and LVIS.

Training Data

Dataset Component | Description | Scale
SA-Co/HQ | High-quality human-annotated image data from 4-phase data engine | 5.2M images, 4M unique noun phrases
SA-Co/SYN | Synthetic dataset labeled by AI without human involvement | 38M noun phrases, 1.4B masks
SA-Co/EXT | 15 external datasets enriched with hard negatives | Varies by source
SA-Co/VIDEO | Video annotations with temporal tracking | 52.5K videos, 24.8K unique noun phrases

Benchmark Data

The SA-Co evaluation benchmark contains 214K unique phrases across 126K images and videos, providing over 50× more concepts than existing benchmarks. It includes:

  • SA-Co/Gold: 7 domains, triple-annotated for measuring human performance bounds
  • SA-Co/Silver: 10 domains, single human annotation
  • SA-Co/Bronze and SA-Co/Bio: 9 existing datasets adapted for concept segmentation
  • SA-Co/VEval: Video benchmark with 3 domains (SA-V, YT-Temporal-1B, SmartGlasses)

Data Engine Innovations

SAM 3's scalable human- and model-in-the-loop data engine achieves 2× annotation throughput through:

  1. AI Annotators: Llama-based models propose diverse noun phrases, including hard negatives
  2. AI Verifiers: Fine-tuned multimodal LLMs verify mask quality and exhaustivity at near-human performance
  3. Active Mining: Focuses human effort on challenging failure cases where AI struggles
  4. Ontology-Driven: Leverages a large ontology grounded in Wikidata for concept coverage
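
The loop above can be summarized in pseudocode. This is a conceptual illustration only; every function and method name (propose_phrases, segment, accept) is hypothetical and not part of any released tool.

def data_engine_step(image, ontology, ai_annotator, ai_verifier, human_queue):
    """Illustrative single pass of the human- and model-in-the-loop annotation flow."""
    # 1. The AI annotator proposes candidate noun phrases, including hard negatives.
    phrases = ai_annotator.propose_phrases(image, ontology)
    labeled = []
    for phrase in phrases:
        masks = ai_annotator.segment(image, phrase)
        # 2. The AI verifier checks mask quality and exhaustivity.
        if ai_verifier.accept(image, phrase, masks):
            labeled.append((phrase, masks))
        else:
            # 3. Active mining: only hard failure cases reach human annotators.
            human_queue.append((image, phrase, masks))
    return labeled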

Installation

SAM 3 will be supported natively in the Ultralytics package upon release:

pip install ultralytics

Models will download automatically when first used. You can then use standard predict mode and later export models to formats like ONNX and TensorRT for deployment.
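
A minimal sketch of that workflow is shown below, assuming SAM 3 follows the existing Ultralytics SAM interface once released; export support for SAM 3 in particular is an unconfirmed assumption.

from ultralytics import SAM

model = SAM("sam3.pt")  # weights are expected to download automatically on first use
model.info()            # display a model summary

# Deployment export is shown as an assumption; confirmed formats will follow the release.
# model.export(format="onnx")     # ONNX for cross-platform runtimes
# model.export(format="engine")   # TensorRT for NVIDIA GPUs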

How to Use SAM 3: Versatility in Concept Segmentation

Preview API - Subject to Change

The code examples below demonstrate intended usage patterns based on the research paper. The actual API will be available after:

  1. Meta open-sources SAM 3 model weights
  2. Ultralytics integrates SAM 3 into the package

Syntax and parameters may differ in the final implementation. These examples serve as a preview of expected functionality.

Supported Tasks and Models

SAM 3 supports both Promptable Concept Segmentation (PCS) and Promptable Visual Segmentation (PVS) tasks:

Task Type | Prompt Types | Output
Concept Segmentation (PCS) | Text (noun phrases), image exemplars | All instances matching the concept
Visual Segmentation (PVS) | Points, boxes, masks | Single object instance (SAM 2 style)
Interactive Refinement | Add/remove exemplars or clicks iteratively | Refined segmentation with improved accuracy

Concept Segmentation Examples

Segment with Text Prompts

Text-based Concept Segmentation

Find and segment all instances of a concept using a text description.

from ultralytics import SAM

# Load SAM 3 model
model = SAM("sam3.pt")

# Segment all instances of a concept
results = model("path/to/image.jpg", prompt="yellow school bus")

# Works with descriptive phrases
results = model("path/to/image.jpg", prompt="person wearing a red hat")

# Or simple object names
results = model("path/to/image.jpg", prompt="striped cat")

# CLI equivalent: segment all matching concepts in an image
yolo segment model=sam3.pt source=path/to/image.jpg prompt="yellow school bus"

API Preview

This example shows intended usage. Actual implementation pending Meta's release and Ultralytics integration.

Segment with Image Exemplars

Image Exemplar-based Segmentation

Use one or more example objects to find all similar instances.

from ultralytics import SAM

model = SAM("sam3.pt")

# Provide a positive example box - finds all similar objects
results = model("path/to/image.jpg", bboxes=[100, 150, 300, 400], labels=[1])

# Add negative examples to exclude certain instances
results = model(
    "path/to/image.jpg",
    bboxes=[[100, 150, 300, 400], [500, 200, 600, 350]],  # Two boxes
    labels=[1, 0],  # First is positive, second is negative
)

# Combine text and image exemplars for precision
results = model("path/to/image.jpg", prompt="dog", bboxes=[100, 150, 300, 400], labels=[1])

API Preview

This example shows intended usage. Actual implementation pending Meta's release and Ultralytics integration.

Interactive Refinement

Iterative Refinement with Exemplars

Progressively improve results by adding exemplar prompts based on initial output.

from ultralytics import SAM

model = SAM("sam3.pt")

# Initial segmentation with text
results = model("path/to/image.jpg", prompt="car")

# If some cars are missed, add a positive exemplar
results = model(
    "path/to/image.jpg",
    prompt="car",
    bboxes=[missed_car_box],
    labels=[1],  # Positive example
)

# If false positives appear, add negative exemplars
results = model(
    "path/to/image.jpg",
    prompt="car",
    bboxes=[false_positive_box],
    labels=[0],  # Negative example
)

API Preview

This example shows intended usage. Actual implementation pending Meta's release and Ultralytics integration.

Video Concept Segmentation

Track Concepts Across Video

Detect and track all instances of a concept throughout a video.

from ultralytics.models.sam import SAM3VideoPredictor

# Create video predictor
predictor = SAM3VideoPredictor(model="sam3.pt", imgsz=1024, conf=0.25)

# Track all instances of a concept
results = predictor(source="video.mp4", prompt="person wearing blue shirt")

# Combine text with exemplar for precision
results = predictor(
    source="video.mp4",
    prompt="kangaroo",
    bboxes=[initial_box],  # Exemplar from first frame
    labels=[1],
)

API Preview

This example shows intended usage. Actual implementation pending Meta's release and Ultralytics integration.

For broader streaming and production setups, see object tracking and view results in terminal.

Visual Prompts (SAM 2 Compatibility)

SAM 3 maintains full backward compatibility with SAM 2's visual prompting:

SAM 2 Style Visual Prompts

from ultralytics import SAM

model = SAM("sam3.pt")

# Single point prompt (SAM 2 style)
results = model("path/to/image.jpg", points=[900, 370], labels=[1])

# Multiple points
results = model("path/to/image.jpg", points=[[400, 370], [900, 370]], labels=[1, 1])

# Box prompt
results = model("path/to/image.jpg", bboxes=[100, 150, 300, 400])

API Preview

This example shows intended usage. Actual implementation pending Meta's release and Ultralytics integration.

Performance Benchmarks

Image Segmentation

SAM 3 achieves state-of-the-art results across multiple benchmarks, including real-world datasets like LVIS and COCO for segmentation:

Benchmark | Metric | SAM 3 | Previous Best | Improvement
LVIS (zero-shot) | Mask AP | 47.0 | 38.5 | +22.1%
SA-Co/Gold | CGF1 | 65.0 | 34.3 (OWLv2) | +89.5%
COCO (zero-shot) | Box AP | 53.5 | 52.2 (T-Rex2) | +2.5%
ADE-847 (semantic seg) | mIoU | 14.7 | 9.2 (APE-D) | +59.8%
PascalConcept-59 | mIoU | 59.4 | 58.5 (APE-D) | +1.5%
Cityscapes (semantic seg) | mIoU | 65.1 | 44.2 (APE-D) | +47.3%

Explore dataset options for quick experimentation in Ultralytics datasets.

Video Segmentation Performance

SAM 3 shows significant improvements over SAM 2 and prior state-of-the-art across video benchmarks such as DAVIS 2017 and YouTube-VOS:

Benchmark | Metric | SAM 3 | SAM 2.1 L | Improvement
MOSEv2 | J&F | 60.1 | 47.9 | +25.5%
DAVIS 2017 | J&F | 92.0 | 90.7 | +1.4%
LVOSv2 | J&F | 88.2 | 79.6 | +10.8%
SA-V | J&F | 84.6 | 78.4 | +7.9%
YTVOS19 | J&F | 89.6 | 89.3 | +0.3%

Few-Shot Adaptation

SAM 3 excels at adapting to new domains with minimal examples, relevant for data-centric AI workflows:

Benchmark | 0-shot AP | 10-shot AP | Previous Best (10-shot)
ODinW13 | 59.9 | 71.6 | 67.9 (gDino1.5-Pro)
RF100-VL | 14.3 | 35.7 | 33.7 (gDino-T)

Interactive Refinement Effectiveness

SAM 3's concept-based prompting with exemplars converges much faster than visual prompting:

Prompts Added | CGF1 Score | Gain vs Text-Only | Gain vs PVS Baseline
Text only | 46.4 | baseline | baseline
+1 exemplar | 57.6 | +11.2 | +6.7
+2 exemplars | 62.2 | +15.8 | +9.7
+3 exemplars | 65.0 | +18.6 | +11.2
+4 exemplars | 65.7 | +19.3 | +11.5 (plateau)

Object Counting Accuracy

SAM 3 provides accurate counting by segmenting all instances, a common requirement in object counting:

Benchmark | Accuracy | MAE | vs Best MLLM
CountBench | 95.6% | 0.11 | 92.4% (Gemini 2.5)
PixMo-Count | 87.3% | 0.22 | 88.8% (Molmo-72B)
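
Because PCS returns one mask per matching instance, counting falls out of segmentation directly. The snippet below is a sketch under the preview API: the prompt argument and the sam3.pt weight name follow the examples above, and the masks access assumes the standard Ultralytics results object.

from ultralytics import SAM

model = SAM("sam3.pt")
results = model("path/to/image.jpg", prompt="apple")  # preview PCS API

# One mask per detected instance, so the count is simply the number of masks.
masks = results[0].masks
count = 0 if masks is None else len(masks)
print(f"Instances found: {count}")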

SAM 3 vs SAM 2 vs YOLO Comparison

Here we compare SAM 3's capabilities with SAM 2 and YOLO11 models:

Capability | SAM 3 | SAM 2 | YOLO11n-seg
Concept Segmentation | ✅ All instances from text/exemplars | ❌ Not supported | ❌ Not supported
Visual Segmentation | ✅ Single instance (SAM 2 compatible) | ✅ Single instance | ✅ All instances
Zero-shot Capability | ✅ Open vocabulary | ✅ Geometric prompts | ❌ Closed set
Interactive Refinement | ✅ Exemplars + clicks | ✅ Clicks only | ❌ Not supported
Video Tracking | ✅ Multi-object with identities | ✅ Multi-object | ✅ Multi-object
LVIS Mask AP (zero-shot) | 47.0 | N/A | N/A
MOSEv2 J&F | 60.1 | 47.9 | N/A
Inference Speed (H200) | 30 ms (100+ objects) | ~23 ms (per object) | 2-3 ms (image)
Model Size | Large (~400+ MB expected) | 162 MB (base) | 5.9 MB

Key Takeaways:

  • SAM 3: Best for open-vocabulary concept segmentation, finding all instances of a concept with text or exemplar prompts
  • SAM 2: Best for interactive single-object segmentation in images and videos with geometric prompts
  • YOLO11: Best for real-time, high-speed segmentation in resource-constrained deployments using efficient export pipelines like ONNX and TensorRT

Evaluation Metrics

SAM 3 introduces new metrics designed for the PCS task, complementing familiar measures like F1 score, precision, and recall.

Classification-Gated F1 (CGF1)

The primary metric combining localization and classification:

CGF1 = 100 × pmF1 × IL_MCC

Where:

  • pmF1 (Positive Macro F1): Measures localization quality on positive examples
  • IL_MCC (Image-Level Matthews Correlation Coefficient): Measures binary classification accuracy ("is the concept present?")
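
As a small worked example, the helper below combines the two components exactly as defined above (pmF1 taken on a 0-1 scale). Plugging in the presence-head ablation values reported further down (pmF1 = 77.1%, IL_MCC = 0.82) reproduces the published CGF1 of 63.3 up to rounding.

def cgf1(pm_f1: float, il_mcc: float) -> float:
    """Classification-Gated F1 = 100 x pmF1 x IL_MCC, with pmF1 on a 0-1 scale."""
    return 100.0 * pm_f1 * il_mcc


# Presence-head ablation row: pmF1 = 77.1%, IL_MCC = 0.82
print(cgf1(0.771, 0.82))  # 63.2 -> matches the reported 63.3 up to rounding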

Why These Metrics?

Traditional AP metrics don't account for calibration, making models difficult to use in practice. By evaluating only predictions above 0.5 confidence, SAM 3's metrics enforce good calibration and mimic real-world usage patterns in interactive predict and track loops.

Key Ablations and Insights

Impact of Presence Head

The presence head decouples recognition from localization, providing significant improvements:

Configuration | CGF1 | IL_MCC | pmF1
Without presence | 57.6 | 0.77 | 74.7
With presence | 63.3 | 0.82 | 77.1

The presence head provides a +5.7 CGF1 boost (+9.9%), primarily improving recognition ability (IL_MCC +6.5%).

Effect of Hard Negatives

Hard Negatives/Image | CGF1 | IL_MCC | pmF1
0 | 31.8 | 0.44 | 70.2
5 | 44.8 | 0.62 | 71.9
30 | 49.2 | 0.68 | 72.3

Hard negatives are crucial for open-vocabulary recognition, improving IL_MCC by 54.5% (0.44 → 0.68).

Training Data Scaling

Data Sources | CGF1 | IL_MCC | pmF1
External only | 30.9 | 0.46 | 66.3
External + Synthetic | 39.7 | 0.57 | 70.6
External + HQ | 51.8 | 0.71 | 73.2
All three | 54.3 | 0.74 | 73.5

High-quality human annotations provide large gains over synthetic or external data alone. For background on data quality practices, see data collection and annotation.

Applications

SAM 3's concept segmentation capability enables new use cases:

  • Content Moderation: Find all instances of specific content types across media libraries
  • E-commerce: Segment all products of a certain type in catalog images, supporting auto-annotation
  • Medical Imaging: Identify all occurrences of specific tissue types or abnormalities
  • Autonomous Systems: Track all instances of traffic signs, pedestrians, or vehicles by category
  • Video Analytics: Count and track all people wearing specific clothing or performing actions
  • Dataset Annotation: Rapidly annotate all instances of rare object categories
  • Scientific Research: Quantify and analyze all specimens matching specific criteria

SAM 3 Agent: Extended Language Reasoning

SAM 3 can be combined with Multimodal Large Language Models (MLLMs) to handle complex queries requiring reasoning, similar in spirit to open-vocabulary systems like OWLv2 and T-Rex.

Performance on Reasoning Tasks

Benchmark | Metric | SAM 3 Agent (Gemini 2.5 Pro) | Previous Best
ReasonSeg (validation) | gIoU | 76.0 | 65.0 (SoTA)
ReasonSeg (test) | gIoU | 73.8 | 61.3 (SoTA)
OmniLabel (validation) | AP | 46.7 | 36.5 (REAL)
RefCOCO+ | Acc | 91.2 | 89.3 (LISA)

Example Complex Queries

SAM 3 Agent can handle queries requiring reasoning:

  • "People sitting down but not holding a gift box in their hands"
  • "The dog closest to the camera that is not wearing a collar"
  • "Red objects larger than the person's hand"

The MLLM proposes simple noun phrase queries to SAM 3, analyzes returned masks, and iterates until satisfied.
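
A sketch of that loop is shown below. The MLLM interface (propose_query, is_satisfied) is hypothetical and only illustrates the division of labor; the SAM call mirrors the preview API used elsewhere on this page.

from ultralytics import SAM


def sam3_agent(image_path, complex_query, mllm, max_rounds=5):
    """Illustrative agent loop: decompose a complex query into simple noun phrases."""
    model = SAM("sam3.pt")
    results, history = None, []
    for _ in range(max_rounds):
        # 1. The MLLM proposes a simple noun phrase, using feedback from earlier rounds.
        noun_phrase = mllm.propose_query(complex_query, history)
        # 2. SAM 3 segments all instances of that concept (preview API).
        results = model(image_path, prompt=noun_phrase)
        # 3. The MLLM inspects the returned masks and decides whether to accept or retry.
        accepted = mllm.is_satisfied(complex_query, noun_phrase, results)
        history.append((noun_phrase, accepted))
        if accepted:
            break
    return results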

Limitations

While SAM 3 represents a major advancement, it has certain limitations:

  • Phrase Complexity: Best suited for simple noun phrases; long referring expressions or complex reasoning may require MLLM integration
  • Ambiguity Handling: Some concepts remain inherently ambiguous (e.g., "small window", "cozy room")
  • Computational Requirements: Larger and slower than specialized detection models like YOLO
  • Vocabulary Scope: Focused on atomic visual concepts; compositional reasoning is limited without MLLM assistance
  • Rare Concepts: Performance may degrade on extremely rare or fine-grained concepts not well-represented in training data

Citation

@inproceedings{sam3_2025,
  title     = {SAM 3: Segment Anything with Concepts},
  author    = {Anonymous authors},
  booktitle = {Submitted to ICLR 2026},
  year      = {2025},
  url       = {https://openreview.net/forum?id=r35clVtGzw},
  note      = {Paper ID: 4183, under double-blind review}
}

FAQ

When Will SAM 3 Be Released?

SAM 3 is currently under review at ICLR 2026 (conference in 2026, review in 2025). Official models, weights, and benchmarks will be released publicly following the review process, likely in 2026. Ultralytics will provide immediate support for SAM 3 integration upon Meta's release and document usage in predict mode and track mode.

Will SAM 3 Be Integrated Into Ultralytics?

Yes. SAM 3 will be supported in the Ultralytics Python package upon release, including concept segmentation, SAM 2-style visual prompts, and multi-object video tracking. You will be able to export to formats like ONNX and TensorRT for deployment, with streamlined Python and CLI workflows.

Implementation Timeline

Code examples in this documentation are preview versions showing intended usage patterns. Actual implementation will be available after Meta releases SAM 3 weights and Ultralytics completes integration.

What Is Promptable Concept Segmentation (PCS)?

PCS is a new task introduced in SAM 3 that segments all instances of a visual concept in an image or video. Unlike traditional segmentation that targets a specific object instance, PCS finds every occurrence of a category. For example:

  • Text prompt: "yellow school bus" → segments all yellow school buses in the scene
  • Image exemplar: Box around one dog → segments all dogs in the image
  • Combined: "striped cat" + exemplar box → segments all striped cats matching the example

See related background on object detection and instance segmentation.

How Does SAM 3 Differ From SAM 2?

Feature | SAM 2 | SAM 3
Task | Single object per prompt | All instances of a concept
Prompt Types | Points, boxes, masks | + Text phrases, image exemplars
Detection Capability | Requires external detector | Built-in open-vocabulary detector
Recognition | Geometry-based only | Text and visual recognition
Architecture | Tracker only | Detector + Tracker with presence head
Zero-Shot Performance | N/A (requires visual prompts) | 47.0 AP on LVIS, 2× better on SA-Co
Interactive Refinement | Clicks only | Clicks + exemplar generalization

SAM 3 maintains backward compatibility with SAM 2 visual prompting while adding concept-based capabilities.

What datasets are used to train SAM 3?

SAM 3 is trained on the Segment Anything with Concepts (SA-Co) dataset:

Training Data:

  • 5.2M images with 4M unique noun phrases (SA-Co/HQ) - high-quality human annotations
  • 52.5K videos with 24.8K unique noun phrases (SA-Co/VIDEO)
  • 1.4B synthetic masks across 38M noun phrases (SA-Co/SYN)
  • 15 external datasets enriched with hard negatives (SA-Co/EXT)

Benchmark Data:

  • 214K unique concepts across 126K images/videos
  • 50× more concepts than existing benchmarks (e.g., LVIS has ~4K concepts)
  • Triple annotation on SA-Co/Gold for measuring human performance bounds

This massive scale and diversity enables SAM 3's superior zero-shot generalization across open-vocabulary concepts.

How does SAM 3 compare to YOLO11 for segmentation?

SAM 3 and YOLO11 serve different use cases:

SAM 3 Advantages:

  • Open-vocabulary: Segments any concept via text prompts without training
  • Zero-shot: Works on new categories immediately
  • Interactive: Exemplar-based refinement generalizes to similar objects
  • Concept-based: Automatically finds all instances of a category
  • Accuracy: 47.0 AP on LVIS zero-shot instance segmentation

YOLO11 Advantages:

  • Speed: 10-15× faster inference (2-3 ms vs 30 ms per image)
  • Efficiency: 70× smaller models (5.9 MB vs ~400 MB expected)
  • Resource-friendly: Runs on edge devices and mobile
  • Real-time: Optimized for production deployments

Recommendation:

  • Use SAM 3 for flexible, open-vocabulary segmentation where you need to find all instances of concepts described by text or examples
  • Use YOLO11 for high-speed, production deployments where categories are known in advance
  • Use SAM 2 for interactive single-object segmentation with geometric prompts

Can SAM 3 handle complex language queries?

SAM 3 is designed for simple noun phrases (e.g., "red apple", "person wearing hat"). For complex queries requiring reasoning, combine SAM 3 with an MLLM as SAM 3 Agent:

Simple queries (native SAM 3):

  • "yellow school bus"
  • "striped cat"
  • "person wearing red hat"

Complex queries (SAM 3 Agent with MLLM):

  • "People sitting down but not holding a gift box"
  • "The dog closest to the camera without a collar"
  • "Red objects larger than the person's hand"

SAM 3 Agent achieves 76.0 gIoU on ReasonSeg validation (vs 65.0 previous best, +16.9% improvement) by combining SAM 3's segmentation with MLLM reasoning capabilities.

How accurate is SAM 3 compared to human performance?

On the SA-Co/Gold benchmark with triple human annotation:

  • Human lower bound: 74.2 CGF1 (most conservative annotator)
  • SAM 3 performance: 65.0 CGF1
  • Achievement: 88% of estimated human lower bound
  • Human upper bound: 81.4 CGF1 (most liberal annotator)

SAM 3 achieves strong performance approaching human-level accuracy on open-vocabulary concept segmentation, with the gap primarily on ambiguous or subjective concepts (e.g., "small window", "cozy room").


