Skip to content

MobileSAM Logo

Mobile Segment Anything (MobileSAM)

The MobileSAM paper is now available on arXiv.

A demonstration of MobileSAM running on a CPU can be accessed at this demo link. The performance on a Mac i5 CPU takes approximately 3 seconds. On the Hugging Face demo, the interface and lower-performance CPUs contribute to a slower response, but it continues to function effectively.



Watch: How to Run Inference with MobileSAM using Ultralytics | Step-by-Step Guide 🎉

MobileSAM is implemented in various projects including Grounding-SAM, AnyLabeling, and Segment Anything in 3D.

MobileSAM is trained on a single GPU with a 100k dataset (1% of the original images) in less than a day. The code for this training will be made available in the future.

Available Models, Supported Tasks, and Operating Modes

This table presents the available models with their specific pre-trained weights, the tasks they support, and their compatibility with different operating modes like Inference, Validation, Training, and Export, indicated by ✅ emojis for supported modes and ❌ emojis for unsupported modes.

Model Type Pre-trained Weights Tasks Supported Inference Validation Training Export
MobileSAM mobile_sam.pt Instance Segmentation

Adapting from SAM to MobileSAM

Since MobileSAM retains the same pipeline as the original SAM, we have incorporated the original's pre-processing, post-processing, and all other interfaces. Consequently, those currently using the original SAM can transition to MobileSAM with minimal effort.

MobileSAM performs comparably to the original SAM and retains the same pipeline except for a change in the image encoder. Specifically, we replace the original heavyweight ViT-H encoder (632M) with a smaller Tiny-ViT (5M). On a single GPU, MobileSAM operates at about 12ms per image: 8ms on the image encoder and 4ms on the mask decoder.

The following table provides a comparison of ViT-based image encoders:

Image Encoder Original SAM MobileSAM
Parameters 611M 5M
Speed 452ms 8ms

Both the original SAM and MobileSAM utilize the same prompt-guided mask decoder:

Mask Decoder Original SAM MobileSAM
Parameters 3.876M 3.876M
Speed 4ms 4ms

Here is the comparison of the whole pipeline:

Whole Pipeline (Enc+Dec) Original SAM MobileSAM
Parameters 615M 9.66M
Speed 456ms 12ms

The performance of MobileSAM and the original SAM are demonstrated using both a point and a box as prompts.

Image with Point as Prompt

Image with Box as Prompt

With its superior performance, MobileSAM is approximately 5 times smaller and 7 times faster than the current FastSAM. More details are available at the MobileSAM project page.

Testing MobileSAM in Ultralytics

Just like the original SAM, we offer a straightforward testing method in Ultralytics, including modes for both Point and Box prompts.

Model Download

You can download the model here.

Point Prompt

Example

from ultralytics import SAM

# Load the model
model = SAM("mobile_sam.pt")

# Predict a segment based on a single point prompt
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])

# Predict multiple segments based on multiple points prompt
model.predict("ultralytics/assets/zidane.jpg", points=[[400, 370], [900, 370]], labels=[1, 1])

# Predict a segment based on multiple points prompt per object
model.predict("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 1]])

# Predict a segment using both positive and negative prompts.
model.predict("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 0]])

Box Prompt

Example

from ultralytics import SAM

# Load the model
model = SAM("mobile_sam.pt")

# Predict a segment based on a single point prompt
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])

# Predict mutiple segments based on multiple points prompt
model.predict("ultralytics/assets/zidane.jpg", points=[[400, 370], [900, 370]], labels=[1, 1])

# Predict a segment based on multiple points prompt per object
model.predict("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 1]])

# Predict a segment using both positive and negative prompts.
model.predict("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 0]])

We have implemented MobileSAM and SAM using the same API. For more usage information, please see the SAM page.

Automatically Build Segmentation Datasets Leveraging a Detection Model

To automatically annotate your dataset using the Ultralytics framework, utilize the auto_annotate function as demonstrated below:

Example

from ultralytics.data.annotator import auto_annotate

auto_annotate(data="path/to/images", det_model="yolo11x.pt", sam_model="mobile_sam.pt")
Argument Type Default Description
data str required Path to directory containing target images/videos for annotation or segmentation.
det_model str "yolo11x.pt" YOLO detection model path for initial object detection.
sam_model str "sam2_b.pt" SAM2 model path for segmentation (supports t/s/b/l variants and SAM2.1) and mobile_sam models.
device str "" Computation device (e.g., 'cuda:0', 'cpu', or '' for automatic device detection).
conf float 0.25 YOLO detection confidence threshold for filtering weak detections.
iou float 0.45 IoU threshold for Non-Maximum Suppression to filter overlapping boxes.
imgsz int 640 Input size for resizing images (must be multiple of 32).
max_det int 300 Maximum number of detections per image for memory efficiency.
classes list[int] None List of class indices to detect (e.g., [0, 1] for person & bicycle).
output_dir str None Save directory for annotations (defaults to './labels' relative to data path).

Citations and Acknowledgements

If you find MobileSAM useful in your research or development work, please consider citing our paper:

@article{mobile_sam,
  title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
  author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
  journal={arXiv preprint arXiv:2306.14289},
  year={2023}
}

FAQ

What is MobileSAM and how does it differ from the original SAM model?

MobileSAM is a lightweight, fast image segmentation model designed for mobile applications. It retains the same pipeline as the original SAM but replaces the heavyweight ViT-H encoder (632M parameters) with a smaller Tiny-ViT encoder (5M parameters). This change results in MobileSAM being approximately 5 times smaller and 7 times faster than the original SAM. For instance, MobileSAM operates at about 12ms per image, compared to the original SAM's 456ms. You can learn more about the MobileSAM implementation in various projects here.

How can I test MobileSAM using Ultralytics?

Testing MobileSAM in Ultralytics can be accomplished through straightforward methods. You can use Point and Box prompts to predict segments. Here's an example using a Point prompt:

from ultralytics import SAM

# Load the model
model = SAM("mobile_sam.pt")

# Predict a segment based on a point prompt
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])

You can also refer to the Testing MobileSAM section for more details.

Why should I use MobileSAM for my mobile application?

MobileSAM is ideal for mobile applications due to its lightweight architecture and fast inference speed. Compared to the original SAM, MobileSAM is approximately 5 times smaller and 7 times faster, making it suitable for environments where computational resources are limited. This efficiency ensures that mobile devices can perform real-time image segmentation without significant latency. Additionally, MobileSAM's models, such as Inference, are optimized for mobile performance.

How was MobileSAM trained, and is the training code available?

MobileSAM was trained on a single GPU with a 100k dataset, which is 1% of the original images, in less than a day. While the training code will be made available in the future, you can currently explore other aspects of MobileSAM in the MobileSAM GitHub repository. This repository includes pre-trained weights and implementation details for various applications.

What are the primary use cases for MobileSAM?

MobileSAM is designed for fast and efficient image segmentation in mobile environments. Primary use cases include:

  • Real-time object detection and segmentation for mobile applications.
  • Low-latency image processing in devices with limited computational resources.
  • Integration in AI-driven mobile apps for tasks such as augmented reality (AR) and real-time analytics.

For more detailed use cases and performance comparisons, see the section on Adapting from SAM to MobileSAM.

📅 Created 1 year ago ✏️ Updated 18 days ago

Comments