Mobile Segment Anything (MobileSAM)
MobileSAM is a compact, efficient image segmentation model purpose-built for mobile and edge devices. Designed to bring the power of Meta's Segment Anything Model (SAM) to environments with limited compute, MobileSAM delivers near-instant segmentation while maintaining compatibility with the original SAM pipeline. Whether you're developing real-time applications or lightweight deployments, MobileSAM provides impressive segmentation results with a fraction of the size and speed requirements of its predecessors.
Watch: How to Run Inference with MobileSAM using Ultralytics | Step-by-Step Guide 🎉
MobileSAM has been adopted in a variety of projects, including Grounding-SAM, AnyLabeling, and Segment Anything in 3D.
MobileSAM was trained on a single GPU using a 100k image dataset (1% of the original images) in less than a day. The training code will be released in the future.
Available Models, Supported Tasks, and Operating Modes
The table below outlines the available MobileSAM model, its pre-trained weights, supported tasks, and compatibility with different operating modes such as Inference, Validation, Training, and Export. Supported modes are indicated by ✅ and unsupported modes by ❌.
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
| --- | --- | --- | --- | --- | --- | --- |
| MobileSAM | mobile_sam.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
MobileSAM Comparison vs. YOLO
The following comparison highlights the differences between Meta's SAM variants, MobileSAM, and Ultralytics' smallest segmentation models, including YOLO11n-seg:
| Model | Size (MB) | Parameters (M) | Speed (CPU) (ms/im) |
| --- | --- | --- | --- |
| Meta SAM-b | 375 | 93.7 | 49401 |
| Meta SAM2-b | 162 | 80.8 | 31901 |
| Meta SAM2-t | 78.1 | 38.9 | 25997 |
| MobileSAM | 40.7 | 10.1 | 25381 |
| FastSAM-s with YOLOv8 backbone | 23.7 | 11.8 | 55.9 |
| Ultralytics YOLOv8n-seg | 6.7 (11.7x smaller) | 3.4 (11.4x fewer) | 24.5 (1061x faster) |
| Ultralytics YOLO11n-seg | 5.9 (13.2x smaller) | 2.9 (13.4x fewer) | 30.1 (864x faster) |
This comparison demonstrates the substantial differences in model size and speed between SAM variants and YOLO segmentation models. While SAM models offer unique automatic segmentation capabilities, YOLO models—especially YOLOv8n-seg and YOLO11n-seg—are significantly smaller, faster, and more computationally efficient.
Tests were conducted on a 2025 Apple M4 Pro with 24GB RAM using `torch==2.6.0` and `ultralytics==8.3.90`. To reproduce these results:
Example
from ultralytics import ASSETS, SAM, YOLO, FastSAM

# Profile SAM2-t, SAM2-b, SAM-b, MobileSAM
for file in ["sam_b.pt", "sam2_b.pt", "sam2_t.pt", "mobile_sam.pt"]:
    model = SAM(file)
    model.info()
    model(ASSETS)

# Profile FastSAM-s
model = FastSAM("FastSAM-s.pt")
model.info()
model(ASSETS)

# Profile YOLO models
for file_name in ["yolov8n-seg.pt", "yolo11n-seg.pt"]:
    model = YOLO(file_name)
    model.info()
    model(ASSETS)
Adapting from SAM to MobileSAM
MobileSAM retains the same pipeline as the original SAM, including pre-processing, post-processing, and all interfaces. This means you can transition from SAM to MobileSAM with minimal changes to your workflow.
The key difference is the image encoder: MobileSAM replaces the original ViT-H encoder (632M parameters) with a much smaller Tiny-ViT encoder (5M parameters). On a single GPU, MobileSAM processes an image in about 12ms (8ms for the encoder, 4ms for the mask decoder).
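Because the pipeline and interfaces are unchanged, switching is typically just a matter of pointing the `SAM` wrapper at the MobileSAM weights. The sketch below reuses the same prompt-based call shown elsewhere on this page; the image path and point coordinates are illustrative.

```python
from ultralytics import SAM

# The same prompt-based call works for both checkpoints; only the weights file changes.
for weights in ("sam_b.pt", "mobile_sam.pt"):
    model = SAM(weights)
    model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
```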
ViT-Based Image Encoder Comparison
| Image Encoder | Original SAM | MobileSAM |
| --- | --- | --- |
| Parameters | 611M | 5M |
| Speed | 452ms | 8ms |
Prompt-Guided Mask Decoder
| Mask Decoder | Original SAM | MobileSAM |
| --- | --- | --- |
| Parameters | 3.876M | 3.876M |
| Speed | 4ms | 4ms |
Whole Pipeline Comparison
| Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM |
| --- | --- | --- |
| Parameters | 615M | 9.66M |
| Speed | 456ms | 12ms |
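As a quick sanity check on the figures above, you can print a model summary with `model.info()`; this is a minimal sketch, and the exact totals reported may differ slightly from the table.

```python
from ultralytics import SAM

# Print layer and parameter counts for the MobileSAM checkpoint
model = SAM("mobile_sam.pt")
model.info()
```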
The performance of MobileSAM and the original SAM is illustrated below using both point and box prompts.
MobileSAM is approximately 5 times smaller and 7 times faster than FastSAM. For further details, visit the MobileSAM project page.
Testing MobileSAM in Ultralytics
Just like the original SAM, Ultralytics provides a simple interface for testing MobileSAM, supporting both Point and Box prompts.
Model Download
Download the MobileSAM pretrained weights from Ultralytics assets.
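Alternatively, in typical Ultralytics usage the checkpoint can be referenced by name and is fetched automatically on first use if it is not found locally; the sketch below relies on that assumption.

```python
from ultralytics import SAM

# Referencing the checkpoint by name downloads mobile_sam.pt on first use
# if it is not already present locally (assumed auto-download behavior).
model = SAM("mobile_sam.pt")
model.info()
```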
Point Prompt
Example
from ultralytics import SAM
# Load the model
model = SAM("mobile_sam.pt")
# Predict a segment based on a single point prompt
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
# Predict multiple segments based on multiple points prompt
model.predict("ultralytics/assets/zidane.jpg", points=[[400, 370], [900, 370]], labels=[1, 1])
# Predict a segment based on multiple points prompt per object
model.predict("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 1]])
# Predict a segment using both positive and negative prompts.
model.predict("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 0]])
Box Prompt
Example
from ultralytics import SAM

# Load the model
model = SAM("mobile_sam.pt")

# Predict a segment based on a single box prompt
model.predict("ultralytics/assets/zidane.jpg", bboxes=[439, 437, 524, 709])

# Predict multiple segments based on multiple box prompts
model.predict("ultralytics/assets/zidane.jpg", bboxes=[[439, 437, 524, 709], [0, 763, 565, 1060]])
Both `MobileSAM` and `SAM` share the same API. For more usage details, see the SAM documentation.
Automatically Build Segmentation Datasets Using a Detection Model
To automatically annotate your dataset with the Ultralytics framework, use the `auto_annotate` function as shown in the example below:
Example
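A minimal sketch of a typical call, assuming the `auto_annotate` helper is imported from `ultralytics.data.annotator` and using illustrative paths:

```python
from ultralytics.data.annotator import auto_annotate

# Detect objects with a YOLO model, then segment them with MobileSAM and
# write the resulting label files (paths below are placeholders).
auto_annotate(
    data="path/to/images",
    det_model="yolo11x.pt",
    sam_model="mobile_sam.pt",
    device="",  # '' selects a device automatically
    output_dir="path/to/labels",
)
```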
| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `data` | `str` | required | Path to the directory containing the target images for annotation or segmentation. |
| `det_model` | `str` | `'yolo11x.pt'` | Path to the YOLO detection model used for initial object detection. |
| `sam_model` | `str` | `'sam_b.pt'` | Path to the SAM model used for segmentation (supports SAM, SAM2 variants, and mobile_sam models). |
| `device` | `str` | `''` | Computation device (e.g., 'cuda:0', 'cpu', or '' for automatic device selection). |
| `conf` | `float` | `0.25` | Detection confidence threshold for filtering weak YOLO detections. |
| `iou` | `float` | `0.45` | IoU threshold for non-maximum suppression, used to filter overlapping boxes. |
| `imgsz` | `int` | `640` | Input size for image resizing (must be a multiple of 32). |
| `max_det` | `int` | `300` | Maximum number of detections per image, limited for memory efficiency. |
| `classes` | `list[int]` | `None` | List of class indices to detect (e.g., [0, 1] for person and bicycle). |
| `output_dir` | `str` | `None` | Directory to save annotations (defaults to './labels' relative to the data path). |
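For instance, restricting annotation to a subset of classes and tightening the detection threshold could look like the following sketch (class indices and values are illustrative):

```python
from ultralytics.data.annotator import auto_annotate

# Only annotate 'person' detections (class index 0) and require higher confidence
# than the 0.25 default before a box is passed to MobileSAM for segmentation.
auto_annotate(
    data="path/to/images",
    det_model="yolo11x.pt",
    sam_model="mobile_sam.pt",
    classes=[0],
    conf=0.5,
)
```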
Citation and Acknowledgements
If MobileSAM is helpful in your research or development, please consider citing the MobileSAM paper. You can read the full paper on arXiv.
Frequently Asked Questions
What Is MobileSAM and How Does It Differ from the Original SAM Model?
MobileSAM is a lightweight, fast image segmentation model optimized for mobile and edge applications. It maintains the same pipeline as the original SAM but replaces the large ViT-H encoder (632M parameters) with a compact Tiny-ViT encoder (5M parameters). This results in MobileSAM being about 5 times smaller and 7 times faster than the original SAM, operating at roughly 12ms per image versus SAM's 456ms. Explore more about MobileSAM's implementation on the MobileSAM GitHub repository.
How Can I Test MobileSAM Using Ultralytics?
Testing MobileSAM in Ultralytics is straightforward. You can use Point and Box prompts to predict segments. For example, using a Point prompt:
from ultralytics import SAM
# Load the model
model = SAM("mobile_sam.pt")
# Predict a segment based on a point prompt
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
For more details, see the Testing MobileSAM in Ultralytics section.
Why Should I Use MobileSAM for My Mobile Application?
MobileSAM is ideal for mobile and edge applications due to its lightweight design and rapid inference speed. Compared to the original SAM, MobileSAM is about 5 times smaller and 7 times faster, making it suitable for real-time segmentation on devices with limited computational resources. Its efficiency enables mobile devices to perform real-time image segmentation without significant latency. Additionally, MobileSAM supports Inference mode optimized for mobile performance.
How Was MobileSAM Trained, and Is the Training Code Available?
MobileSAM was trained on a single GPU with a 100k image dataset (1% of the original images) in under a day. While the training code will be released in the future, you can currently access pre-trained weights and implementation details from the MobileSAM GitHub repository.
What Are the Primary Use Cases for MobileSAM?
MobileSAM is designed for fast, efficient image segmentation in mobile and edge environments. Primary use cases include:
- Real-time object detection and segmentation for mobile apps
- Low-latency image processing on devices with limited compute
- Integration in AI-powered mobile applications for augmented reality (AR), analytics, and more
For more details on use cases and performance, see Adapting from SAM to MobileSAM and the Ultralytics blog on MobileSAM applications.