Mobile Segment Anything (MobileSAM)
The MobileSAM paper is now available on arXiv.
A demo of MobileSAM running on a CPU is available at the demo link. Inference on a Mac i5 CPU takes about 3 seconds. On the Hugging Face demo, the interface and lower-performance CPUs make responses slower, but the demo still works effectively.
Watch: How to Run Inference with MobileSAM using Ultralytics | Step-by-Step Guide
MobileSAM is implemented in various projects, including Grounding-SAM, AnyLabeling, and Segment Anything in 3D.
MobileSAM was trained on a single GPU in under a day using a 100k-image dataset (1% of the original images). The training code is slated for future release.
Available Models, Supported Tasks, and Operating Modes
This table lists the available model with its pre-trained weights, the task it supports, and its compatibility with the operating modes Inference, Validation, Training, and Export (✅ supported, ❌ unsupported).
Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
---|---|---|---|---|---|---|
MobileSAM | mobile_sam.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
Adapting from SAM to MobileSAM
Because MobileSAM retains the same pipeline as the original SAM, it incorporates the original pre-processing, post-processing, and all other interfaces. As a result, anyone currently using the original SAM can move to MobileSAM with minimal effort.
MobileSAM performs comparably to the original SAM and keeps the same pipeline except for the image encoder. Specifically, the original heavyweight ViT-H encoder (632M) is replaced with a much smaller Tiny-ViT (5M). On a single GPU, MobileSAM runs in about 12ms per image: 8ms in the image encoder and 4ms in the mask decoder.
The following table compares the ViT-based image encoders:
Image Encoder | Original SAM | MobileSAM |
---|---|---|
Parameters | 611M | 5M |
Speed | 452ms | 8ms |
Both the original SAM and MobileSAM use the same prompt-guided mask decoder:
Mask Decoder | Original SAM | MobileSAM |
---|---|---|
Parameters | 3.876M | 3.876M |
Speed | 4ms | 4ms |
The whole pipeline compares as follows:
Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM |
---|---|---|
Parameters | 615M | 9.66M |
Speed | 456ms | 12ms |
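The three tables can be cross-checked with a few lines of arithmetic. The sketch below only reuses the figures printed in the tables; the dictionary layout and variable names are illustrative and do not come from the MobileSAM code.

```python
# Figures copied from the comparison tables above
# (parameters in millions, latency in milliseconds on a single GPU).
sam = {"encoder_ms": 452.0, "decoder_ms": 4.0, "pipeline_params_m": 615.0}
mobile_sam = {"encoder_ms": 8.0, "decoder_ms": 4.0, "pipeline_params_m": 9.66}
decoder_params_m = 3.876  # shared mask decoder


def total_ms(model):
    """Whole-pipeline latency is the encoder time plus the decoder time."""
    return model["encoder_ms"] + model["decoder_ms"]


speedup = total_ms(sam) / total_ms(mobile_sam)
size_ratio = sam["pipeline_params_m"] / mobile_sam["pipeline_params_m"]

# Encoder size implied by the whole-pipeline row of the MobileSAM column.
implied_encoder_params_m = mobile_sam["pipeline_params_m"] - decoder_params_m

print(f"SAM pipeline latency:       {total_ms(sam):.0f} ms")
print(f"MobileSAM pipeline latency: {total_ms(mobile_sam):.0f} ms")
print(f"Speedup over SAM:           {speedup:.1f}x")
print(f"Parameter reduction:        {size_ratio:.1f}x")
print(f"Implied MobileSAM encoder:  {implied_encoder_params_m:.3f}M parameters")
```

By these figures MobileSAM is 38 times faster than the original SAM and has roughly 64 times fewer parameters. The whole-pipeline row implies an encoder of about 5.78M parameters, which suggests the 5M entry in the encoder table is rounded.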
The performance of MobileSAM and the original SAM is demonstrated using both a point and a box as prompts.
With its superior performance, MobileSAM is about 5 times smaller and 7 times faster than the current FastSAM. More details are available on the MobileSAM project page.
Testing MobileSAM in Ultralytics
As with the original SAM, Ultralytics offers a straightforward way to test MobileSAM, with modes for both point and box prompts.
Model Download
You can download the model here.
Point Prompt
Example
from ultralytics import SAM
# Load the model
model = SAM("mobile_sam.pt")
# Predict a segment based on a single point prompt
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
# Predict multiple segments based on multiple points prompt
model.predict("ultralytics/assets/zidane.jpg", points=[[400, 370], [900, 370]], labels=[1, 1])
# Predict a segment based on multiple points prompt per object
model.predict("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 1]])
# Predict a segment using both positive and negative prompts.
model.predict("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 0]])
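The four calls above differ only in how deeply `points` and `labels` are nested. As a rough sketch of that convention, the helper below expands each form into one `(points, labels)` pair per object, following the comments in the example: a flat `[x, y]` is one object, a list of points is one object per point, and a doubly nested list groups several points per object, with label `1` as a positive prompt and `0` as a negative one. `normalize_point_prompts` is a hypothetical name, not part of Ultralytics, and it does not claim to reproduce how the library handles prompts internally.

```python
def normalize_point_prompts(points, labels):
    """Expand the nesting levels used above into one (points, labels) pair per object."""

    def depth(value):
        # Count how many list levels wrap the first coordinate.
        level = 0
        while isinstance(value, (list, tuple)):
            if not value:
                raise ValueError("prompts must not be empty")
            value = value[0]
            level += 1
        return level

    level = depth(points)
    if level == 1:  # [x, y]: one object with a single point
        objects = [[list(points)]]
        object_labels = [list(labels)]
    elif level == 2:  # [[x, y], ...]: one object per point
        objects = [[list(point)] for point in points]
        object_labels = [[label] for label in labels]
    elif level == 3:  # [[[x, y], ...], ...]: several points per object
        objects = [[list(point) for point in obj] for obj in points]
        object_labels = [list(obj) for obj in labels]
    else:
        raise ValueError("points must be nested one to three levels deep")

    if len(objects) != len(object_labels):
        raise ValueError("points and labels describe a different number of objects")
    for object_points, labels_for_object in zip(objects, object_labels):
        if len(object_points) != len(labels_for_object):
            raise ValueError("each point needs exactly one label")
        if any(len(point) != 2 for point in object_points):
            raise ValueError("each point must be an [x, y] pair")
        if any(label not in (0, 1) for label in labels_for_object):
            raise ValueError("labels must be 1 (positive) or 0 (negative)")
    return list(zip(objects, object_labels))


for obj_points, obj_labels in normalize_point_prompts(
    [[[400, 370], [900, 370]]], [[1, 0]]
):
    print(obj_points, obj_labels)
```

Running a check like this before calling `model.predict` makes a mismatch between the nesting of `points` and `labels` fail early with a readable message.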
Box Prompt
Example
from ultralytics import SAM
# Load the model
model = SAM("mobile_sam.pt")
# Predict a segment based on a box prompt, given as [x1, y1, x2, y2] in pixels
model.predict("ultralytics/assets/zidane.jpg", bboxes=[439, 437, 524, 709])
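Box prompts are passed to `predict` through the `bboxes` argument in `[x1, y1, x2, y2]` pixel order. Annotation tools often export boxes as `[x, y, width, height]` instead, so a small conversion step can be useful. The helper below is a minimal sketch of that conversion. `xywh_to_xyxy` and its `image_size` parameter are hypothetical names introduced here for illustration, not Ultralytics functions.

```python
def xywh_to_xyxy(box, image_size=None):
    """Convert [x, y, width, height] to [x1, y1, x2, y2], optionally clipped to the image."""
    x, y, width, height = box
    if width <= 0 or height <= 0:
        raise ValueError("box width and height must be positive")
    x1, y1, x2, y2 = x, y, x + width, y + height
    if image_size is not None:
        image_width, image_height = image_size
        # Clip the corners so the prompt never extends past the image borders.
        x1, x2 = max(0, x1), min(image_width, x2)
        y1, y2 = max(0, y1), min(image_height, y2)
        if x1 >= x2 or y1 >= y2:
            raise ValueError("box lies entirely outside the image")
    return [x1, y1, x2, y2]


print(xywh_to_xyxy([439, 437, 85, 272]))

# The result can then be used as a box prompt, for example:
# model.predict("ultralytics/assets/zidane.jpg", bboxes=xywh_to_xyxy([439, 437, 85, 272]))
```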
MobileSAM and SAM are implemented with the same API. For more usage details, see the SAM page.
Automatically Build Segmentation Datasets Leveraging a Detection Model
To automatically annotate your dataset with the Ultralytics framework, use the auto_annotate function. It accepts the following arguments:
Argument | Type | Default | Description |
---|---|---|---|
data | str | required | Path to directory containing target images/videos for annotation or segmentation. |
det_model | str | "yolo11x.pt" | YOLO detection model path for initial object detection. |
sam_model | str | "sam2_b.pt" | SAM2 model path for segmentation (supports t/s/b/l variants and SAM2.1), as well as mobile_sam models. |
device | str | "" | Computation device (e.g., 'cuda:0', 'cpu', or '' for automatic device detection). |
conf | float | 0.25 | YOLO detection confidence threshold for filtering weak detections. |
iou | float | 0.45 | IoU threshold for Non-Maximum Suppression to filter overlapping boxes. |
imgsz | int | 640 | Input size for resizing images (must be multiple of 32). |
max_det | int | 300 | Maximum number of detections per image for memory efficiency. |
classes | list[int] | None | List of class indices to detect (e.g., [0, 1] for person & bicycle). |
output_dir | str | None | Save directory for annotations (defaults to './labels' relative to data path). |
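A call to `auto_annotate` with MobileSAM might look like the sketch below. It assumes `auto_annotate` is importable from `ultralytics.data.annotator` and that `"path/to/images"` is replaced with a real directory. `build_auto_annotate_kwargs` and `annotate` are hypothetical wrappers written for this example. The wrapper uses the defaults listed in the table, except that `sam_model` is set to `"mobile_sam.pt"` instead of `"sam2_b.pt"`.

```python
def build_auto_annotate_kwargs(data, sam_model="mobile_sam.pt", **overrides):
    """Assemble keyword arguments for auto_annotate from the documented defaults."""
    defaults = {
        "det_model": "yolo11x.pt",
        "device": "",
        "conf": 0.25,
        "iou": 0.45,
        "imgsz": 640,
        "max_det": 300,
        "classes": None,
        "output_dir": None,
    }
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise TypeError(f"unknown auto_annotate arguments: {sorted(unknown)}")
    kwargs = {"data": data, "sam_model": sam_model, **defaults, **overrides}
    if kwargs["imgsz"] % 32 != 0:
        raise ValueError("imgsz must be a multiple of 32")
    return kwargs


def annotate(data, **overrides):
    """Run auto-annotation: YOLO proposes boxes, MobileSAM turns them into masks."""
    # Imported lazily so the helper above works without Ultralytics installed.
    from ultralytics.data.annotator import auto_annotate

    auto_annotate(**build_auto_annotate_kwargs(data, **overrides))


if __name__ == "__main__":
    # Inspect the arguments first; call annotate("path/to/images") to actually run it.
    print(build_auto_annotate_kwargs("path/to/images", conf=0.4))
```

Keeping the argument assembly separate from the call makes it easy to review thresholds such as `conf` and `iou` before starting a long annotation run.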
Citations and Acknowledgements
If you find MobileSAM useful in your research or development work, please consider citing our paper:
FAQ
How does MobileSAM differ from the original SAM model?
MobileSAM is a lightweight, fast image segmentation model designed for mobile applications. It retains the same pipeline as the original SAM but replaces the heavyweight ViT-H encoder (632M parameters) with a smaller Tiny-ViT encoder (5M parameters). As a result, the whole pipeline shrinks from 615M to 9.66M parameters, and MobileSAM runs in about 12ms per image compared with 456ms for the original SAM. Relative to FastSAM, MobileSAM is about 5 times smaller and 7 times faster. MobileSAM is also implemented in various projects, including Grounding-SAM, AnyLabeling, and Segment Anything in 3D.
How can I test MobileSAM using Ultralytics?
Testing MobileSAM in Ultralytics is straightforward. You can predict segments using point and box prompts. Here is an example using a point prompt:
from ultralytics import SAM
# Load the model
model = SAM("mobile_sam.pt")
# Predict a segment based on a point prompt
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
For more details, see the Testing MobileSAM in Ultralytics section.
Why should I use MobileSAM for my mobile application?
MobileSAM is well suited to mobile applications thanks to its lightweight architecture and fast inference speed. It is about 5 times smaller and 7 times faster than FastSAM, and far lighter than the original SAM (9.66M versus 615M parameters), which makes it a good fit for environments with limited computational resources. This efficiency lets mobile devices perform real-time image segmentation without significant latency. In addition, MobileSAM's inference mode is optimized for mobile performance.
How was MobileSAM trained, and is the training code available?
MobileSAM was trained on a single GPU in under a day using a 100k-image dataset, which is 1% of the original images. The training code is slated for future release. In the meantime, you can explore other aspects of MobileSAM in the MobileSAM GitHub repository, which includes pre-trained weights and implementation details for various applications.
What are the primary use cases for MobileSAM?
MobileSAM is designed for fast, efficient image segmentation in mobile environments. Primary use cases include:
- Real-time object detection and segmentation for mobile applications.
- Low-latency image processing on devices with limited computational resources.
- Integration into AI-driven mobile apps for tasks such as augmented reality (AR) and real-time analytics.
For more detailed use cases and performance comparisons, see the Adapting from SAM to MobileSAM section.