Fast Segment Anything Model (FastSAM)
The Fast Segment Anything Model (FastSAM) is a novel, real-time, CNN-based solution for the Segment Anything task. This task is designed to segment any object within an image based on various possible user interaction prompts. FastSAM significantly reduces computational demands while maintaining competitive performance, making it a practical choice for a variety of vision tasks.
Watch: Object Tracking using FastSAM with Ultralytics
Model Architecture
Overview
FastSAM is designed to address the limitations of the Segment Anything Model (SAM), a heavy Transformer model with substantial computational resource requirements. FastSAM decouples the segment anything task into two sequential stages: all-instance segmentation and prompt-guided selection. The first stage uses YOLOv8-seg to produce segmentation masks for all instances in the image. The second stage outputs the region of interest corresponding to the prompt.
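The second stage can be pictured as a filter over the masks produced by the first stage. The sketch below is illustrative only and is not FastSAM's actual implementation: it assumes each candidate mask is a set of (x, y) pixel coordinates, and the helper names select_by_point and select_by_box are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # pixel coordinates covered by one instance mask


def select_by_point(masks: List[Mask], point: Pixel) -> Optional[int]:
    """Return the index of the smallest mask that contains the point, or None."""
    hits = [i for i, mask in enumerate(masks) if point in mask]
    return min(hits, key=lambda i: len(masks[i])) if hits else None


def select_by_box(masks: List[Mask], box: Tuple[int, int, int, int]) -> Optional[int]:
    """Return the index of the mask with the highest IoU against an xyxy box."""
    x1, y1, x2, y2 = box
    box_pixels = {(x, y) for x in range(x1, x2) for y in range(y1, y2)}
    best, best_iou = None, 0.0
    for i, mask in enumerate(masks):
        union = len(mask | box_pixels)
        iou = len(mask & box_pixels) / union if union else 0.0
        if iou > best_iou:
            best, best_iou = i, iou
    return best


# Stand-in for stage 1: two hand-made "instance masks" on a tiny grid.
masks = [
    {(x, y) for x in range(0, 4) for y in range(0, 4)},  # 4x4 square at the origin
    {(x, y) for x in range(1, 3) for y in range(1, 3)},  # 2x2 square inside it
]

point_choice = select_by_point(masks, (1, 1))   # picks the smaller, more specific mask
box_choice = select_by_box(masks, (0, 0, 4, 4))  # picks the mask that best fills the box
```

Because the expensive all-instance pass runs only once, any number of prompts can be resolved afterwards with cheap selection steps like these.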
Key Features
- Real-time Solution: By leveraging the computational efficiency of CNNs, FastSAM provides a real-time solution for the segment anything task, making it valuable for industrial applications that require quick results.
- Efficiency and Performance: FastSAM significantly reduces computational and resource demands without compromising performance quality. It achieves performance comparable to SAM with drastically reduced computational resources, enabling real-time applications.
- Prompt-guided Segmentation: FastSAM can segment any object within an image guided by various possible user interaction prompts, providing flexibility and adaptability across scenarios.
- Based on YOLOv8-seg: FastSAM is based on YOLOv8-seg, an object detector equipped with an instance segmentation branch. This allows it to effectively produce segmentation masks for all instances in an image.
- Competitive Results on Benchmarks: On the MS COCO object proposal task, FastSAM achieves high scores at a significantly faster speed than SAM on a single NVIDIA RTX 3090, demonstrating its efficiency and capability.
- Practical Applications: The proposed approach provides a new, practical solution for a large number of vision tasks, running tens to hundreds of times faster than current methods.
- Model Compression Feasibility: FastSAM demonstrates that introducing an artificial prior to the structure can significantly reduce computational effort, opening new possibilities for large model architectures for general vision tasks.
Available Models, Supported Tasks, and Operating Modes
This table lists the available models with their specific pre-trained weights, the tasks they support, and their compatibility with different operating modes such as Inference, Validation, Training, and Export.
Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
---|---|---|---|---|---|---|
FastSAM-s | FastSAM-s.pt | Instance Segmentation | ✅ | ❌ | ❌ | ✅ |
FastSAM-x | FastSAM-x.pt | Instance Segmentation | ✅ | ❌ | ❌ | ✅ |
Usage Examples
The FastSAM models are easy to integrate into your Python applications. Ultralytics provides a user-friendly Python API and CLI commands to streamline development.
Predict Usage
To perform object detection on an image, use the predict method as shown below:
Example
from ultralytics import FastSAM
# Define an inference source
source = "path/to/bus.jpg"
# Create a FastSAM model
model = FastSAM("FastSAM-s.pt") # or FastSAM-x.pt
# Run inference on an image
everything_results = model(source, device="cpu", retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)
# Run inference with bboxes prompt
results = model(source, bboxes=[439, 437, 524, 709])
# Run inference with points prompt
results = model(source, points=[[200, 200]], labels=[1])
# Run inference with texts prompt
results = model(source, texts="a photo of a dog")
# Run inference with bboxes and points and texts prompt at the same time
results = model(source, bboxes=[439, 437, 524, 709], points=[[200, 200]], labels=[1], texts="a photo of a dog")
This snippet demonstrates the simplicity of loading a pre-trained model and running a prediction on an image.
FastSAMPredictor Example
This way you can run inference on an image once to get all the segment results, then run prompt inference multiple times without repeating the full inference.
from ultralytics.models.fastsam import FastSAMPredictor
# Create FastSAMPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", model="FastSAM-s.pt", save=False, imgsz=1024)
predictor = FastSAMPredictor(overrides=overrides)
# Segment everything
everything_results = predictor("ultralytics/assets/bus.jpg")
# Prompt inference
bbox_results = predictor.prompt(everything_results, bboxes=[[200, 200, 300, 300]])
point_results = predictor.prompt(everything_results, points=[200, 200])
text_results = predictor.prompt(everything_results, texts="a photo of a dog")
Note
All the returned results in the examples above are Results objects, which give easy access to the predicted masks and the source image.
Val Usage
Validation of the model on a dataset can be done as follows:
Example
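A minimal validation sketch, assuming the standard Ultralytics val method and the small coco8-seg.yaml sample dataset. The wrapper function validate_fastsam is a hypothetical name used here for illustration, and the import is deferred so the helper can be defined even where the package is not installed.

```python
def validate_fastsam(weights="FastSAM-s.pt", data="coco8-seg.yaml"):
    """Validate a FastSAM checkpoint on a segmentation dataset and return the metrics."""
    # Imported lazily: requires the ultralytics package at call time only.
    from ultralytics import FastSAM

    model = FastSAM(weights)  # or "FastSAM-x.pt"
    return model.val(data=data)
```

Calling validate_fastsam() downloads the weights and dataset on first use, so it is best run in an environment with network access.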
Please note that FastSAM only supports detection and segmentation of a single class of object. This means it recognizes and segments all objects as the same class. Therefore, when preparing the dataset, you need to convert all object category IDs to 0.
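The category ID conversion can be sketched as a small text transformation. This assumes labels are stored in the YOLO text format, where each line starts with the class index followed by normalized coordinates; the function name to_single_class is hypothetical and the sample label lines are made up for illustration.

```python
def to_single_class(label_lines):
    """Rewrite YOLO-format label lines so that every object uses class ID 0."""
    converted = []
    for line in label_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        parts[0] = "0"  # the first field is the class index
        converted.append(" ".join(parts))
    return converted


# Two made-up polygon labels with different class IDs.
labels = [
    "17 0.1 0.1 0.4 0.1 0.4 0.5",
    "3 0.5 0.5 0.9 0.5 0.9 0.9",
]
single_class = to_single_class(labels)
```

The same function can be applied to the lines of each label file in a dataset before validation.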
Track Usage
To perform object tracking on an image, use the track method as shown below:
Example
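A minimal tracking sketch, assuming the standard Ultralytics track method. The wrapper name track_with_fastsam, the placeholder video path, and the imgsz value are assumptions for illustration; the import is deferred so the helper can be defined even where the package is not installed.

```python
def track_with_fastsam(source="path/to/video.mp4", weights="FastSAM-s.pt", imgsz=640):
    """Run FastSAM with tracking enabled on a video source and return the results."""
    # Imported lazily: requires the ultralytics package at call time only.
    from ultralytics import FastSAM

    model = FastSAM(weights)  # or "FastSAM-x.pt"
    return model.track(source=source, imgsz=imgsz)
```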
FastSAM Official Usage
FastSAM is also available directly from the https://github.com/CASIA-IVA-Lab/FastSAM repository. Here is a brief overview of the typical steps for using FastSAM:
Installation
- Clone the FastSAM repository.
- Create and activate a Conda environment with Python 3.9.
- Navigate to the cloned repository and install the required packages.
- Install the CLIP model.
Example Usage
- Download a model checkpoint.
- Use FastSAM for inference. Example commands:
  - Segment everything in an image.
  - Segment specific objects using a text prompt.
  - Segment objects within a bounding box (provide the box coordinates in xywh format).
  - Segment objects near specific points.
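The box prompt above is given in xywh format, whereas the Ultralytics examples earlier on this page pass bboxes as corner coordinates. A small conversion sketch, assuming xywh means top-left x, top-left y, width, and height (check the repository README to confirm); both helper names are hypothetical.

```python
def xywh_to_xyxy(box):
    """Convert [x, y, width, height] (top-left origin) to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] to [x, y, width, height] (top-left origin)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


# The corner box used in the Predict example, expressed as xywh and converted back.
corner_box = xywh_to_xyxy([439, 437, 85, 272])
```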
Additionally, you can try FastSAM through a Colab demo or the HuggingFace web demo for a visual experience.
Citations and Acknowledgements
We would like to acknowledge the FastSAM authors for their significant contributions to the field of real-time instance segmentation.
The original FastSAM paper can be found on arXiv. The authors have made their work publicly available, and the codebase can be accessed on GitHub. We appreciate their efforts in advancing the field and making their work accessible to the broader community.
FAQ
What is FastSAM and how does it differ from SAM?
FastSAM, short for Fast Segment Anything Model, is a real-time convolutional neural network (CNN)-based solution designed to reduce computational demands while maintaining high performance in object segmentation tasks. Unlike the Segment Anything Model (SAM), which uses a heavier Transformer-based architecture, FastSAM leverages Ultralytics YOLOv8-seg for efficient two-stage instance segmentation.
How does FastSAM achieve real-time segmentation performance?
FastSAM achieves real-time segmentation by decoupling the segmentation task into all-instance segmentation with YOLOv8-seg and a prompt-guided selection stage. By using the computational efficiency of CNNs, FastSAM significantly reduces computational and resource demands while maintaining competitive performance. This two-stage approach lets FastSAM deliver fast, efficient segmentation suitable for applications that require quick results.
What are the practical applications of FastSAM?
FastSAM is practical for a variety of computer vision tasks that require real-time segmentation performance. Applications include:
- Industrial automation for quality control and assurance
- Real-time video analysis for security and surveillance
- Autonomous vehicles for object detection and segmentation
- Medical imaging for precise and quick segmentation tasks
Its ability to handle various user interaction prompts makes FastSAM adaptable and flexible across diverse scenarios.
How do I use the FastSAM model for inference in Python?
To use FastSAM for inference in Python, you can follow the example below:
from ultralytics import FastSAM
# Define an inference source
source = "path/to/bus.jpg"
# Create a FastSAM model
model = FastSAM("FastSAM-s.pt") # or FastSAM-x.pt
# Run inference on an image
everything_results = model(source, device="cpu", retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)
# Run inference with bboxes prompt
results = model(source, bboxes=[439, 437, 524, 709])
# Run inference with points prompt
results = model(source, points=[[200, 200]], labels=[1])
# Run inference with texts prompt
results = model(source, texts="a photo of a dog")
# Run inference with bboxes and points and texts prompt at the same time
results = model(source, bboxes=[439, 437, 524, 709], points=[[200, 200]], labels=[1], texts="a photo of a dog")
For more details on inference methods, see the Predict Usage section of the documentation.
What types of prompts does FastSAM support for segmentation tasks?
FastSAM supports multiple prompt types for guiding segmentation tasks:
- Everything Prompt: Generates segmentation for all visible objects.
- Bounding Box (BBox) Prompt: Segments objects within a specified bounding box.
- Text Prompt: Uses descriptive text to segment objects matching the description.
- Point Prompt: Segments objects near specific user-defined points.
This flexibility allows FastSAM to adapt to a wide range of user interaction scenarios, enhancing its utility across different applications. For more information on using these prompts, refer to the Key Features section.