Fast Segment Anything Model (FastSAM)

The Fast Segment Anything Model (FastSAM) is a novel, real-time, CNN-based solution for the Segment Anything task, which aims to segment any object in an image based on various user interaction prompts. FastSAM significantly reduces computational demands while maintaining competitive performance, making it a practical choice for a variety of vision tasks.



Watch: Object Tracking using FastSAM with Ultralytics

Model Architecture

Fast Segment Anything Model (FastSAM) architecture overview

Overview

FastSAM is designed to address the limitations of the Segment Anything Model (SAM), a heavy Transformer model with substantial computational resource requirements. FastSAM decouples the Segment Anything task into two sequential stages: all-instance segmentation and prompt-guided selection. The first stage uses YOLOv8-seg to produce segmentation masks for all instances in the image. The second stage outputs the region of interest that corresponds to the prompt.
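The two-stage split can be illustrated with a small, dependency-free sketch. It is not FastSAM's implementation: the masks are hard-coded stand-ins for the output of the all-instance stage, and `select_by_point`, `box_pixels`, and `select_by_box` are hypothetical helpers that show only the idea of prompt-guided selection.

```python
# Illustrative sketch only: masks are sets of (x, y) pixels standing in for
# the all-instance stage output; all helper names are hypothetical.

def select_by_point(masks, point):
    """Return indices of masks that contain the prompted pixel."""
    return [i for i, mask in enumerate(masks) if point in mask]


def box_pixels(x1, y1, x2, y2):
    """Pixels covered by an xyxy box (x2 and y2 exclusive)."""
    return {(x, y) for x in range(x1, x2) for y in range(y1, y2)}


def select_by_box(masks, box):
    """Return the index of the mask with the highest IoU against the box."""
    target = box_pixels(*box)

    def iou(mask):
        union = len(mask | target)
        return len(mask & target) / union if union else 0.0

    return max(range(len(masks)), key=lambda i: iou(masks[i]))


# Stage 1 (stand-in): pretend the segmenter found two instances.
masks = [box_pixels(0, 0, 4, 4), box_pixels(6, 6, 10, 10)]

# Stage 2: pick the region of interest that matches each prompt.
point_hits = select_by_point(masks, (7, 8))
box_hit = select_by_box(masks, (5, 5, 10, 10))
print(point_hits, box_hit)
```

Because the expensive stage runs once and selection is a cheap lookup over its output, several prompts can be answered from the same set of masks.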

Key Features

  1. Real-time solution: By leveraging the computational efficiency of CNNs, FastSAM provides a real-time solution for the Segment Anything task, which makes it valuable for industrial applications that need quick results.

  2. Efficiency and performance: FastSAM significantly reduces computational and resource demands without compromising performance quality. It achieves performance comparable to SAM with drastically fewer computational resources, enabling real-time applications.

  3. Prompt-guided segmentation: FastSAM can segment any object in an image guided by various user interaction prompts, providing flexibility and adaptability across scenarios.

  4. Based on YOLOv8-seg: FastSAM is built on YOLOv8-seg, an object detector equipped with an instance segmentation branch. This lets it effectively produce segmentation masks for all instances in an image.

  5. Competitive results on benchmarks: On the MS COCO object proposal task, FastSAM achieves high scores at a significantly faster speed than SAM on a single NVIDIA RTX 3090, demonstrating its efficiency and capability.

  6. Practical applications: The proposed approach provides a new, practical solution for a large number of vision tasks, running tens or hundreds of times faster than current methods.

  7. Model compression feasibility: FastSAM demonstrates that introducing an artificial prior into the structure can significantly reduce computational effort, opening new possibilities for large model architectures for general vision tasks.

Available Models, Supported Tasks, and Operating Modes

This table lists the available models with their pretrained weights, the tasks they support, and their compatibility with the operating modes Inference, Validation, Training, and Export. Supported modes are marked with ✅ and unsupported modes with ❌.

| Model Type | Pretrained Weights | Tasks Supported | Inference | Validation | Training | Export |
|---|---|---|---|---|---|---|
| FastSAM-s | FastSAM-s.pt | Instance Segmentation | ✅ | ❌ | ❌ | ✅ |
| FastSAM-x | FastSAM-x.pt | Instance Segmentation | ✅ | ❌ | ❌ | ✅ |

Usage Examples

The FastSAM models are easy to integrate into Python applications. Ultralytics provides a user-friendly Python API and CLI commands to streamline development.

Predict Usage

To run segmentation on an image, use the predict method (calling the model directly, as below, is equivalent):

Example

from ultralytics import FastSAM

# Define an inference source
source = "path/to/bus.jpg"

# Create a FastSAM model
model = FastSAM("FastSAM-s.pt")  # or FastSAM-x.pt

# Run inference on an image
everything_results = model(source, device="cpu", retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)

# Run inference with bboxes prompt
results = model(source, bboxes=[439, 437, 524, 709])

# Run inference with points prompt
results = model(source, points=[[200, 200]], labels=[1])

# Run inference with texts prompt
results = model(source, texts="a photo of a dog")

# Run inference with bboxes and points and texts prompt at the same time
results = model(source, bboxes=[439, 437, 524, 709], points=[[200, 200]], labels=[1], texts="a photo of a dog")

# CLI: load a FastSAM model and segment everything with it
yolo segment predict model=FastSAM-s.pt source=path/to/bus.jpg imgsz=640

This snippet shows how simple it is to load a pretrained model and run a prediction on an image.

FastSAMPredictor example

This way you can run inference on an image once to obtain every segment in results, then run prompt inference multiple times without repeating the full inference.

from ultralytics.models.fastsam import FastSAMPredictor

# Create FastSAMPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", model="FastSAM-s.pt", save=False, imgsz=1024)
predictor = FastSAMPredictor(overrides=overrides)

# Segment everything
everything_results = predictor("ultralytics/assets/bus.jpg")

# Prompt inference
bbox_results = predictor.prompt(everything_results, bboxes=[[200, 200, 300, 300]])
point_results = predictor.prompt(everything_results, points=[200, 200])
text_results = predictor.prompt(everything_results, texts="a photo of a dog")

Note

All results returned in the examples above are Results objects, which give easy access to the predicted masks and the source image.
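A minimal sketch of reading those fields follows. `summarize` is a hypothetical helper. It assumes each result exposes `masks` (with a `data` array of shape `(N, H, W)`, or `None` when nothing was segmented) and `orig_img`, as Ultralytics Results objects do. A stand-in object is used so the snippet runs without downloading weights.

```python
from types import SimpleNamespace


def summarize(result):
    """Report how many masks a result holds and the source image shape."""
    masks = result.masks
    n_masks = 0 if masks is None else int(masks.data.shape[0])
    return {"num_masks": n_masks, "image_shape": tuple(result.orig_img.shape)}


# Stand-in for one element of `everything_results` (assumption: same attribute names).
fake = SimpleNamespace(
    masks=SimpleNamespace(data=SimpleNamespace(shape=(3, 1024, 768))),
    orig_img=SimpleNamespace(shape=(1080, 810, 3)),
)
summary = summarize(fake)
print(summary)
```

With real output, the same helper would be called as `summarize(everything_results[0])`.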

Val Usage

A model can be validated on a dataset as follows:

Example

from ultralytics import FastSAM

# Create a FastSAM model
model = FastSAM("FastSAM-s.pt")  # or FastSAM-x.pt

# Validate the model
results = model.val(data="coco8-seg.yaml")

# CLI: load a FastSAM model and validate it on the COCO8-seg example dataset at image size 640
yolo segment val model=FastSAM-s.pt data=coco8-seg.yaml imgsz=640

Note that FastSAM only supports detection and segmentation of a single object class. It recognizes and segments all objects as the same class, so when preparing a dataset you need to convert all object category IDs to 0.
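One way to do that conversion is sketched below, assuming labels are stored in the YOLO text format, where each line starts with the class ID followed by coordinates. `to_single_class` and `convert_label_dir` are hypothetical helper names, and the directory layout is an assumption.

```python
from pathlib import Path


def to_single_class(label_text):
    """Set the leading class ID of every non-empty label line to 0."""
    out = []
    for line in label_text.splitlines():
        parts = line.split()
        if parts:  # skip blank lines
            out.append(" ".join(["0"] + parts[1:]))
    return "\n".join(out) + ("\n" if out else "")


def convert_label_dir(label_dir):
    """Rewrite every .txt label file under label_dir in place."""
    for path in Path(label_dir).rglob("*.txt"):
        path.write_text(to_single_class(path.read_text()))


sample = "5 0.1 0.2 0.3 0.2 0.3 0.4\n17 0.5 0.5 0.6 0.5 0.6 0.6\n"
converted = to_single_class(sample)
print(converted)
```

Since `convert_label_dir` overwrites files in place, it is safer to run it on a copy of the dataset.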

Track Usage

To perform object tracking on a video, use the track method:

Example

from ultralytics import FastSAM

# Create a FastSAM model
model = FastSAM("FastSAM-s.pt")  # or FastSAM-x.pt

# Track with a FastSAM model on a video
results = model.track(source="path/to/video.mp4", imgsz=640)

# CLI: track with a FastSAM model on a video
yolo segment track model=FastSAM-s.pt source="path/to/video/file.mp4" imgsz=640
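A small sketch for reading tracker IDs from the returned results follows. `collect_track_ids` is a hypothetical helper. It assumes each per-frame result has `boxes.id`, which holds the track IDs or is `None` when no object was tracked in that frame. Stand-in objects are used so the snippet runs without a video.

```python
from types import SimpleNamespace


def collect_track_ids(results):
    """Return the set of track IDs seen across all frames."""
    seen = set()
    for result in results:
        boxes = result.boxes
        ids = None if boxes is None else boxes.id
        if ids is not None:  # frames without tracked objects carry no IDs
            seen.update(int(i) for i in ids)
    return seen


# Stand-ins for per-frame results (assumption: same attribute names).
frames = [
    SimpleNamespace(boxes=SimpleNamespace(id=[1, 2])),
    SimpleNamespace(boxes=SimpleNamespace(id=None)),
    SimpleNamespace(boxes=SimpleNamespace(id=[2, 3])),
]
track_ids = collect_track_ids(frames)
print(sorted(track_ids))
```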

FastSAM Official Usage

FastSAM is also available directly from the https://github.com/CASIA-IVA-Lab/FastSAM repository. Here is a brief overview of the typical steps for using it:

Installation

  1. Clone the FastSAM repository:

    git clone https://github.com/CASIA-IVA-Lab/FastSAM.git
    
  2. Create and activate a Conda environment with Python 3.9:

    conda create -n FastSAM python=3.9
    conda activate FastSAM
    
  3. Navigate to the cloned repository and install the required packages:

    cd FastSAM
    pip install -r requirements.txt
    
  4. Install the CLIP model:

    pip install git+https://github.com/ultralytics/CLIP.git
    

Example Usage

  1. Download a model checkpoint.

  2. Use FastSAM for inference. Example commands:

    • Segment everything in an image:

      python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg
      
    • Segment specific objects using a text prompt:

      python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --text_prompt "the yellow dog"
      
    • Segment objects within a bounding box (provide box coordinates in xywh format):

      python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --box_prompt "[570,200,230,400]"
      
    • Segment objects near specific points:

      python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --point_prompt "[[520,360],[620,300]]" --point_label "[1,0]"
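The official script takes the box in xywh format, whereas the Ultralytics examples earlier on this page pass boxes such as `[439, 437, 524, 709]`, which look like corner coordinates. Assuming xywh means top-left corner plus width and height, and that the Ultralytics `bboxes` argument expects xyxy corners, a conversion could look like this (`xywh_to_xyxy` is a hypothetical helper):

```python
def xywh_to_xyxy(box):
    """Convert [x, y, w, h] (top-left corner, size) to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


converted_box = xywh_to_xyxy([570, 200, 230, 400])
print(converted_box)  # [570, 200, 800, 600]
```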
      

You can also try FastSAM through a Colab demo or the HuggingFace web demo for a visual experience.

Citations and Acknowledgements

We would like to acknowledge the FastSAM authors for their significant contributions to real-time instance segmentation:

@misc{zhao2023fast,
      title={Fast Segment Anything},
      author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang},
      year={2023},
      eprint={2306.12156},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

The original FastSAM paper is available on arXiv. The authors have made their work publicly available, and the codebase can be accessed on GitHub. We appreciate their efforts in advancing the field and making their work accessible to the broader community.

FAQ

What is FastSAM and how does it differ from SAM?

FastSAM, short for Fast Segment Anything Model, is a real-time convolutional neural network (CNN)-based solution designed to reduce computational demands while maintaining high performance in object segmentation tasks. Unlike the Segment Anything Model (SAM), which uses a heavier Transformer-based architecture, FastSAM uses Ultralytics YOLOv8-seg to perform efficient instance segmentation in two stages: all-instance segmentation followed by prompt-guided selection.

How does FastSAM achieve real-time segmentation performance?

FastSAM achieves real-time segmentation by decoupling the task into an all-instance segmentation stage, handled by YOLOv8-seg, and a prompt-guided selection stage. By relying on the computational efficiency of CNNs, it significantly reduces computational and resource demands while maintaining competitive performance. This two-stage approach lets FastSAM deliver fast, efficient segmentation suited to applications that need quick results.

What are the practical applications of FastSAM?

FastSAM is useful for a variety of computer vision tasks that require real-time segmentation performance. Applications include:

  • Industrial automation for quality control and assurance
  • Real-time video analysis for security and surveillance
  • Autonomous vehicles for object detection and segmentation
  • Medical imaging for precise and quick segmentation tasks

Its ability to handle various user interaction prompts makes FastSAM adaptable to diverse scenarios.

How do I use the FastSAM model for inference in Python?

To use FastSAM for inference in Python, follow the example below:

from ultralytics import FastSAM

# Define an inference source
source = "path/to/bus.jpg"

# Create a FastSAM model
model = FastSAM("FastSAM-s.pt")  # or FastSAM-x.pt

# Run inference on an image
everything_results = model(source, device="cpu", retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)

# Run inference with bboxes prompt
results = model(source, bboxes=[439, 437, 524, 709])

# Run inference with points prompt
results = model(source, points=[[200, 200]], labels=[1])

# Run inference with texts prompt
results = model(source, texts="a photo of a dog")

# Run inference with bboxes and points and texts prompt at the same time
results = model(source, bboxes=[439, 437, 524, 709], points=[[200, 200]], labels=[1], texts="a photo of a dog")

For more details on inference methods, see the Predict Usage section of this page.

What types of prompts does FastSAM support for segmentation tasks?

FastSAM supports multiple prompt types for guiding segmentation:

  • Everything prompt: generates segmentation for all visible objects.
  • Bounding box (BBox) prompt: segments objects within a specified bounding box.
  • Text prompt: uses descriptive text to segment objects that match the description.
  • Point prompt: segments objects near specific user-defined points.

This flexibility lets FastSAM adapt to a wide range of user interaction scenarios and increases its usefulness across applications. For more information on using these prompts, refer to the Key Features and Predict Usage sections.
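The prompt types map onto the `bboxes`, `points`, `labels`, and `texts` keyword arguments used in the Python examples on this page. The sketch below collects whichever prompts are supplied into one set of keyword arguments. `build_prompt_kwargs` is a hypothetical helper, and defaulting every point label to 1 (treated here as foreground) is an assumption.

```python
def build_prompt_kwargs(bboxes=None, points=None, labels=None, texts=None):
    """Collect only the prompts that were provided, with a basic sanity check."""
    if points is not None:
        if labels is None:
            labels = [1] * len(points)  # assumption: default each point to foreground
        if len(labels) != len(points):
            raise ValueError("labels must have one entry per point")
    elif labels is not None:
        raise ValueError("labels were given without points")
    candidates = {"bboxes": bboxes, "points": points, "labels": labels, "texts": texts}
    return {name: value for name, value in candidates.items() if value is not None}


everything = build_prompt_kwargs()  # no prompt: segment everything
point_prompt = build_prompt_kwargs(points=[[200, 200]])
mixed = build_prompt_kwargs(bboxes=[439, 437, 524, 709], texts="a photo of a dog")
print(everything, point_prompt, mixed)
```

The resulting dictionary could then be unpacked into the call, for example `model(source, **point_prompt)`.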
