
SAM 2: Segment Anything Model 2

SAM 2, the successor to Meta's Segment Anything Model (SAM), is a state-of-the-art tool for comprehensive object segmentation in both images and videos. It handles complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization.

SAM 2 Example Results

Key Features



Watch: How to Run Inference with Meta's SAM2 using Ultralytics | Step-by-Step Guide

Unified Model Architecture

SAM 2 combines image and video segmentation in a single model. This unification simplifies deployment and gives consistent performance across media types. A flexible prompt-based interface lets users specify objects of interest with points, bounding boxes, or masks.

Real-Time Performance

The model achieves real-time inference speeds of approximately 44 frames per second. This makes SAM 2 suitable for applications that need immediate feedback, such as video editing and augmented reality.
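As a rough sanity check on what "real time" means here, the quoted throughput can be converted into a per-frame time budget. The sketch below is only arithmetic on the 44 FPS figure; the 30 FPS video rate is an assumed example, not a number from the SAM 2 documentation, and `frame_budget_ms` is a hypothetical helper name.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a given FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


sam2_fps = 44.0   # approximate throughput quoted in the text
video_fps = 30.0  # assumed playback rate of the input video (illustrative)

print(f"Per-frame budget at {sam2_fps:.0f} FPS: {frame_budget_ms(sam2_fps):.1f} ms")
print(f"Keeps up with {video_fps:.0f} FPS video: {sam2_fps >= video_fps}")
```

Actual throughput depends on hardware, model size, and input resolution, so the quoted figure is best treated as an upper reference point rather than a guarantee.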

Zero-Shot Generalization

SAM 2 can segment objects it has never encountered before, demonstrating strong zero-shot generalization. This is particularly useful in diverse or evolving visual domains where predefined categories may not cover every possible object.

Interactive Refinement

Users can iteratively refine segmentation results by providing additional prompts, which gives precise control over the output. This interactivity is essential for fine-tuning results in applications such as video annotation and medical imaging.

Advanced Handling of Visual Challenges

SAM 2 includes mechanisms for common video segmentation challenges such as object occlusion and reappearance. It uses a memory mechanism to keep track of objects across frames, preserving continuity even when objects are temporarily hidden or leave and re-enter the scene.

For a deeper understanding of SAM 2's architecture and capabilities, refer to the SAM 2 research paper.

Performance and Technical Details

SAM 2 sets a new benchmark in the field, outperforming previous models on several metrics:

| Metric | SAM 2 | Previous SOTA |
|---|---|---|
| Interactive Video Segmentation | Best | - |
| Human Interactions Required | 3x fewer | Baseline |
| Image Segmentation Accuracy | Improved | SAM |
| Inference Speed | 6x faster | SAM |

Model Architecture

Core Components

  • Image and Video Encoder: Uses a transformer-based architecture to extract high-level features from both images and video frames. This component is responsible for understanding the visual content at each timestep.
  • Prompt Encoder: Processes user-provided prompts (points, boxes, masks) to guide the segmentation task. This lets SAM 2 adapt to user input and target specific objects within a scene.
  • Memory Mechanism: Includes a memory encoder, a memory bank, and a memory attention module. Together these components store and use information from past frames, so the model can maintain consistent object tracking over time.
  • Mask Decoder: Generates the final segmentation masks from the encoded image features and prompts. In video, it also uses memory context to keep tracking accurate across frames.

SAM 2 Architecture Diagram

Memory Mechanism and Occlusion Handling

The memory mechanism allows SAM 2 to handle temporal dependencies and occlusions in video data. As objects move and interact, SAM 2 records their features in a memory bank. When an object becomes occluded, the model can rely on this memory to predict its position and appearance when it reappears. The occlusion head specifically handles cases where objects are not visible by predicting the likelihood that an object is occluded.
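To make the idea of a memory bank concrete, here is a deliberately simplified toy model: a fixed-size first-in, first-out store of per-frame feature vectors, with plain averaging standing in for memory attention. This is not SAM 2's implementation; the class name `ToyMemoryBank`, the capacity, and the averaging step are all assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class ToyMemoryBank:
    """FIFO store of per-frame feature vectors (hypothetical, for illustration)."""

    max_frames: int = 6  # assumed capacity, not a documented SAM 2 setting
    frames: Deque[List[float]] = field(default_factory=deque)

    def add(self, features: List[float]) -> None:
        """Record features for the newest frame, evicting the oldest if full."""
        self.frames.append(features)
        while len(self.frames) > self.max_frames:
            self.frames.popleft()

    def recall(self) -> List[float]:
        """Average the stored features: a crude stand-in for memory attention."""
        if not self.frames:
            raise ValueError("memory bank is empty")
        count = len(self.frames)
        dim = len(self.frames[0])
        return [sum(f[i] for f in self.frames) / count for i in range(dim)]


bank = ToyMemoryBank(max_frames=3)
for t in range(5):
    visible = t < 4                # pretend the object is occluded at t == 4
    if visible:
        bank.add([float(t), 1.0])  # toy "features" for the visible object

# While the object is hidden, the bank still summarises its recent appearance.
print(len(bank.frames), bank.recall())
```

The point of the sketch is only that features recorded while the object was visible remain available during occlusion; the real model uses learned encoders and attention rather than an average.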

Multi-Mask Ambiguity Resolution

In ambiguous situations (for example, overlapping objects), SAM 2 can generate multiple mask predictions. This matters for complex scenes where a single mask may not adequately describe what the prompt refers to.
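One simple way to work with several candidate masks, when no further prompt is available, is to keep the candidate with the highest confidence score. The sketch below shows that selection step on toy data; the masks, the scores, and the helper name `pick_best_mask` are made up for illustration and are not taken from the Ultralytics or Meta APIs.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask as nested lists (toy representation)


def pick_best_mask(masks: Sequence[Mask], scores: Sequence[float]) -> Tuple[int, Mask]:
    """Return the index and mask with the highest confidence score."""
    if not masks or len(masks) != len(scores):
        raise ValueError("masks and scores must be non-empty and the same length")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, masks[best]


# Three hypothetical candidates for one click: whole object, part, sub-part.
candidates = [
    [[1, 1], [1, 1]],
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
]
confidence = [0.62, 0.91, 0.40]  # invented scores, for illustration only

index, mask = pick_best_mask(candidates, confidence)
print(index, mask)
```

In an interactive tool, the alternative is to show all candidates and let an extra click or box settle the ambiguity.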

SA-V Dataset

The SA-V dataset, developed for training SAM 2, is one of the largest and most diverse video segmentation datasets available. It includes:

  • 51,000+ Videos: Captured across 47 countries, covering a wide range of real-world scenarios.
  • 600,000+ Mask Annotations: Detailed spatio-temporal mask annotations, referred to as "masklets", covering whole objects and parts.
  • Dataset Scale: 4.5 times more videos and 53 times more annotations than the previous largest datasets, offering unprecedented diversity and complexity.

๋ฒค์น˜๋งˆํฌ

๋น„๋””์˜ค ์˜ค๋ธŒ์ ํŠธ ์„ธ๋ถ„ํ™”

SAM 2๋Š” ์ฃผ์š” ๋น„๋””์˜ค ์„ธ๋ถ„ํ™” ๋ฒค์น˜๋งˆํฌ์—์„œ ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ์ž…์ฆํ–ˆ์Šต๋‹ˆ๋‹ค:

๋ฐ์ดํ„ฐ ์„ธํŠธ J&F J F
DAVIS 2017 82.5 79.8 85.2
YouTube-VOS 81.2 78.9 83.5
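For readers unfamiliar with these columns: in the usual video object segmentation convention, J is region similarity (the Jaccard index, or intersection over union, between predicted and ground-truth masks), F is contour accuracy (a boundary F-measure), and J&F is their mean. The sketch below computes J on a toy mask and checks that the J&F column above is the average of the other two; the helper names and toy masks are illustrative, and the boundary F-measure itself is not implemented here.

```python
from typing import Sequence


def jaccard(pred: Sequence[int], target: Sequence[int]) -> float:
    """Region similarity J: intersection over union of two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return 1.0 if union == 0 else intersection / union


def j_and_f(j: float, f: float) -> float:
    """Combined score: the arithmetic mean of J and F."""
    return (j + f) / 2


# Toy 2x2 masks flattened row by row (illustrative values only).
print(jaccard([1, 1, 0, 0], [1, 0, 0, 0]))  # 1 shared pixel out of 2 -> 0.5

# Check the table rows: (J, F) pairs copied from the benchmark table.
for name, j, f in [("DAVIS 2017", 79.8, 85.2), ("YouTube-VOS", 78.9, 83.5)]:
    print(name, round(j_and_f(j, f), 1))
```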

Interactive Segmentation

For interactive segmentation tasks, SAM 2 shows significant efficiency and accuracy:

| Dataset | NoC@90 | AUC |
|---|---|---|
| DAVIS Interactive | 1.54 | 0.872 |

Installation

To install SAM 2, use the following command. All SAM 2 models are downloaded automatically on first use.

pip install ultralytics
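After installing, it can be useful to confirm which version of the package is actually available in the current environment. The sketch below uses only the Python standard library (`importlib.metadata`), so it runs whether or not the package is present; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("ultralytics")
if version is None:
    print("ultralytics is not installed; run: pip install ultralytics")
else:
    print(f"ultralytics {version} is installed")
```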

How to Use SAM 2: Versatility in Image and Video Segmentation

The following table lists the available SAM 2 models, their pre-trained weights, supported tasks, and compatibility with operating modes such as Inference, Validation, Training, and Export.

| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|---|---|---|---|---|---|---|
| SAM 2 tiny | sam2_t.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
| SAM 2 small | sam2_s.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
| SAM 2 base | sam2_b.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
| SAM 2 large | sam2_l.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |

SAM 2 Prediction Examples

SAM 2 can be used across a broad range of tasks, including real-time video editing, medical imaging, and autonomous systems. Its ability to segment both static and dynamic visual data makes it a versatile tool for researchers and developers.

Segment with Prompts

Use prompts to segment specific objects in images or videos.

from ultralytics import SAM

# Load a model
model = SAM("sam2_b.pt")

# Display model information (optional)
model.info()

# Segment with bounding box prompt
results = model("path/to/image.jpg", bboxes=[100, 100, 200, 200])

# Segment with point prompt
results = model("path/to/image.jpg", points=[150, 150], labels=[1])
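The box prompt in the example appears to use pixel corner coordinates in [x1, y1, x2, y2] order, and the point prompt uses [x, y] with a label of 1 for a foreground point. Under that assumption, the hypothetical helpers below build a corner-format box from a centre and size and check that a point prompt falls inside it. `segment_with_box` imports `ultralytics` lazily, so the pure-Python helpers run without the package or a weights download.

```python
from typing import List, Sequence


def box_from_center(cx: float, cy: float, width: float, height: float) -> List[float]:
    """Convert a centre/size box to [x1, y1, x2, y2] pixel corners."""
    half_w, half_h = width / 2, height / 2
    return [cx - half_w, cy - half_h, cx + half_w, cy + half_h]


def point_in_box(point: Sequence[float], box: Sequence[float]) -> bool:
    """Return True if an [x, y] point lies inside an [x1, y1, x2, y2] box."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def segment_with_box(image_path: str, box: Sequence[float], weights: str = "sam2_b.pt"):
    """Run SAM 2 with one box prompt (needs ultralytics; downloads weights)."""
    from ultralytics import SAM  # lazy import so the helpers above work without it

    model = SAM(weights)
    return model(image_path, bboxes=list(box))


# Reproduces the box and point used in the example: a 100x100 box around (150, 150).
box = box_from_center(150, 150, 100, 100)
print(box, point_in_box([150, 150], box))
```

In recent Ultralytics releases the returned results expose predicted masks through `results[0].masks`; treat the exact attribute layout as version-dependent and check the documentation for the installed release.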

Segment Everything

Segment the entire image or video content without specific prompts.

from ultralytics import SAM

# Load a model
model = SAM("sam2_b.pt")

# Display model information (optional)
model.info()

# Run inference
model("path/to/video.mp4")

The equivalent command-line invocation:

# Run inference with a SAM 2 model
yolo predict model=sam2_b.pt source=path/to/video.mp4

  • This example shows how SAM 2 can segment the entire content of an image or video when no prompts (bboxes/points/masks) are provided.

SAM 2 comparison vs YOLOv8

Here we compare Meta's smallest SAM 2 model, SAM2-t, with Ultralytics' smallest segmentation model, YOLOv8n-seg:

| Model | Size (MB) | Parameters (M) | Speed (CPU) (ms/im) |
|---|---|---|---|
| Meta SAM-b | 375 | 93.7 | 161440 |
| Meta SAM2-b | 162 | 80.8 | 121923 |
| Meta SAM2-t | 78.1 | 38.9 | 85155 |
| MobileSAM | 40.7 | 10.1 | 98543 |
| FastSAM-s with YOLOv8 backbone | 23.7 | 11.8 | 140 |
| Ultralytics YOLOv8n-seg | 6.7 (11.7x smaller) | 3.4 (11.4x less) | 79.5 (1071x faster) |

This comparison shows order-of-magnitude differences in size and speed between the models. SAM offers unique capabilities for automatic segmentation, but it is not a direct competitor to YOLOv8 segment models, which are smaller, faster, and more efficient.
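The multipliers shown in the YOLOv8n-seg row are relative to SAM2-t. As a quick check, the sketch below recomputes them from the table values; the dictionary layout and key names are just an illustrative way to hold the two rows.

```python
# Size (MB), parameters (millions), and CPU time (ms per image) from the table.
rows = {
    "Meta SAM2-t": {"size_mb": 78.1, "params_m": 38.9, "cpu_ms": 85155},
    "Ultralytics YOLOv8n-seg": {"size_mb": 6.7, "params_m": 3.4, "cpu_ms": 79.5},
}

sam2_t = rows["Meta SAM2-t"]
yolo_n = rows["Ultralytics YOLOv8n-seg"]

# Ratio of SAM2-t to YOLOv8n-seg for each column.
ratios = {key: sam2_t[key] / yolo_n[key] for key in sam2_t}

print(f"Size:       {ratios['size_mb']:.1f}x smaller")
print(f"Parameters: {ratios['params_m']:.1f}x fewer")
print(f"CPU speed:  {ratios['cpu_ms']:.0f}x faster")
```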

Tests run on a 2023 Apple M2 MacBook with 16GB of RAM using torch==2.3.1 and ultralytics==8.3.82. To reproduce this test:

Example

from ultralytics import ASSETS, SAM, YOLO, FastSAM

# Profile SAM2-t, SAM2-b, SAM-b, MobileSAM
for file in ["sam_b.pt", "sam2_b.pt", "sam2_t.pt", "mobile_sam.pt"]:
    model = SAM(file)
    model.info()
    model(ASSETS)

# Profile FastSAM-s
model = FastSAM("FastSAM-s.pt")
model.info()
model(ASSETS)

# Profile YOLOv8n-seg
model = YOLO("yolov8n-seg.pt")
model.info()
model(ASSETS)

Auto-Annotation: Efficient Dataset Creation

Auto-annotation is a powerful feature of SAM 2 that lets users generate segmentation datasets quickly and accurately by leveraging pre-trained models. It is particularly useful for creating large, high-quality datasets without extensive manual effort.

How to Auto-Annotate with SAM 2

To auto-annotate your dataset using SAM 2, follow this example:

Auto-Annotation Example

from ultralytics.data.annotator import auto_annotate

auto_annotate(data="path/to/images", det_model="yolov8x.pt", sam_model="sam2_b.pt")

| Argument | Type | Description | Default |
|---|---|---|---|
| data | str | Path to a folder containing the images to be annotated. | |
| det_model | str, optional | Pre-trained YOLO detection model. Defaults to 'yolov8x.pt'. | 'yolov8x.pt' |
| sam_model | str, optional | Pre-trained SAM 2 segmentation model. Defaults to 'sam2_b.pt'. | 'sam2_b.pt' |
| device | str, optional | Device to run the models on. Defaults to an empty string (CPU or GPU, if available). | |
| output_dir | str, None, optional | Directory in which to save the annotated results. Defaults to a 'labels' folder in the same directory as 'data'. | None |

This function makes it possible to create high-quality segmentation datasets quickly, which is ideal for researchers and developers who want to accelerate their projects.
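Once annotation has run, the output directory should contain one text file per image. Assuming those files follow the usual YOLO segmentation label layout (a class id followed by normalized x y polygon coordinates on each line), a line can be read back as sketched below. That format is an assumption to verify against the files your installed version writes, and `parse_label_line` and `to_pixels` are hypothetical helper names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_label_line(line: str) -> Tuple[int, List[Point]]:
    """Split 'class x1 y1 x2 y2 ...' into a class id and a list of (x, y) points."""
    parts = line.split()
    if len(parts) < 7 or len(parts) % 2 == 0:
        raise ValueError("expected a class id followed by at least three x y pairs")
    class_id = int(parts[0])
    coords = [float(value) for value in parts[1:]]
    polygon = list(zip(coords[0::2], coords[1::2]))
    return class_id, polygon


def to_pixels(polygon: List[Point], width: int, height: int) -> List[Point]:
    """Scale normalized [0, 1] coordinates to pixel coordinates."""
    return [(x * width, y * height) for x, y in polygon]


# A made-up label line: class 0 with a triangular outline.
class_id, polygon = parse_label_line("0 0.25 0.25 0.75 0.25 0.5 0.75")
print(class_id, to_pixels(polygon, width=200, height=100))
```

Spot-checking a few parsed polygons against their images is a cheap way to catch annotation problems before training on the generated labels.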

Limitations

Despite its strengths, SAM 2 has certain limitations:

  • Tracking Stability: SAM 2 may lose track of objects during extended sequences or significant viewpoint changes.
  • Object Confusion: The model can sometimes confuse similar-looking objects, particularly in crowded scenes.
  • Efficiency with Multiple Objects: Segmentation efficiency decreases when multiple objects are processed simultaneously, because there is no inter-object communication.
  • Detail Accuracy: May miss fine details, especially with fast-moving objects. Additional prompts can partially address this issue, but temporal smoothness is not guaranteed.

Citations and Acknowledgements

If SAM 2 is a crucial part of your research or development work, please cite it using the following reference:

@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint},
  year={2024}
}

We extend our gratitude to Meta AI for their contributions to the AI community with this groundbreaking model and dataset.

FAQ

What is SAM 2 and how does it improve upon the original Segment Anything Model (SAM)?

SAM 2, the successor to Meta's Segment Anything Model (SAM), is a state-of-the-art tool for comprehensive object segmentation in both images and videos. It handles complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization. SAM 2 offers several improvements over the original SAM, including:

  • Unified Model Architecture: Combines image and video segmentation capabilities in a single model.
  • Real-Time Performance: Processes approximately 44 frames per second, making it suitable for applications that need immediate feedback.
  • Zero-Shot Generalization: Segments objects it has never encountered before, which is useful in diverse visual domains.
  • Interactive Refinement: Lets users iteratively refine segmentation results by providing additional prompts.
  • Advanced Handling of Visual Challenges: Manages common video segmentation challenges such as object occlusion and reappearance.

For more details on SAM 2's architecture and capabilities, refer to the SAM 2 research paper.

How can I use SAM 2 for real-time video segmentation?

SAM 2 can be used for real-time video segmentation by leveraging its promptable interface and real-time inference capabilities. Here is a basic example:

Segment with Prompts

Use prompts to segment specific objects in images or videos.

from ultralytics import SAM

# Load a model
model = SAM("sam2_b.pt")

# Display model information (optional)
model.info()

# Segment with bounding box prompt
results = model("path/to/image.jpg", bboxes=[100, 100, 200, 200])

# Segment with point prompt
results = model("path/to/image.jpg", points=[150, 150], labels=[1])

For more comprehensive usage, refer to the How to Use SAM 2 section.

What datasets are used to train SAM 2, and how do they enhance its performance?

SAM 2 is trained on the SA-V dataset, one of the largest and most diverse video segmentation datasets available. The SA-V dataset includes:

  • 51,000+ Videos: Captured across 47 countries, covering a wide range of real-world scenarios.
  • 600,000+ Mask Annotations: Detailed spatio-temporal mask annotations, referred to as "masklets", covering whole objects and parts.
  • Dataset Scale: 4.5 times more videos and 53 times more annotations than the previous largest datasets, offering unprecedented diversity and complexity.

This extensive dataset allows SAM 2 to achieve strong performance on major video segmentation benchmarks and improves its zero-shot generalization. For more information, see the SA-V Dataset section.

How does SAM 2 handle occlusions and object reappearances in video segmentation?

SAM 2 includes a memory mechanism for managing temporal dependencies and occlusions in video data. The memory mechanism consists of:

  • Memory Encoder and Memory Bank: Store features from past frames.
  • Memory Attention Module: Uses the stored information to maintain consistent object tracking over time.
  • Occlusion Head: Handles cases where objects are not visible by predicting the likelihood that an object is occluded.

This mechanism preserves continuity even when objects are temporarily hidden or leave and re-enter the scene. For more details, refer to the Memory Mechanism and Occlusion Handling section.

How does SAM 2 compare to other segmentation models like YOLOv8?

SAM 2 and Ultralytics YOLOv8 serve different purposes and excel in different areas. While SAM 2 is designed for comprehensive object segmentation with advanced features like zero-shot generalization and real-time performance, YOLOv8 is optimized for speed and efficiency in object detection and segmentation tasks. Here's a comparison:

| Model | Size (MB) | Parameters (M) | Speed (CPU) (ms/im) |
|---|---|---|---|
| Meta SAM-b | 375 | 93.7 | 161440 |
| Meta SAM2-b | 162 | 80.8 | 121923 |
| Meta SAM2-t | 78.1 | 38.9 | 85155 |
| MobileSAM | 40.7 | 10.1 | 98543 |
| FastSAM-s with YOLOv8 backbone | 23.7 | 11.8 | 140 |
| Ultralytics YOLOv8n-seg | 6.7 (11.7x smaller) | 3.4 (11.4x less) | 79.5 (1071x faster) |

For more details, see the SAM 2 comparison vs YOLOv8 section.


Created 2 months ago, updated 7 days ago
