
Baidu's RT-DETR: A Vision Transformer-Based Real-Time Object Detector

Overview

Real-Time Detection Transformer (RT-DETR), developed by Baidu, is a cutting-edge end-to-end object detector that delivers real-time performance while maintaining high accuracy. It leverages Vision Transformers (ViT) to process multiscale features efficiently by decoupling intra-scale interaction and cross-scale fusion. RT-DETR is highly adaptable: its inference speed can be adjusted by using a different number of decoder layers, without retraining. The model performs especially well on accelerated backends such as CUDA with TensorRT, where it outperforms many other real-time object detectors.
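A back-of-the-envelope calculation shows why decoupling pays off. The sketch below assumes a 640x640 input and backbone strides of 8, 16 and 32 for the S3, S4 and S5 feature maps (typical values, not stated on this page), and assumes that intra-scale self-attention runs on S5 alone, which is the design reported in the RT-DETR paper. Because attention cost grows with the square of the sequence length, it compares the number of query-key pairs for attention over all scales with attention over S5 only.

```python
# Rough token-count arithmetic for a 640x640 input.
# Assumption: S3/S4/S5 have strides 8/16/32 (typical, not confirmed by this page).
IMG_SIZE = 640
STRIDES = {"S3": 8, "S4": 16, "S5": 32}

def tokens(stride: int, size: int = IMG_SIZE) -> int:
    """Number of spatial tokens (H * W) in a square feature map with this stride."""
    side = size // stride
    return side * side

counts = {name: tokens(s) for name, s in STRIDES.items()}
all_tokens = sum(counts.values())  # sequence length if every scale is flattened together
s5_tokens = counts["S5"]           # sequence length if self-attention runs on S5 only

# Self-attention compares every token with every other token: cost ~ length ** 2.
ratio = (all_tokens / s5_tokens) ** 2

print(counts)                                  # {'S3': 6400, 'S4': 1600, 'S5': 400}
print(all_tokens, s5_tokens)                   # 8400 400
print(f"{ratio:.0f}x fewer query-key pairs")   # 441x fewer query-key pairs
```

In the paper, the cross-scale fusion step (CCFM) still touches every scale, but it is built from convolutions and up/downsampling rather than global attention, so its cost grows roughly linearly with the number of tokens.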

Figure: Overview of Baidu's RT-DETR. The RT-DETR model architecture diagram shows the last three stages of the backbone, {S3, S4, S5}, as the input to the encoder. The efficient hybrid encoder transforms the multiscale features into a sequence of image features through intra-scale feature interaction (AIFI) and the cross-scale feature fusion module (CCFM). IoU-aware query selection then picks a fixed number of image features to serve as the initial object queries for the decoder. Finally, the decoder, equipped with auxiliary prediction heads, iteratively optimizes the object queries to generate boxes and confidence scores (source).
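At inference time, the selection step in the caption reduces to a top-K over per-token confidence. The "IoU-aware" part happens during training, where the classification target is tied to the IoU of the predicted box, so that high-confidence tokens also tend to be well localized. The minimal NumPy sketch below shows only the selection. The helper name `select_queries`, the random inputs, the 256-dim features, the 80 classes and K = 300 are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def select_queries(features: np.ndarray, class_logits: np.ndarray, k: int):
    """Hypothetical helper: keep the k encoder tokens with the highest confidence.

    features:     (num_tokens, dim) flattened encoder output
    class_logits: (num_tokens, num_classes) per-token classification logits
    Returns the selected features (k, dim) and their token indices (k,).
    """
    # Confidence of a token = its best class logit. Sigmoid is monotonic,
    # so ranking logits gives the same order as ranking probabilities.
    scores = class_logits.max(axis=1)
    # Indices of the k highest-scoring tokens, best first.
    top_idx = np.argsort(-scores)[:k]
    return features[top_idx], top_idx

# Illustrative random inputs: 8400 tokens (S3+S4+S5 at 640x640, assuming
# strides 8/16/32), 256-dim features, 80 classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8400, 256))
logits = rng.normal(size=(8400, 80))

queries, idx = select_queries(feats, logits, k=300)
print(queries.shape)  # (300, 256)
```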

Key Features

  • Efficient Hybrid Encoder: Baidu's RT-DETR uses an efficient hybrid encoder that processes multiscale features by decoupling intra-scale interaction and cross-scale fusion. This Vision Transformer-based design reduces computational cost and enables real-time object detection.
  • IoU-aware Query Selection: Baidu's RT-DETR improves object query initialization with IoU-aware query selection, which lets the model focus on the most relevant objects in the scene and improves detection accuracy.
  • Adaptable Inference Speed: Baidu's RT-DETR lets you adjust inference speed by using a different number of decoder layers, with no retraining required. This flexibility makes it practical across a wide range of real-time object detection scenarios.
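The adjustable-speed property follows from how the decoder is built: each layer refines the previous layer's boxes and, per the RT-DETR design, every layer has an auxiliary prediction head, so stopping after fewer layers still yields a usable output. The toy NumPy sketch below illustrates only this control flow. The layer behaviour (moving each box halfway toward a fixed target), the target and initial boxes, and the six-layer depth are hypothetical stand-ins for learned attention layers.

```python
import numpy as np

def make_toy_layer(target: np.ndarray, step: float = 0.5):
    """Hypothetical stand-in for one decoder layer plus its auxiliary box head.

    A real layer would predict a correction by attending to encoder features;
    this toy simply moves every box `step` of the way toward `target`.
    """
    def layer(boxes: np.ndarray) -> np.ndarray:
        return boxes + step * (target - boxes)
    return layer

def decode(init_boxes: np.ndarray, layers, num_layers: int) -> np.ndarray:
    """Run only the first `num_layers` layers and return that layer's boxes."""
    boxes = init_boxes
    for layer in layers[:num_layers]:
        boxes = layer(boxes)  # each layer refines the previous layer's output
    return boxes

target = np.array([[0.5, 0.5, 0.2, 0.3]])  # toy "true" box (cx, cy, w, h), normalized
init = np.array([[0.4, 0.6, 0.1, 0.1]])    # toy initial box from query selection
layers = [make_toy_layer(target) for _ in range(6)]  # six layers assumed

# Fewer layers -> less compute per image but a coarser box; no retraining involved.
for n in (3, 6):
    err = np.abs(decode(init, layers, n) - target).max()
    print(f"{n} layers: max box error {err:.4f}")
```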

Pre-trained Models

The Ultralytics Python API provides pre-trained PaddlePaddle RT-DETR models at different scales:

  • RT-DETR-L: 53.0% AP on COCO val2017, 114 FPS on a T4 GPU
  • RT-DETR-X: 54.8% AP on COCO val2017, 74 FPS on a T4 GPU
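Reading FPS as the reciprocal of per-image latency (an approximation that ignores batching and any pre- or post-processing the benchmark may exclude), the two published operating points translate into per-image latency as follows:

```python
# Published operating points from the list above: (COCO val2017 AP in %, T4 FPS).
models = {"RT-DETR-L": (53.0, 114), "RT-DETR-X": (54.8, 74)}

latency_ms = {}
for name, (ap, fps) in models.items():
    latency_ms[name] = 1000.0 / fps  # milliseconds per image
    print(f"{name}: {latency_ms[name]:.1f} ms/image at {ap} AP")

ap_gain = models["RT-DETR-X"][0] - models["RT-DETR-L"][0]
extra_ms = latency_ms["RT-DETR-X"] - latency_ms["RT-DETR-L"]
print(f"X vs L: +{ap_gain:.1f} AP for +{extra_ms:.1f} ms per image")
```

In other words, RT-DETR-X buys 1.8 AP for roughly 4.7 ms of extra latency per image (about 8.8 ms versus 13.5 ms).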

Usage Examples

This example provides simple RT-DETR training and inference examples. For full documentation on these and other modes, see the Predict, Train, Val and Export docs pages.

Example

from ultralytics import RTDETR

# Load a COCO-pretrained RT-DETR-l model
model = RTDETR('rtdetr-l.pt')

# Display model information (optional)
model.info()

# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)

# Run inference with the RT-DETR-l model on the 'bus.jpg' image
results = model('path/to/bus.jpg')

CLI commands are available to run the same workflows directly:

# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640

# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg
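Because RT-DETR is an end-to-end detector, its decoder emits a fixed set of query predictions, and the one-to-one matching used in DETR-style training discourages duplicate boxes, so no non-maximum suppression (NMS) is needed. The predict call above already returns finished detections; the NumPy sketch below only illustrates what NMS-free post-processing can look like. It is a simplified, hypothetical version (a plain confidence threshold rather than any top-K scheme), not Ultralytics' or Baidu's code, and it assumes sigmoid class logits and normalized (cx, cy, w, h) boxes, the usual DETR-family convention.

```python
import numpy as np

def postprocess(logits, boxes_cxcywh, img_w, img_h, conf=0.5):
    """Hypothetical NMS-free post-processing for DETR-style outputs.

    logits:       (num_queries, num_classes) raw class logits
    boxes_cxcywh: (num_queries, 4) boxes as normalized (cx, cy, w, h)
    Returns pixel-space xyxy boxes, scores and class ids for queries above `conf`.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid class probabilities
    scores = probs.max(axis=1)             # best class score per query
    classes = probs.argmax(axis=1)         # best class id per query

    cx, cy, w, h = boxes_cxcywh.T
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    xyxy = xyxy * np.array([img_w, img_h, img_w, img_h])  # normalized -> pixels

    keep = scores > conf  # a plain threshold; no NMS step follows
    return xyxy[keep], scores[keep], classes[keep]

# Two toy queries: one confident class-0 detection and one background query.
logits = np.array([[4.0, -4.0], [-4.0, -4.0]])
boxes = np.array([[0.5, 0.5, 0.2, 0.4], [0.1, 0.1, 0.1, 0.1]])
det_boxes, det_scores, det_classes = postprocess(logits, boxes, img_w=640, img_h=480)
print(det_boxes, det_classes)
```

With these toy inputs only the first query survives, giving a box of about (256, 144, 384, 336) pixels for class 0.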

Supported Tasks and Modes

This table lists the model types, the specific pre-trained weights, the tasks each model supports, and the modes it supports (Train, Val, Predict, Export), with supported modes marked by ✅.

| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|---|---|---|---|---|---|---|
| RT-DETR Large | rtdetr-l.pt | Object Detection | ✅ | ✅ | ✅ | ✅ |
| RT-DETR Extra-Large | rtdetr-x.pt | Object Detection | ✅ | ✅ | ✅ | ✅ |

Citations and Acknowledgements

If you use Baidu's RT-DETR in your research or development work, please cite the original paper:

@misc{lv2023detrs,
      title={DETRs Beat YOLOs on Real-time Object Detection},
      author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
      year={2023},
      eprint={2304.08069},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

We would like to thank Baidu and the PaddlePaddle team for creating and maintaining this valuable resource for the computer vision community. Their contribution to the field through the development of RT-DETR, the Vision Transformer-based real-time object detector, is greatly appreciated.

Keywords: RT-DETR, Transformer, ViT, Vision Transformers, Baidu RT-DETR, PaddlePaddle, Paddle Paddle RT-DETR, real-time object detection, Vision Transformer-based object detection, pre-trained PaddlePaddle RT-DETR models, Baidu's RT-DETR usage, Ultralytics Python API



Created 2023-11-12, Updated 2024-01-16
Authors: glenn-jocher (7)
