
Baidu's RT-DETR: A Vision Transformer-Based Real-Time Object Detector

Overview

Real-Time Detection Transformer (RT-DETR), developed by Baidu, is an end-to-end object detector that delivers real-time performance while maintaining high accuracy. It builds on DETR, an NMS-free framework, and adds a convolution-based backbone and an efficient hybrid encoder to reach real-time speed. RT-DETR processes multiscale features efficiently by decoupling intra-scale interaction from cross-scale fusion. The model is also adaptable: inference speed can be adjusted by using a different number of decoder layers, without retraining. RT-DETR excels on accelerated backends such as CUDA with TensorRT, where it outperforms many other real-time object detectors.




๋ชจ๋ธ ์˜ˆ์‹œ ์ด๋ฏธ์ง€ ๋ฐ”์ด๋‘์˜ ๊ฐœ์š” RT-DETR. RT-DETR ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜ ๋‹ค์ด์–ด๊ทธ๋žจ์€ ์ธ์ฝ”๋”์— ๋Œ€ํ•œ ์ž…๋ ฅ์œผ๋กœ ๋ฐฑ๋ณธ์˜ ๋งˆ์ง€๋ง‰ ์„ธ ๋‹จ๊ณ„ {S3, S4, S5}๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ํšจ์œจ์ ์ธ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์ธ์ฝ”๋”๋Š” ์Šค์ผ€์ผ ๋‚ด ํŠน์ง• ์ƒํ˜ธ ์ž‘์šฉ(AIFI)๊ณผ ์Šค์ผ€์ผ ๊ฐ„ ํŠน์ง• ์œตํ•ฉ ๋ชจ๋“ˆ(CCFM)์„ ํ†ตํ•ด ๋ฉ€ํ‹ฐ์Šค์ผ€์ผ ํŠน์ง•์„ ์ด๋ฏธ์ง€ ํŠน์ง• ์‹œํ€€์Šค๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค. IoU ์ธ์‹ ์ฟผ๋ฆฌ ์„ ํƒ์€ ๋””์ฝ”๋”์˜ ์ดˆ๊ธฐ ์˜ค๋ธŒ์ ํŠธ ์ฟผ๋ฆฌ๋กœ ์‚ฌ์šฉํ•  ๊ณ ์ •๋œ ์ˆ˜์˜ ์ด๋ฏธ์ง€ ํŠน์ง•์„ ์„ ํƒํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ๋ณด์กฐ ์˜ˆ์ธก ํ—ค๋“œ๊ฐ€ ์žˆ๋Š” ๋””์ฝ”๋”๋Š” ๊ฐ์ฒด ์ฟผ๋ฆฌ๋ฅผ ๋ฐ˜๋ณต์ ์œผ๋กœ ์ตœ์ ํ™”ํ•˜์—ฌ ๋ฐ•์Šค ๋ฐ ์‹ ๋ขฐ ์ ์ˆ˜(์ถœ์ฒ˜).

Key Features

  • Efficient Hybrid Encoder: Baidu's RT-DETR uses an efficient hybrid encoder that processes multiscale features by decoupling intra-scale interaction from cross-scale fusion. This Vision Transformer-based design reduces computational cost and makes real-time object detection possible.
  • IoU-aware Query Selection: Baidu's RT-DETR improves object query initialization by using IoU-aware query selection. This lets the model focus on the most relevant objects in the scene, which improves detection accuracy.
  • Adaptable Inference Speed: Baidu's RT-DETR supports flexible adjustment of inference speed by using a different number of decoder layers, without retraining. This adaptability makes it easier to apply the model in a range of real-time object detection scenarios.

Pre-trained Models

The Ultralytics Python API provides pre-trained PaddlePaddle RT-DETR models at different scales:

  • RT-DETR-L: 53.0% AP on COCO val2017, 114 FPS on a T4 GPU
  • RT-DETR-X: 54.8% AP on COCO val2017, 74 FPS on a T4 GPU
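For latency budgeting, the FPS figures above can be converted to an approximate time per image. This small sketch assumes images are processed one at a time, so latency is simply the reciprocal of throughput; batched or pipelined inference would behave differently. The helper name `latency_ms` is hypothetical.

```python
def latency_ms(fps):
    """Approximate per-image latency in milliseconds for a given FPS."""
    return 1000.0 / fps


# FPS figures as listed above (T4 GPU).
reported_fps = {"RT-DETR-L": 114, "RT-DETR-X": 74}

for name, fps in reported_fps.items():
    print(f"{name}: {latency_ms(fps):.1f} ms per image")
# RT-DETR-L: 8.8 ms per image
# RT-DETR-X: 13.5 ms per image
```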

Usage Examples

This section provides simple RT-DETR training and inference examples. For full documentation on these and other modes, see the Predict, Train, Val and Export docs pages.

Example

from ultralytics import RTDETR

# Load a COCO-pretrained RT-DETR-l model
model = RTDETR("rtdetr-l.pt")

# Display model information (optional)
model.info()

# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)

# Run inference with the RT-DETR-l model on the 'bus.jpg' image
results = model("path/to/bus.jpg")

CLI equivalent (run in a shell):

# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640

# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg

Supported Tasks and Modes

This table lists the model types, their pre-trained weights, the tasks each model supports, and the supported modes (Train, Val, Predict, Export), marked with ✅.

| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|---|---|---|---|---|---|---|
| RT-DETR Large | rtdetr-l.pt | Object Detection | ✅ | ✅ | ✅ | ✅ |
| RT-DETR Extra-Large | rtdetr-x.pt | Object Detection | ✅ | ✅ | ✅ | ✅ |

Citations and Acknowledgements

If you use Baidu's RT-DETR in your research or development work, please cite the original paper:

@misc{lv2023detrs,
      title={DETRs Beat YOLOs on Real-time Object Detection},
      author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
      year={2023},
      eprint={2304.08069},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

We would like to acknowledge Baidu and the PaddlePaddle team for creating and maintaining this valuable resource for the computer vision community. Their contribution to the field with the development of the Vision Transformers-based real-time object detector, RT-DETR, is greatly appreciated.

์ž์ฃผ ๋ฌป๋Š” ์งˆ๋ฌธ

๋ฐ”์ด๋‘์˜ RT-DETR ๋ชจ๋ธ์ด๋ž€ ๋ฌด์—‡์ด๋ฉฐ ์–ด๋–ป๊ฒŒ ์ž‘๋™ํ•˜๋‚˜์š”?

๋ฐ”์ด๋‘์˜ RT-DETR (์‹ค์‹œ๊ฐ„ ๊ฐ์ง€ ํŠธ๋žœ์Šคํฌ๋จธ)๋Š” ๋น„์ „ ํŠธ๋žœ์Šคํฌ๋จธ ์•„ํ‚คํ…์ฒ˜๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ตฌ์ถ•๋œ ๊ณ ๊ธ‰ ์‹ค์‹œ๊ฐ„ ๋ฌผ์ฒด ๊ฐ์ง€๊ธฐ์ž…๋‹ˆ๋‹ค. ํšจ์œจ์ ์ธ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ์ธ์ฝ”๋”๋ฅผ ํ†ตํ•ด ์Šค์ผ€์ผ ๋‚ด ์ƒํ˜ธ ์ž‘์šฉ๊ณผ ์Šค์ผ€์ผ ๊ฐ„ ์œตํ•ฉ์„ ๋ถ„๋ฆฌํ•˜์—ฌ ๋ฉ€ํ‹ฐ์Šค์ผ€์ผ ๊ธฐ๋Šฅ์„ ํšจ์œจ์ ์œผ๋กœ ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ IoU ์ธ์‹ ์ฟผ๋ฆฌ ์„ ํƒ์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ€์žฅ ๊ด€๋ จ์„ฑ์ด ๋†’์€ ๊ฐ์ฒด์— ์ง‘์ค‘ํ•จ์œผ๋กœ์จ ๊ฐ์ง€ ์ •ํ™•๋„๋ฅผ ๋†’์ž…๋‹ˆ๋‹ค. ์žฌ๊ต์œก ์—†์ด ๋””์ฝ”๋” ๋ ˆ์ด์–ด๋ฅผ ์กฐ์ •ํ•˜์—ฌ ์ถ”๋ก  ์†๋„๋ฅผ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ RT-DETR ๋‹ค์–‘ํ•œ ์‹ค์‹œ๊ฐ„ ๊ฐ์ฒด ๊ฐ์ง€ ์‹œ๋‚˜๋ฆฌ์˜ค์— ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค. RT-DETR ๊ธฐ๋Šฅ์— ๋Œ€ํ•ด ์ž์„ธํžˆ ์•Œ์•„๋ณด์„ธ์š”.

Ultralytics ์—์„œ ์ œ๊ณตํ•˜๋Š” ์‚ฌ์ „ ํ•™์Šต๋œ RT-DETR ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ์–ด๋–ป๊ฒŒ ํ•ด์•ผ ํ•˜๋‚˜์š”?

Ultralytics Python API๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์‚ฌ์ „ ํ•™์Šต๋œ PaddlePaddle RT-DETR ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, COCO val2017์—์„œ ์‚ฌ์ „ ํ•™์Šต๋œ RT-DETR-l ๋ชจ๋ธ์„ ๋กœ๋“œํ•˜๊ณ  T4 GPU ์—์„œ ๋†’์€ FPS๋ฅผ ๋‹ฌ์„ฑํ•˜๋ ค๋ฉด ๋‹ค์Œ ์˜ˆ์ œ๋ฅผ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

์˜ˆ

from ultralytics import RTDETR

# Load a COCO-pretrained RT-DETR-l model
model = RTDETR("rtdetr-l.pt")

# Display model information (optional)
model.info()

# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)

# Run inference with the RT-DETR-l model on the 'bus.jpg' image
results = model("path/to/bus.jpg")

CLI equivalent (run in a shell):

# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640

# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg

Why should I choose Baidu's RT-DETR over other real-time object detectors?

Baidu's RT-DETR stands out for its efficient hybrid encoder and IoU-aware query selection, which reduce computational cost while maintaining high accuracy. Its ability to adjust inference speed by using a different number of decoder layers, without retraining, adds considerable flexibility. This makes it particularly well suited to applications that need real-time performance on accelerated backends such as CUDA with TensorRT, where it outperforms many other real-time object detectors.

How does RT-DETR support adaptable inference speed for different real-time applications?

Baidu's RT-DETR allows flexible adjustments of inference speed by using different decoder layers without requiring retraining. This adaptability is crucial for scaling performance across various real-time object detection tasks. Whether you need faster processing for lower precision needs or slower, more accurate detections, RT-DETR can be tailored to meet your specific requirements.
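The idea of trading accuracy for speed by running fewer decoder layers can be sketched with a toy stand-in. Everything here is an assumption made for illustration: real decoder layers are attention blocks rather than the simple `refine` function below, the count of six layers is arbitrary, and `run_decoder` is a hypothetical helper rather than an Ultralytics option.

```python
def run_decoder(queries, layers, num_layers=None):
    """Apply the first `num_layers` decoder layers (all of them if None)."""
    active = layers if num_layers is None else layers[:num_layers]
    for layer in active:
        queries = layer(queries)
    return queries


def refine(queries):
    """Toy layer: move each value halfway towards a target of 1.0."""
    return [q + 0.5 * (1.0 - q) for q in queries]


layers = [refine] * 6  # six identical toy layers (illustrative only)

full = run_decoder([0.0], layers)                # all six layers
fast = run_decoder([0.0], layers, num_layers=3)  # first three layers only

print(full)  # [0.984375]
print(fast)  # [0.875]
```

In this toy setting, the truncated run does half the work and lands further from the target, which mirrors the speed and accuracy trade-off described above. No weights change between the two runs, which is the sense in which no retraining is needed.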

Can I use RT-DETR models with other Ultralytics modes, such as training, validation, and export?

Yes, RT-DETR models are compatible with the Ultralytics modes for training, validation, prediction, and export. For detailed instructions on each mode, refer to the respective documentation: Train, Val, Predict, and Export. Together these modes cover the workflow for developing and deploying an object detection solution.
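As a rough sketch of the validation and export modes, the function below wraps both calls. It assumes the `ultralytics` package is installed and that `val()` and `export()` behave for RT-DETR as they do for other Ultralytics models; the `metrics.box.map` attribute and the `format="onnx"` choice reflect my understanding of the Ultralytics API and should be checked against the Val and Export docs. The function name `validate_and_export` is hypothetical.

```python
def validate_and_export(weights="rtdetr-l.pt", data="coco8.yaml"):
    """Validate an RT-DETR model, then export it to ONNX.

    The import sits inside the function so that defining it does not
    require the ultralytics package; calling it does.
    """
    from ultralytics import RTDETR

    model = RTDETR(weights)

    # Validation mode: evaluate on the dataset's validation split.
    metrics = model.val(data=data)

    # Export mode: ONNX is used here as one example target format.
    exported_path = model.export(format="onnx")

    # metrics.box.map is assumed to hold mAP averaged over IoU 0.50-0.95.
    return metrics.box.map, exported_path
```

Calling `validate_and_export()` downloads the weights and the COCO8 dataset on first use, so it needs network access and some disk space.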


📅 Created 11 months ago ✏️ Updated 18 days ago
