Baidu's RT-DETR: A Vision Transformer-Based Real-Time Object Detector

Overview

Real-Time Detection Transformer (RT-DETR), developed by Baidu, is an end-to-end object detector that delivers real-time performance while maintaining high accuracy. It builds on DETR, the NMS-free detection framework, and adds a convolution-based backbone and an efficient hybrid encoder to reach real-time speed. RT-DETR processes multiscale features efficiently by decoupling intra-scale interaction from cross-scale fusion. The model is also adaptable: inference speed can be adjusted by using a different number of decoder layers, without retraining. RT-DETR performs well on accelerated backends such as CUDA with TensorRT, where it outperforms many other real-time object detectors.



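Figure: Overview of Baidu's RT-DETR. The RT-DETR model architecture diagram shows the last three stages of the backbone {S3, S4, S5} as the input to the encoder. The efficient hybrid encoder transforms multiscale features into a sequence of image features through intra-scale feature interaction (AIFI) and the cross-scale feature-fusion module (CCFM). IoU-aware query selection then picks a fixed number of image features to serve as the initial object queries for the decoder. Finally, the decoder, with auxiliary prediction heads, iteratively refines the object queries to generate boxes and confidence scores.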
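A rough token count helps show why decoupling intra-scale interaction from cross-scale fusion saves computation. The sketch below is illustrative arithmetic only. It assumes a 640x640 input, backbone strides of 8, 16, and 32 for S3, S4, and S5, and, following the RT-DETR paper's description, that intra-scale self-attention is applied to the coarsest map S5 only. It counts query-key pairs as a proxy for attention cost and ignores everything else in the encoder.

```python
# Assumed values for illustration: a 640x640 input and backbone strides of
# 8, 16, and 32 for stages S3, S4, and S5.
IMAGE_SIZE = 640
STRIDES = {"S3": 8, "S4": 16, "S5": 32}


def token_count(image_size: int, stride: int) -> int:
    """Number of spatial positions in a square feature map at this stride."""
    side = image_size // stride
    return side * side


tokens = {name: token_count(IMAGE_SIZE, stride) for name, stride in STRIDES.items()}
all_tokens = sum(tokens.values())

# Self-attention compares every token with every other token, so the number
# of query-key pairs grows with the square of the token count.
joint_pairs = all_tokens ** 2   # attention over all three scales at once
s5_pairs = tokens["S5"] ** 2    # attention restricted to S5

print(tokens)                   # {'S3': 6400, 'S4': 1600, 'S5': 400}
print(joint_pairs // s5_pairs)  # 441
```

Under these assumptions, attending within S5 alone needs about 441 times fewer query-key pairs than attending over the concatenation of all three scales; the finer maps are then combined by the cheaper fusion module.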

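Key Features

  • Efficient Hybrid Encoder: Baidu's RT-DETR uses an efficient hybrid encoder that processes multiscale features by decoupling intra-scale interaction and cross-scale fusion. This Vision Transformer-based design reduces computational cost and enables real-time object detection.
  • IoU-aware Query Selection: Baidu's RT-DETR improves object query initialization by using IoU-aware query selection. This lets the model focus on the most relevant objects in the scene, which improves detection accuracy.
  • Adaptable Inference Speed: Baidu's RT-DETR supports flexible adjustment of inference speed by using different decoder layers, without retraining. This adaptability makes it practical to deploy across a range of real-time object detection scenarios.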
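The query-selection step can be pictured as a top-k choice over per-feature scores. The sketch below is a minimal illustration, not the RT-DETR implementation: `select_initial_queries` is a hypothetical helper and the scores are made-up numbers. In the actual model the scores come from a learned head; this sketch only shows the "keep a fixed number of the best-scoring features" part.

```python
from typing import List


def select_initial_queries(scores: List[float], num_queries: int) -> List[int]:
    """Return the indices of the `num_queries` highest-scoring features.

    Hypothetical helper for illustration; ties keep their original order.
    """
    if num_queries < 0:
        raise ValueError("num_queries must be non-negative")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_queries]


# Made-up scores for six encoder features.
scores = [0.10, 0.85, 0.30, 0.92, 0.05, 0.60]
selected = select_initial_queries(scores, num_queries=3)
print(selected)  # [3, 1, 5]
```

The features at the selected indices would then be handed to the decoder as its initial object queries.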

Pre-trained Models

The Ultralytics Python API provides pre-trained PaddlePaddle RT-DETR models at different scales:

  • RT-DETR-L: 53.0% AP on COCO val2017, 114 FPS on a T4 GPU
  • RT-DETR-X: 54.8% AP on COCO val2017, 74 FPS on a T4 GPU
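The FPS figures can be read as a per-image time budget. The short sketch below converts them to milliseconds, assuming the reported FPS is sustained single-image throughput; `latency_ms` is a hypothetical helper name.

```python
def latency_ms(fps: float) -> float:
    """Average time per image in milliseconds for a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# FPS values as listed for the two pre-trained models on a T4 GPU.
reported_fps = {"RT-DETR-L": 114, "RT-DETR-X": 74}

for name, fps in reported_fps.items():
    print(f"{name}: {latency_ms(fps):.2f} ms per image")
# RT-DETR-L: 8.77 ms per image
# RT-DETR-X: 13.51 ms per image
```

Both models fit comfortably within a 30 FPS budget (about 33 ms per frame) on that hardware, with RT-DETR-L leaving more headroom.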

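Usage Examples

This section provides simple RT-DETR training and inference examples. For full documentation on these and other modes, see the Predict, Train, Val, and Export docs pages.

Example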

Python

from ultralytics import RTDETR

# Load a COCO-pretrained RT-DETR-l model
model = RTDETR("rtdetr-l.pt")

# Display model information (optional)
model.info()

# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)

# Run inference with the RT-DETR-l model on the 'bus.jpg' image
results = model("path/to/bus.jpg")

CLI

# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640

# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg

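Supported Tasks and Modes

This table lists the model types, their pre-trained weights, the task each model supports, and the supported modes (Train, Val, Predict, Export), marked with ✅.

| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|---|---|---|---|---|---|---|
| RT-DETR Large | rtdetr-l.pt | Object Detection | ✅ | ✅ | ✅ | ✅ |
| RT-DETR Extra-Large | rtdetr-x.pt | Object Detection | ✅ | ✅ | ✅ | ✅ |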

Citations and Acknowledgements

If you use Baidu's RT-DETR in your research or development work, please cite the original paper:

@misc{lv2023detrs,
      title={DETRs Beat YOLOs on Real-time Object Detection},
      author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
      year={2023},
      eprint={2304.08069},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

We would like to acknowledge Baidu and the PaddlePaddle team for creating and maintaining this valuable resource for the computer vision community. Their contribution to the field with the development of the Vision Transformers-based real-time object detector, RT-DETR, is greatly appreciated.

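FAQ

What is Baidu's RT-DETR model and how does it work?

Baidu's RT-DETR (Real-Time Detection Transformer) is an advanced real-time object detector built on the Vision Transformer architecture. Its efficient hybrid encoder decouples intra-scale interaction from cross-scale fusion, which lets it process multiscale features efficiently. IoU-aware query selection helps the model focus on the most relevant objects, improving detection accuracy. Inference speed can be adapted by changing the number of decoder layers without retraining, which makes RT-DETR suitable for a range of real-time object detection scenarios. See the key features listed earlier on this page for more on each of these.

How can I use the pre-trained RT-DETR models provided by Ultralytics?

You can use the pre-trained PaddlePaddle RT-DETR models through the Ultralytics Python API. For instance, to load the COCO-pretrained RT-DETR-l model, which reaches 53.0% AP on COCO val2017 and 114 FPS on a T4 GPU, you can use the following example:

Example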

Python

from ultralytics import RTDETR

# Load a COCO-pretrained RT-DETR-l model
model = RTDETR("rtdetr-l.pt")

# Display model information (optional)
model.info()

# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)

# Run inference with the RT-DETR-l model on the 'bus.jpg' image
results = model("path/to/bus.jpg")

CLI

# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640

# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg

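Why should I choose Baidu's RT-DETR over other real-time object detectors?

Baidu's RT-DETR stands out for its efficient hybrid encoder and IoU-aware query selection, which substantially reduce computational cost while maintaining high accuracy. Its ability to adjust inference speed by using different decoder layers, without retraining, adds considerable flexibility. This makes it particularly well suited to applications that need real-time performance on accelerated backends such as CUDA with TensorRT, where it outperforms many other real-time object detectors.

How does RT-DETR support adaptable inference speed for different real-time applications?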

Baidu's RT-DETR lets you adjust inference speed by using a different number of decoder layers, without retraining. This adaptability helps scale performance across real-time object detection tasks: use fewer decoder layers when speed matters more than accuracy, or more layers when accuracy matters more than speed.
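The idea of stopping early in a stack of refinement layers can be shown with a toy model. The sketch below is not RT-DETR's decoder: each "layer" is a hypothetical function that moves a single number halfway toward a target value, and the stack depth of six is an assumption chosen for illustration. It only demonstrates that an iteratively refining stack yields a usable, coarser estimate after fewer layers, with no change to the layers themselves.

```python
from typing import Callable, List

Layer = Callable[[float], float]


def make_toy_layer(target: float) -> Layer:
    """Toy refinement step: move the estimate halfway toward the target."""
    return lambda estimate: estimate + 0.5 * (target - estimate)


def run_decoder(query: float, layers: List[Layer], num_layers: int) -> float:
    """Apply only the first `num_layers` layers of an already built stack."""
    if not 1 <= num_layers <= len(layers):
        raise ValueError("num_layers must be between 1 and len(layers)")
    for layer in layers[:num_layers]:
        query = layer(query)
    return query


# Assumed stack of six layers, all refining toward the value 1.0.
layers = [make_toy_layer(1.0) for _ in range(6)]

fast = run_decoder(0.0, layers, num_layers=3)  # fewer layers: quicker, coarser
full = run_decoder(0.0, layers, num_layers=6)  # all layers: slower, closer

print(fast)  # 0.875
print(full)  # 0.984375
```

The same layer objects serve both calls, which mirrors the "no retraining" property: only the number of layers applied at inference time changes.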

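Can I use RT-DETR models with other Ultralytics modes, such as training, validation, and export?

Yes, RT-DETR models are compatible with the Ultralytics modes, including training, validation, prediction, and export. For detailed instructions on each mode, refer to the Train, Val, Predict, and Export documentation pages. Together these cover the full workflow for developing and deploying an object detection solution.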
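As a minimal sketch of the validation and export modes, the function below chains the two through the Ultralytics Python API. It assumes the `ultralytics` package is installed and that the weights and the COCO8 dataset can be downloaded; the function name `validate_and_export` and the choice of ONNX as the export format are illustrative assumptions, not recommendations from this page. The import is deferred so that defining the function does not itself require the package.

```python
def validate_and_export(weights: str = "rtdetr-l.pt", data: str = "coco8.yaml") -> str:
    """Validate an RT-DETR model, then export it to ONNX.

    Returns the path of the exported file. Hypothetical helper for
    illustration; requires the ultralytics package at call time.
    """
    from ultralytics import RTDETR  # deferred import, see lead-in

    model = RTDETR(weights)

    # Validation mode: evaluate on the dataset's validation split.
    metrics = model.val(data=data)
    print(f"mAP50-95: {metrics.box.map:.3f}")

    # Export mode: write an ONNX file next to the weights.
    return model.export(format="onnx")
```

Calling `validate_and_export()` runs both steps with the default weights and dataset; pass other values to validate a different checkpoint or dataset.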
