RTDETRv2 vs YOLO11: A Technical Comparison for Object Detection

Choosing the right object detection model is crucial for computer vision projects. Ultralytics offers a range of models, including the efficient YOLO series and the high-accuracy RT-DETR series. This page provides a detailed technical comparison between RTDETRv2 and YOLO11, two state-of-the-art models for object detection, to help you make an informed decision.

RTDETRv2: High Accuracy Real-Time Detection

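RTDETRv2 (Real-Time Detection Transformer v2) is a cutting-edge object detection model known for its high accuracy and real-time capabilities. It builds on the DETR (Detection Transformer) design: a CNN backbone extracts multi-scale features, a hybrid encoder refines them, and a transformer decoder predicts objects directly. This design helps RTDETRv2 excel at tasks requiring precise object localization and classification.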

Architecture and Key Features

This transformer-based design lets RTDETRv2 capture global context within images, which improves accuracy, especially in complex scenes. Whereas a convolutional layer only sees a local neighborhood at each step, self-attention weighs the importance of every image region relative to every other region, enriching the extracted features. As a result, RTDETRv2 achieves state-of-the-art accuracy while maintaining competitive inference speeds.
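To illustrate how self-attention lets each region weigh every other region, here is a minimal pure-Python sketch of single-head scaled dot-product self-attention. It is not RTDETRv2's actual encoder: learned query/key/value projections, multiple heads, and multi-scale feature maps are all omitted, and the four 2-D "region" vectors are made-up inputs.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): queries, keys, and values are the input
    vectors themselves, i.e. identity projections instead of learned ones.
    """
    d = len(tokens[0])
    outputs, all_weights = [], []
    for q in tokens:
        # Score this query against every region, not just its neighbors.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Output is the attention-weighted sum of all value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Four hypothetical 2-D feature vectors standing in for image regions.
regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
out, weights = self_attention(regions)
for row in weights:
    print([round(w, 3) for w in row])
```

Every attention weight is positive, so each output mixes information from all regions (global context), while regions similar to the query, such as the first two, receive the largest weights.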

Performance Metrics

As the comparison table below shows, RTDETRv2 reaches mAPval 50-95 scores from 48.1 (RTDETRv2-s) up to 54.3 (RTDETRv2-x). On an NVIDIA T4 GPU with TensorRT, latency ranges from 5.03 ms to 15.03 ms per image, which makes it suitable for real-time applications when deployed on capable hardware.
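A quick way to read the T4 latencies is to convert them to frames per second. This sketch assumes batch size 1 and that the table's per-image latency is the only cost; real pipelines also spend time on decoding, preprocessing, and postprocessing, so actual throughput will be lower. The 30 FPS target is an illustrative assumption, not a standard threshold.

```python
# T4 TensorRT10 latencies (ms) for RTDETRv2, copied from the comparison table.
RTDETRV2_T4_MS = {
    "RTDETRv2-s": 5.03,
    "RTDETRv2-m": 7.51,
    "RTDETRv2-l": 9.76,
    "RTDETRv2-x": 15.03,
}


def latency_to_fps(latency_ms):
    """Convert per-image latency to throughput, assuming sequential batch-1 inference."""
    return 1000.0 / latency_ms


def meets_realtime(latency_ms, target_fps=30.0):
    # A model keeps up with a stream if it finishes one frame within 1/target_fps seconds.
    return latency_to_fps(latency_ms) >= target_fps


for name, ms in RTDETRV2_T4_MS.items():
    print(f"{name}: {latency_to_fps(ms):.1f} FPS, meets 30 FPS: {meets_realtime(ms)}")
```

Under these assumptions even RTDETRv2-x comes out at roughly 66.5 FPS on the T4, comfortably above the illustrative 30 FPS target.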

Strengths and Weaknesses

Strengths:

  • High Accuracy: The transformer-based design delivers strong accuracy, reaching 54.3 mAPval 50-95 with RTDETRv2-x.
  • Real-Time Performance: Achieves competitive inference speeds, especially with hardware acceleration such as TensorRT on a T4 GPU.
  • Robust Feature Extraction: Self-attention effectively captures global context as well as intricate details.

Weaknesses:

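  • Larger Model Size: RTDETRv2 models have more parameters and FLOPs than YOLO11 models of the same size tier. RTDETRv2-x, for example, has 76M parameters and 259B FLOPs versus 56.9M and 194.9B for YOLO11x, so it needs more computational resources.
  • Inference Speed: Although real-time capable, RTDETRv2 is slower than comparable YOLO11 models (5.03 ms for RTDETRv2-s versus 2.5 ms for YOLO11s on a T4), a gap that matters most on resource-constrained devices. The table reports no CPU ONNX timings for RTDETRv2.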
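The size gap can be quantified from the table. This sketch pairs models by their size suffix (s, m, l, x), which is an assumption for illustration: the two families do not define their tiers as equivalent, and YOLO11n has no RTDETRv2 counterpart.

```python
# (params in millions, FLOPs in billions), copied from the comparison table.
RTDETRV2 = {"s": (20, 60), "m": (36, 100), "l": (42, 136), "x": (76, 259)}
YOLO11 = {"s": (9.4, 21.5), "m": (20.1, 68.0), "l": (25.3, 86.9), "x": (56.9, 194.9)}


def size_ratios(a, b):
    """Return {tier: (params_ratio, flops_ratio)} for family `a` relative to family `b`."""
    return {
        tier: (a[tier][0] / b[tier][0], a[tier][1] / b[tier][1])
        for tier in a
        if tier in b  # only compare tiers present in both families
    }


for tier, (p, f) in size_ratios(RTDETRV2, YOLO11).items():
    print(f"{tier}: {p:.2f}x params, {f:.2f}x FLOPs")
```

With the table's figures, every RTDETRv2 tier is larger than its YOLO11 counterpart, from about 1.3x at the x tier up to about 2.8x (FLOPs) at the s tier.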

Ideal Use Cases

RTDETRv2 is ideally suited for applications where high accuracy is paramount and sufficient computational resources are available.

YOLO11: Efficient and Versatile Object Detection

YOLO11 (You Only Look Once 11) represents the latest iteration in the renowned Ultralytics YOLO series, known for its speed and efficiency. YOLO11 builds upon previous versions, offering enhanced accuracy and performance while maintaining its real-time edge.

Architecture and Key Features

YOLO11 continues the single-stage detection paradigm, prioritizing inference speed without significantly compromising accuracy. It incorporates architectural improvements and optimizations to achieve a better balance between speed and precision compared to its predecessors like YOLOv8. YOLO models are designed for efficient processing, making them highly suitable for real-time applications across diverse hardware platforms.

Performance Metrics

The performance table highlights YOLO11's speed. YOLO11n runs in 56.1 ms on CPU (ONNX) and 1.5 ms on a T4 GPU (TensorRT10), and YOLO11s in 90.0 ms and 2.5 ms, making them excellent choices for latency-sensitive applications and edge deployments. Accuracy remains competitive: at matching size tiers, YOLO11 trails RTDETRv2 by at most 1.1 mAPval 50-95 (the s models), ties it at the l tier (53.4), and YOLO11x (54.7) slightly exceeds RTDETRv2-x (54.3) while using fewer parameters and FLOPs.
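To check which YOLO11 variants fit a latency-sensitive application, you can filter the table's timings against a per-frame budget. This is a hedged sketch: the latencies come from the table's specific CPU (ONNX) and T4 (TensorRT10) setups and will differ on other hardware, so treat the result as a first screen rather than a guarantee.

```python
# YOLO11 latencies (ms) from the comparison table: (CPU ONNX, T4 TensorRT10).
YOLO11_MS = {
    "YOLO11n": (56.1, 1.5),
    "YOLO11s": (90.0, 2.5),
    "YOLO11m": (183.2, 4.7),
    "YOLO11l": (238.6, 6.2),
    "YOLO11x": (462.8, 11.3),
}


def models_within_budget(budget_ms, device):
    """List models whose table latency on `device` ("cpu" or "t4") fits the per-frame budget.

    Raises KeyError for any other device name.
    """
    idx = {"cpu": 0, "t4": 1}[device]
    return [name for name, times in YOLO11_MS.items() if times[idx] <= budget_ms]


# A 10 FPS CPU stream allows 100 ms per frame; a 60 FPS GPU stream allows about 16.7 ms.
print(models_within_budget(100.0, "cpu"))
print(models_within_budget(1000 / 60, "t4"))
```

With these numbers, only YOLO11n and YOLO11s fit a 100 ms CPU budget, while all five variants fit a 60 FPS budget on the T4.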

Strengths and Weaknesses

Strengths:

  • Exceptional Speed: YOLO models are renowned for their fast inference speeds, crucial for real-time applications.
  • Efficiency: YOLO11 models are computationally efficient, allowing deployment on resource-constrained devices.
  • Versatility: Suitable for a broad spectrum of object detection tasks and deployment scenarios.
  • Small Model Size: Smaller YOLO11 variants have significantly fewer parameters, making them memory-efficient.
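A rough way to see what "memory-efficient" means is to estimate weight storage as parameter count times bytes per parameter. This sketch counts weights only, assumes dense storage at the stated precision, and ignores activations, runtime buffers, and file-format overhead, so real memory use will be higher.

```python
# Parameter counts (millions), copied from the comparison table.
PARAMS_M = {"YOLO11n": 2.6, "YOLO11s": 9.4, "YOLO11x": 56.9, "RTDETRv2-x": 76}

# Bytes per parameter at common inference precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_megabytes(params_m, precision="fp32"):
    """Estimate weight size in MB (10^6 bytes): millions of params x bytes per param."""
    return params_m * BYTES_PER_PARAM[precision]


for name, p in PARAMS_M.items():
    print(f"{name}: ~{weight_megabytes(p):.1f} MB fp32, ~{weight_megabytes(p, 'fp16'):.1f} MB fp16")
```

By this estimate YOLO11n's weights take about 10.4 MB in FP32, compared with about 304 MB for RTDETRv2-x.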

Weaknesses:

  • Accuracy Trade-off: The smallest variants trade accuracy for speed (YOLO11n reaches 39.5 mAPval 50-95), and in scenes with complex or overlapping objects RTDETRv2's global attention may help. In the table, however, RTDETRv2 leads only at the s and m tiers, by 1.1 and 0.4 points.
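The accuracy trade-off can be checked against the table by computing the mAPval 50-95 gap at each size tier. Pairing models by size suffix is an assumption for illustration, not an official equivalence between the two families.

```python
# mAPval 50-95 from the comparison table, paired by size suffix (assumption).
MAP = {
    "s": {"RTDETRv2": 48.1, "YOLO11": 47.0},
    "m": {"RTDETRv2": 51.9, "YOLO11": 51.5},
    "l": {"RTDETRv2": 53.4, "YOLO11": 53.4},
    "x": {"RTDETRv2": 54.3, "YOLO11": 54.7},
}


def map_gap(tier):
    """Positive when RTDETRv2 is more accurate than YOLO11 at this tier (rounded to 0.1)."""
    return round(MAP[tier]["RTDETRv2"] - MAP[tier]["YOLO11"], 1)


for tier in MAP:
    print(tier, map_gap(tier))
```

The gaps come out as 1.1, 0.4, 0.0, and -0.4 points for s, m, l, and x, so on this metric RTDETRv2's lead is confined to the smaller tiers.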

Ideal Use Cases

YOLO11's speed and efficiency make it ideal for applications with real-time processing requirements and for deployment on edge devices.

Model Comparison Table

| Model | size (pixels) | mAPval 50-95 | Speed CPU ONNX (ms) | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) |
|---|---|---|---|---|---|---|
| RTDETRv2-s | 640 | 48.1 | - | 5.03 | 20 | 60 |
| RTDETRv2-m | 640 | 51.9 | - | 7.51 | 36 | 100 |
| RTDETRv2-l | 640 | 53.4 | - | 9.76 | 42 | 136 |
| RTDETRv2-x | 640 | 54.3 | - | 15.03 | 76 | 259 |
| YOLO11n | 640 | 39.5 | 56.1 | 1.5 | 2.6 | 6.5 |
| YOLO11s | 640 | 47.0 | 90.0 | 2.5 | 9.4 | 21.5 |
| YOLO11m | 640 | 51.5 | 183.2 | 4.7 | 20.1 | 68.0 |
| YOLO11l | 640 | 53.4 | 238.6 | 6.2 | 25.3 | 86.9 |
| YOLO11x | 640 | 54.7 | 462.8 | 11.3 | 56.9 | 194.9 |
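Because the right model depends on balancing accuracy, speed, and resources, a small helper can pick the most accurate model from the table that fits a T4 latency budget and, optionally, a parameter budget. This is a sketch: it uses only the T4 TensorRT10 column because the table reports no CPU timings for RTDETRv2, and the tie-break rule (prefer lower latency) is a design choice made here, not an Ultralytics recommendation.

```python
from typing import NamedTuple, Optional


class ModelStats(NamedTuple):
    name: str
    map50_95: float  # mAPval 50-95
    t4_ms: float     # T4 TensorRT10 latency in ms
    params_m: float  # parameters in millions


# Rows copied from the comparison table (CPU latency omitted: none is reported for RTDETRv2).
TABLE = [
    ModelStats("RTDETRv2-s", 48.1, 5.03, 20),
    ModelStats("RTDETRv2-m", 51.9, 7.51, 36),
    ModelStats("RTDETRv2-l", 53.4, 9.76, 42),
    ModelStats("RTDETRv2-x", 54.3, 15.03, 76),
    ModelStats("YOLO11n", 39.5, 1.5, 2.6),
    ModelStats("YOLO11s", 47.0, 2.5, 9.4),
    ModelStats("YOLO11m", 51.5, 4.7, 20.1),
    ModelStats("YOLO11l", 53.4, 6.2, 25.3),
    ModelStats("YOLO11x", 54.7, 11.3, 56.9),
]


def best_under_budget(budget_ms: float, max_params_m: Optional[float] = None) -> Optional[ModelStats]:
    """Return the most accurate model fitting the latency (and optional size) budget, or None."""
    candidates = [
        m for m in TABLE
        if m.t4_ms <= budget_ms and (max_params_m is None or m.params_m <= max_params_m)
    ]
    # Rank by mAP; on a tie, prefer the lower-latency model.
    return max(candidates, key=lambda m: (m.map50_95, -m.t4_ms), default=None)


print(best_under_budget(10.0))
print(best_under_budget(20.0, max_params_m=10))
```

With a 10 ms budget, YOLO11l and RTDETRv2-l both reach 53.4 mAP; the tie-break picks YOLO11l because it is faster.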

Conclusion

Both RTDETRv2 and YOLO11 are powerful object detection models, each catering to different needs. RTDETRv2 is a strong choice when its transformer-based global context suits the task and GPU resources are available; in the table it holds a small accuracy edge at the s and m tiers. YOLO11, on the other hand, shines in scenarios demanding real-time performance, efficiency, and deployment on resource-constrained platforms, and it matches or exceeds RTDETRv2's accuracy at the l and x tiers with fewer parameters and FLOPs.

For users seeking other options, Ultralytics also offers a diverse model zoo beyond these two families.

Choosing between RTDETRv2 and YOLO11, or other Ultralytics models, depends on the specific requirements of your computer vision project, balancing accuracy, speed, and resource constraints. Refer to the Ultralytics Documentation and GitHub repository for detailed information and implementation guides.

📅 Created 1 year ago ✏️ Updated 1 month ago