YOLOv5 vs YOLO11: A Detailed Comparison
This page provides a technical comparison between two significant object detection models developed by Ultralytics: the widely adopted Ultralytics YOLOv5 and the latest state-of-the-art Ultralytics YOLO11. We will analyze their architectures, performance metrics, and ideal use cases to help you select the most suitable model for your computer vision projects, highlighting the strengths and advancements within the Ultralytics ecosystem.
Ultralytics YOLOv5: The Established Industry Standard
Author: Glenn Jocher
Organization: Ultralytics
Date: 2020-06-26
GitHub Link: https://github.com/ultralytics/yolov5
Docs Link: https://docs.ultralytics.com/models/yolov5/
Ultralytics YOLOv5 quickly gained popularity after its release in 2020, becoming an industry standard known for its exceptional balance of speed, accuracy, and ease of use. Built on PyTorch, it offers a streamlined user experience, making it highly accessible for developers and researchers.
Architecture and Key Features
YOLOv5 features a CSPDarknet53 backbone, a PANet neck for feature aggregation, and the YOLOv5 detection head. It utilizes an anchor-based approach for object detection. A key strength is its scalability, offering various model sizes (n, s, m, l, x) to cater to different computational budgets and performance needs, from edge devices to powerful servers. Its architecture is optimized for efficient training and fast inference.
Strengths
- Exceptional Speed: Highly optimized for real-time inference, making it ideal for video processing and applications requiring low latency.
- Ease of Use: Renowned for its simple API, extensive documentation, and straightforward training/deployment process within the Ultralytics ecosystem.
- Mature Ecosystem: Benefits from a large, active community, numerous tutorials, readily available pre-trained weights, and integration with platforms like Ultralytics HUB.
- Training Efficiency: Known for relatively fast training times and efficient use of resources compared to more complex architectures.
Weaknesses
- Accuracy: While strong, its accuracy (mAP) is generally lower than newer models like YOLO11, especially for smaller model variants.
- Anchor-Based: Relies on predefined anchor boxes, which might require tuning for optimal performance on datasets with unusual object aspect ratios.
- Task Focus: Primarily designed for object detection, although segmentation capabilities were added later.
Ideal Use Cases
YOLOv5 excels in scenarios where speed and ease of deployment are critical:
- Real-time video surveillance and analysis.
- Deployment on edge computing devices (Raspberry Pi, NVIDIA Jetson) with limited resources.
- Mobile applications requiring fast, on-device detection.
- Rapid prototyping and development due to its simplicity and extensive support.
Ultralytics YOLO11: The Next Evolution
Authors: Glenn Jocher, Jing Qiu
Organization: Ultralytics
Date: 2024-09-27
GitHub Link: https://github.com/ultralytics/ultralytics
Docs Link: https://docs.ultralytics.com/models/yolo11/
Ultralytics YOLO11 is the latest state-of-the-art model in the YOLO series, released in September 2024. It builds upon the successes of its predecessors, including YOLOv8, introducing architectural refinements focused on boosting both accuracy and efficiency, particularly inference speed on CPUs.
Architecture and Key Features
YOLO11 incorporates advancements over YOLOv5 and YOLOv8, featuring optimized backbone and neck structures (like C2f blocks) for better feature extraction and fusion with potentially fewer parameters. A significant change is its anchor-free detection head, simplifying the detection process and often improving performance. YOLO11 is designed as a versatile framework supporting multiple vision tasks: object detection, instance segmentation, image classification, pose estimation, and oriented bounding boxes (OBB).
Strengths
- Higher Accuracy: Generally achieves superior mAP scores compared to YOLOv5 across comparable model sizes.
- Improved Efficiency: Offers faster inference speeds, especially on CPUs, and often requires fewer parameters and FLOPs than YOLOv5 for similar accuracy levels.
- Versatility: Natively supports a wider range of computer vision tasks beyond detection.
- Anchor-Free: Simplifies deployment and potentially improves generalization by removing the need for anchor box tuning.
- Ultralytics Ecosystem: Benefits from the well-maintained Ultralytics ecosystem, including easy training, validation, deployment tools, and Ultralytics HUB integration.
- Memory Efficiency: Like other YOLO models, it generally requires less memory during training and inference compared to large transformer-based models.
Weaknesses
- Newer Model: As a more recent release, the community support base and third-party integrations, while growing rapidly, might be less extensive than the highly established YOLOv5.
- Resource Needs: Larger YOLO11 variants (l, x) still require significant computational resources, a common trade-off for achieving the highest accuracy.
Ideal Use Cases
YOLO11 is well-suited for applications demanding higher accuracy and efficiency, or requiring multi-task capabilities:
- Advanced robotics and autonomous systems.
- High-performance security and surveillance systems.
- Industrial automation and quality control.
- Applications needing integrated detection, segmentation, or pose estimation.
Performance Comparison: YOLOv5 vs YOLO11
The table below provides a direct comparison of performance metrics for YOLOv5 and YOLO11 models evaluated on the COCO dataset. Metrics include mAPval (mean Average Precision on validation set, IoU 0.50:0.95), inference speed on CPU (ONNX) and GPU (TensorRT), number of parameters, and FLOPs.
Model | size (pixels) |
mAPval 50-95 |
Speed CPU ONNX (ms) |
Speed T4 TensorRT10 (ms) |
params (M) |
FLOPs (B) |
---|---|---|---|---|---|---|
YOLOv5n | 640 | 28.0 | 73.6 | 1.12 | 2.6 | 7.7 |
YOLOv5s | 640 | 37.4 | 120.7 | 1.92 | 9.1 | 24.0 |
YOLOv5m | 640 | 45.4 | 233.9 | 4.03 | 25.1 | 64.2 |
YOLOv5l | 640 | 49.0 | 408.4 | 6.61 | 53.2 | 135.0 |
YOLOv5x | 640 | 50.7 | 763.2 | 11.89 | 97.2 | 246.4 |
YOLO11n | 640 | 39.5 | 56.1 | 1.5 | 2.6 | 6.5 |
YOLO11s | 640 | 47.0 | 90.0 | 2.5 | 9.4 | 21.5 |
YOLO11m | 640 | 51.5 | 183.2 | 4.7 | 20.1 | 68.0 |
YOLO11l | 640 | 53.4 | 238.6 | 6.2 | 25.3 | 86.9 |
YOLO11x | 640 | 54.7 | 462.8 | 11.3 | 56.9 | 194.9 |
Analysis shows YOLO11 consistently outperforms YOLOv5 in mAP across all model sizes. Notably, YOLO11 achieves this higher accuracy with significantly faster CPU inference speeds and generally fewer parameters and FLOPs compared to its YOLOv5 counterparts (e.g., YOLO11m vs YOLOv5m). While YOLOv5n retains a slight edge in GPU TensorRT speed, YOLO11 demonstrates superior overall efficiency and accuracy, representing a significant advancement.
Explore Other Models
Ultralytics offers a diverse range of models. Besides YOLOv5 and YOLO11, consider exploring:
- YOLOv8: A highly successful and versatile predecessor to YOLO11, offering a strong balance of features and performance.
- YOLOv10: Features innovations for post-processing-free inference.
- YOLOv9: Incorporates Programmable Gradient Information (PGI).
- RT-DETR: An alternative transformer-based architecture for object detection.
Choosing between YOLOv5 and YOLO11 depends on specific project needs. YOLOv5 remains a robust, fast, and easy-to-use option, especially for simpler detection tasks or when leveraging its vast ecosystem is paramount. YOLO11 is the recommended choice for new projects requiring higher accuracy, better efficiency (especially on CPU), and multi-task capabilities, benefiting from the latest architectural improvements within the streamlined Ultralytics framework.