YOLOv7 vs YOLOv10: A Detailed Technical Comparison
Selecting the right object detection model involves balancing accuracy, speed, and deployment requirements. This page provides a detailed technical comparison between YOLOv7 and YOLOv10, two significant models in the real-time object detection landscape. We will delve into their architectural differences, performance metrics, and ideal use cases to help you choose the best fit for your computer vision projects.
YOLOv7
YOLOv7 was introduced in July 2022 by researchers from the Institute of Information Science, Academia Sinica, Taiwan. It quickly gained recognition for its impressive balance between speed and accuracy, setting new state-of-the-art benchmarks at the time. YOLOv7 focused on optimizing the training process using "bag-of-freebies" techniques to enhance accuracy without increasing inference costs.
Technical Details:
- Authors: Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao
- Organization: Institute of Information Science, Academia Sinica, Taiwan
- Date: 2022-07-06
- Arxiv Link: https://arxiv.org/abs/2207.02696
- GitHub Link: https://github.com/WongKinYiu/yolov7
- Docs Link: https://docs.ultralytics.com/models/yolov7/
Architecture and Key Features
YOLOv7 introduced several architectural improvements and training refinements:
- Extended Efficient Layer Aggregation Networks (E-ELAN): This key component enhances the network's ability to learn features while controlling the gradient path, improving convergence and accuracy.
- Model Scaling: Implemented compound scaling methods for concatenation-based models, allowing effective adjustment of model depth and width for different computational budgets.
- Trainable Bag-of-Freebies: Leveraged techniques like label assignment strategies and batch normalization adjustments during training to boost performance without adding inference overhead.
- Auxiliary Head Coarse-to-fine: Used auxiliary heads during training to improve deep supervision and model learning.
Strengths
- High Accuracy and Speed Balance: YOLOv7 offers a strong combination of high mAP and fast inference speed, suitable for many real-time applications.
- Efficient Training: Incorporates advanced training techniques that improve performance without significantly increasing computational demands during inference.
- Well-Established: As a slightly older model, it has a larger user base and more community resources compared to the newest models.
Weaknesses
- Complexity: The architecture and training strategies, while effective, can be complex to understand and fine-tune.
- NMS Dependency: Relies on Non-Maximum Suppression (NMS) during post-processing, which adds latency.
Use Cases
YOLOv7 is well-suited for:
- Real-time Surveillance: High accuracy for identifying objects or threats in security systems.
- Autonomous Systems: Robust detection for applications like self-driving cars where a balance of speed and accuracy is crucial.
- Industrial Automation: Reliable detection for quality control in manufacturing.
YOLOv10
YOLOv10, introduced in May 2024 by researchers from Tsinghua University, represents a significant step towards end-to-end real-time object detection. Its primary innovation is eliminating the need for NMS during inference by incorporating NMS-free training, alongside architectural optimizations for enhanced efficiency. YOLOv10 is integrated within the Ultralytics framework, benefiting from its streamlined ecosystem.
Technical Details:
- Authors: Ao Wang, Hui Chen, Lihao Liu, et al.
- Organization: Tsinghua University
- Date: 2024-05-23
- Arxiv Link: https://arxiv.org/abs/2405.14458
- GitHub Link: https://github.com/THU-MIG/yolov10
- Docs Link: https://docs.ultralytics.com/models/yolov10/
Architecture and Key Features
YOLOv10 focuses on optimizing both performance and efficiency:
- NMS-Free Training: Employs consistent dual assignments during training, enabling the model to perform detection end-to-end without requiring NMS post-processing, thus reducing latency.
- Holistic Efficiency-Accuracy Driven Design: Optimizes various model components (backbone, neck, head) comprehensively to reduce computational redundancy and enhance capability.
- Lightweight Architecture: Features optimizations like spatial-channel decoupled downsampling and rank-guided block design for improved parameter efficiency and faster inference, especially beneficial for edge deployment.
Strengths
- Exceptional Speed and Efficiency: Achieves lower latency and higher throughput due to NMS-free design and architectural optimizations.
- Competitive Accuracy: Maintains high accuracy while significantly improving speed and reducing model size.
- End-to-End Deployment: Simplified deployment pipeline thanks to the elimination of NMS.
- Ultralytics Ecosystem: Benefits from integration with the Ultralytics Python package, offering ease of use, efficient training, and access to tools like Ultralytics HUB for streamlined MLOps.
Weaknesses
- Relatively New: As a newer model, community support and real-world deployment examples might be less extensive than for YOLOv7, although adoption within the Ultralytics ecosystem is rapid.
- Small Object Detection: While generally strong, achieving optimal performance for very small objects might require careful tuning, a common challenge for one-stage detectors.
Use Cases
YOLOv10 excels in applications demanding maximum speed and efficiency:
- Edge AI: Ideal for deployment on resource-constrained devices like Raspberry Pi or NVIDIA Jetson.
- High-Throughput Systems: Suitable for applications requiring rapid processing of video streams, such as traffic monitoring.
- Mobile Applications: Enabling real-time object detection on smartphones and other mobile platforms.
Performance Comparison
The table below compares the performance of various YOLOv7 and YOLOv10 model variants on the COCO dataset. YOLOv10 models generally show significantly lower latency (faster inference) for comparable or even better mAP scores, especially the smaller variants. YOLOv10x achieves the highest mAP among the listed models.
Model | size (pixels) |
mAPval 50-95 |
Speed CPU ONNX (ms) |
Speed T4 TensorRT10 (ms) |
params (M) |
FLOPs (B) |
---|---|---|---|---|---|---|
YOLOv7l | 640 | 51.4 | - | 6.84 | 36.9 | 104.7 |
YOLOv7x | 640 | 53.1 | - | 11.57 | 71.3 | 189.9 |
YOLOv10n | 640 | 39.5 | - | 1.56 | 2.3 | 6.7 |
YOLOv10s | 640 | 46.7 | - | 2.66 | 7.2 | 21.6 |
YOLOv10m | 640 | 51.3 | - | 5.48 | 15.4 | 59.1 |
YOLOv10b | 640 | 52.7 | - | 6.54 | 24.4 | 92.0 |
YOLOv10l | 640 | 53.3 | - | 8.33 | 29.5 | 120.3 |
YOLOv10x | 640 | 54.4 | - | 12.2 | 56.9 | 160.4 |
Conclusion
Both YOLOv7 and YOLOv10 are powerful object detection models. YOLOv7 offers a proven balance of speed and accuracy, backed by a more established presence. YOLOv10, integrated into the Ultralytics ecosystem, pushes the boundaries of efficiency with its NMS-free design, achieving state-of-the-art speed and competitive accuracy, making it ideal for latency-critical applications and edge deployments. The choice depends on specific project needs: prioritize maximum speed and end-to-end deployment with YOLOv10, or leverage the established performance of YOLOv7.
For users seeking the latest advancements and a versatile platform supporting multiple tasks like segmentation and pose estimation, exploring other Ultralytics models like Ultralytics YOLOv8 or the cutting-edge YOLO11 is also recommended.