YOLOv9 vs. EfficientDet: A Detailed Comparison
Choosing the optimal object detection model is critical for computer vision tasks, balancing accuracy, speed, and computational resources. This page provides a detailed technical comparison between Ultralytics YOLOv9 and EfficientDet, two significant models in the object detection landscape. We will delve into their architectural designs, performance benchmarks, and suitable applications to assist you in making an informed decision.
YOLOv9: State-of-the-Art Accuracy and Efficiency
YOLOv9, introduced in 2024 by Chien-Yao Wang and Hong-Yuan Mark Liao from the Institute of Information Science, Academia Sinica, Taiwan, represents a significant advancement in the YOLO series. It is detailed in their paper "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information" and implemented in their GitHub repository. YOLOv9 addresses the challenge of information loss in deep networks through innovative architectural elements like Programmable Gradient Information (PGI) and Generalized Efficient Layer Aggregation Network (GELAN). These innovations ensure that the model learns effectively and maintains high accuracy with fewer parameters, showcasing a strong balance between performance and efficiency.
Technical Details:
- Authors: Chien-Yao Wang, Hong-Yuan Mark Liao
- Organization: Institute of Information Science, Academia Sinica, Taiwan
- Date: 2024-02-21
- Arxiv Link: https://arxiv.org/abs/2402.13616
- GitHub Link: https://github.com/WongKinYiu/yolov9
- Docs Link: https://docs.ultralytics.com/models/yolov9/
Strengths:
- State-of-the-art Accuracy: YOLOv9 achieves superior accuracy in object detection, often outperforming competitors at similar parameter counts.
- Efficient Parameter Utilization: PGI and GELAN architectures enhance feature extraction and reduce information loss, leading to better performance with fewer parameters and FLOPs.
- Scalability: The YOLOv9 family includes various model sizes (YOLOv9t to YOLOv9e), offering flexibility for different computational capabilities.
- Ultralytics Ecosystem: While the original research is from Academia Sinica, integration within the Ultralytics framework provides benefits like ease of use, extensive documentation, efficient training processes, readily available pre-trained weights, and strong community support. YOLO models typically exhibit lower memory requirements during training compared to transformer-based models.
Weaknesses:
- Inference Speed: While highly efficient, the largest YOLOv9 variants might show slower inference speeds compared to the most lightweight EfficientDet models on certain hardware, though often providing higher accuracy.
- Novelty: As a newer model, real-world deployment examples might be less numerous than for older, established models like EfficientDet, although adoption within the Ultralytics community is rapid.
Use Cases:
YOLOv9 is particularly well-suited for applications where accuracy and efficiency are paramount, such as:
- High-resolution image analysis (using computer vision to analyse satellite imagery).
- Complex scene understanding required in autonomous vehicles.
- Detailed object recognition for tasks like quality control in manufacturing.
EfficientDet: Scalable and Efficient Object Detection
EfficientDet, developed by the Google Brain team (Mingxing Tan, Ruoming Pang, Quoc V. Le) in 2019, focuses on achieving high efficiency and accuracy through architectural innovations like the BiFPN (Bi-directional Feature Pyramid Network) and compound scaling. The model details are available in their paper "EfficientDet: Scalable and Efficient Object Detection" and the official implementation is hosted on GitHub.
Technical Details:
- Authors: Mingxing Tan, Ruoming Pang, Quoc V. Le
- Organization: Google
- Date: 2019-11-20
- Arxiv Link: https://arxiv.org/abs/1911.09070
- GitHub Link: https://github.com/google/automl/tree/master/efficientdet
- Docs Link: https://github.com/google/automl/tree/master/efficientdet#readme
Strengths:
- Scalability: Offers a wide range of models (D0-D7) scaled efficiently using a compound coefficient, allowing adaptation to various resource constraints.
- Efficiency: BiFPN allows for effective multi-scale feature fusion with fewer parameters compared to traditional FPNs.
Weaknesses:
- Accuracy/Speed Trade-off: While efficient, EfficientDet models can be outperformed in accuracy by comparable YOLOv9 variants (see table below). Larger EfficientDet models show significantly slower inference speeds on GPUs compared to YOLOv9.
- Complexity: The compound scaling and BiFPN architecture, while effective, might be less straightforward to modify or understand compared to the more modular YOLO architectures.
- Ecosystem: Lacks the integrated ecosystem, extensive tooling, and active maintenance provided by Ultralytics for YOLO models.
Use Cases:
EfficientDet is suitable for applications where a balance between accuracy and computational resources is needed, particularly when deploying across a range of hardware capabilities.
- Mobile and edge applications where model size is a constraint (though smaller YOLOv9 models like YOLOv9t offer strong competition).
- General-purpose object detection tasks.
Performance Comparison: YOLOv9 vs. EfficientDet
The table below compares various YOLOv9 and EfficientDet models based on performance metrics on the COCO val dataset.
Model | size (pixels) |
mAPval 50-95 |
Speed CPU ONNX (ms) |
Speed T4 TensorRT10 (ms) |
params (M) |
FLOPs (B) |
---|---|---|---|---|---|---|
YOLOv9t | 640 | 38.3 | - | 2.3 | 2.0 | 7.7 |
YOLOv9s | 640 | 46.8 | - | 3.54 | 7.1 | 26.4 |
YOLOv9m | 640 | 51.4 | - | 6.43 | 20.0 | 76.3 |
YOLOv9c | 640 | 53.0 | - | 7.16 | 25.3 | 102.1 |
YOLOv9e | 640 | 55.6 | - | 16.77 | 57.3 | 189.0 |
EfficientDet-d0 | 640 | 34.6 | 10.2 | 3.92 | 3.9 | 2.54 |
EfficientDet-d1 | 640 | 40.5 | 13.5 | 7.31 | 6.6 | 6.1 |
EfficientDet-d2 | 640 | 43.0 | 17.7 | 10.92 | 8.1 | 11.0 |
EfficientDet-d3 | 640 | 47.5 | 28.0 | 19.59 | 12.0 | 24.9 |
EfficientDet-d4 | 640 | 49.7 | 42.8 | 33.55 | 20.7 | 55.2 |
EfficientDet-d5 | 640 | 51.5 | 72.5 | 67.86 | 33.7 | 130.0 |
EfficientDet-d6 | 640 | 52.6 | 92.8 | 89.29 | 51.9 | 226.0 |
EfficientDet-d7 | 640 | 53.7 | 122.0 | 128.07 | 51.9 | 325.0 |
Analysis: YOLOv9 models consistently demonstrate superior mAP compared to EfficientDet models with similar or even significantly higher parameter counts and FLOPs. For instance, YOLOv9c achieves 53.0 mAP with 25.3M parameters, surpassing EfficientDet-d6 (52.6 mAP, 51.9M parameters). Furthermore, YOLOv9 models exhibit significantly faster inference speeds on NVIDIA T4 GPUs using TensorRT, highlighting their optimization for real-time performance. YOLOv9e reaches the highest mAP (55.6) with considerably faster inference than EfficientDet-d7. Even the smallest YOLOv9t model offers competitive accuracy (38.3 mAP) with extremely low parameters (2.0M) and fast inference (2.3ms).
Conclusion
While EfficientDet was a significant step forward in efficient object detection upon its release, YOLOv9 represents the current state-of-the-art, offering superior accuracy and often better speed, particularly on GPU hardware. YOLOv9's innovative PGI and GELAN architectures provide a more effective balance of performance and computational cost.
For developers and researchers seeking the best combination of accuracy, speed, and ease of use, YOLOv9 is the recommended choice. Its integration within the Ultralytics ecosystem further enhances its appeal, providing streamlined workflows for training, validation, deployment, and robust community support.
For those interested in exploring other cutting-edge models, consider checking out comparisons involving YOLOv8, YOLOv10, or transformer-based models like RT-DETR.