YOLO11 vs EfficientDet: A Detailed Technical Comparison
This page offers a detailed technical comparison between Ultralytics YOLO11 and EfficientDet, two prominent object detection models. We analyze their architectures, performance benchmarks, and suitability for different applications to assist you in selecting the optimal model for your computer vision needs. While both models aim for efficient and accurate object detection, they stem from different research lines (Ultralytics and Google) and employ distinct architectural philosophies.
Ultralytics YOLO11
Ultralytics YOLO11 represents the latest advancement in the YOLO (You Only Look Once) series, developed by Ultralytics and known for its exceptional real-time object detection capabilities. It builds upon the success of predecessors like YOLOv8, focusing on enhancing both accuracy and computational efficiency.
Technical Details:
- Authors: Glenn Jocher, Jing Qiu
- Organization: Ultralytics
- Date: 2024-09-27
- GitHub: https://github.com/ultralytics/ultralytics
- Docs: https://docs.ultralytics.com/models/yolo11/
Architecture and Key Features: YOLO11 utilizes a single-stage, anchor-free architecture optimized for speed and accuracy. Key features include refined feature extraction layers and a streamlined network structure, reducing parameter count and computational load. This design ensures excellent performance across diverse hardware, from edge devices (NVIDIA Jetson, Raspberry Pi) to cloud servers.
A major advantage of YOLO11 is its versatility and integration within the Ultralytics ecosystem. It supports multiple tasks beyond object detection, including instance segmentation, image classification, pose estimation, and oriented bounding boxes (OBB). The Ultralytics framework offers a simple Python API and CLI, extensive documentation, readily available pre-trained weights, and efficient training processes with lower memory requirements compared to many other architectures. The ecosystem benefits from active development, strong community support, and seamless integration with tools like Ultralytics HUB for streamlined MLOps.
Performance Metrics: YOLO11 offers models ranging from nano (n) to extra-large (x), providing a flexible trade-off between speed and accuracy. As seen in the table below, YOLO11 models achieve competitive mAP scores while maintaining high inference speeds, particularly on GPUs. For instance, YOLO11m reaches 51.5 mAPval50-95 with efficient parameter usage.
Strengths:
- High Speed and Efficiency: Excellent inference speed, ideal for real-time applications.
- Strong Accuracy: Achieves state-of-the-art mAP scores across model sizes.
- Versatility: Supports detection, segmentation, classification, pose, and OBB tasks.
- Ease of Use: Simple API, comprehensive documentation, and user-friendly ecosystem.
- Well-Maintained Ecosystem: Actively developed, strong community, frequent updates, and tools like Ultralytics HUB.
- Training Efficiency: Faster training times and lower memory usage compared to many alternatives.
- Deployment Flexibility: Optimized for diverse hardware from edge to cloud.
Weaknesses:
- Smaller models prioritize speed, which may involve a trade-off in maximum achievable accuracy compared to the largest variants.
- As a one-stage detector, may face challenges with extremely small objects in certain complex scenes.
Ideal Use Cases: YOLO11 excels in applications demanding real-time performance and high accuracy:
- Autonomous Systems: Robotics and self-driving cars.
- Security: Surveillance systems and theft prevention.
- Industrial Automation: Quality control and recycling efficiency.
- Retail: Inventory management and customer analytics.
EfficientDet
EfficientDet, developed by the Google Brain team, is a family of object detection models designed for high accuracy and efficiency. It leverages architectural innovations derived from the EfficientNet image classification models.
Technical Details:
- Authors: Mingxing Tan, Ruoming Pang, Quoc V. Le
- Organization: Google
- Date: 2019-11-20
- Arxiv: https://arxiv.org/abs/1911.09070
- GitHub: https://github.com/google/automl/tree/master/efficientdet
- Docs: https://github.com/google/automl/tree/master/efficientdet#readme
Architecture and Key Features: EfficientDet's architecture is characterized by:
- EfficientNet Backbone: Uses the highly efficient EfficientNet as its base feature extractor.
- BiFPN (Bi-directional Feature Pyramid Network): Employs a weighted bi-directional FPN for effective multi-scale feature fusion.
- Compound Scaling: Uses a compound coefficient to uniformly scale the depth, width, and resolution of the backbone, feature network, and detection head.
Performance Metrics: EfficientDet models (D0-D7) provide a range of accuracy and efficiency levels. As shown in the table, EfficientDet achieves good mAP scores, particularly with larger models, but often at the cost of higher latency compared to YOLO11 variants with similar accuracy.
Strengths:
- High Accuracy: Achieves strong mAP scores, especially the larger models.
- Scalable Architecture: Compound scaling allows for systematic adjustment of model size and performance.
- Efficient Feature Fusion: BiFPN enhances multi-scale feature representation.
Weaknesses:
- Task Specificity: Primarily focused on object detection, lacking the multi-task versatility of YOLO11.
- Ecosystem: Less integrated ecosystem compared to Ultralytics, potentially requiring more effort for training, deployment, and support.
- Inference Speed: Can exhibit higher latency compared to similarly accurate YOLO11 models, especially on GPUs.
Ideal Use Cases: EfficientDet is suitable for applications where achieving high accuracy within specific computational budgets is crucial, such as:
- Cloud-based Vision Services: Where latency might be less critical than maximizing accuracy per FLOP.
- Research: Exploring scalable architectures and feature fusion techniques.
Performance Comparison
The table below provides a quantitative comparison between YOLO11 and EfficientDet variants on the COCO dataset. Note that YOLO11 generally offers significantly faster GPU inference speeds (T4 TensorRT10) for comparable mAP levels and often uses fewer parameters.
Model | size (pixels) |
mAPval 50-95 |
Speed CPU ONNX (ms) |
Speed T4 TensorRT10 (ms) |
params (M) |
FLOPs (B) |
---|---|---|---|---|---|---|
YOLO11n | 640 | 39.5 | 56.1 | 1.5 | 2.6 | 6.5 |
YOLO11s | 640 | 47.0 | 90.0 | 2.5 | 9.4 | 21.5 |
YOLO11m | 640 | 51.5 | 183.2 | 4.7 | 20.1 | 68.0 |
YOLO11l | 640 | 53.4 | 238.6 | 6.2 | 25.3 | 86.9 |
YOLO11x | 640 | 54.7 | 462.8 | 11.3 | 56.9 | 194.9 |
EfficientDet-d0 | 640 | 34.6 | 10.2 | 3.92 | 3.9 | 2.54 |
EfficientDet-d1 | 640 | 40.5 | 13.5 | 7.31 | 6.6 | 6.1 |
EfficientDet-d2 | 640 | 43.0 | 17.7 | 10.92 | 8.1 | 11.0 |
EfficientDet-d3 | 640 | 47.5 | 28.0 | 19.59 | 12.0 | 24.9 |
EfficientDet-d4 | 640 | 49.7 | 42.8 | 33.55 | 20.7 | 55.2 |
EfficientDet-d5 | 640 | 51.5 | 72.5 | 67.86 | 33.7 | 130.0 |
EfficientDet-d6 | 640 | 52.6 | 92.8 | 89.29 | 51.9 | 226.0 |
EfficientDet-d7 | 640 | 53.7 | 122.0 | 128.07 | 51.9 | 325.0 |
Conclusion
Both YOLO11 and EfficientDet are powerful object detection models, but Ultralytics YOLO11 offers significant advantages for most developers and researchers. Its superior speed, especially on GPUs, combined with competitive accuracy, makes it ideal for real-time applications. Furthermore, YOLO11's versatility across multiple vision tasks (detection, segmentation, classification, pose, OBB), ease of use, efficient training, lower memory footprint, and the robust Ultralytics ecosystem provide a more streamlined and productive development experience. While EfficientDet offers strong accuracy and scalability, YOLO11 generally presents a better overall package for diverse deployment scenarios and is the recommended choice.
For further exploration, consider comparing these models with others like YOLOv10, YOLOv9, RT-DETR, and DAMO-YOLO. You can find more comparisons in the Ultralytics documentation.