Link to this sectionYOLOv5 中的模型剪枝与稀疏化#
📚 本指南将介绍如何对 YOLOv5 🚀 模型应用剪枝,在保持性能的同时构建更高效的网络。
Link to this section什么是模型剪枝?#
模型剪枝 是一种通过移除不重要的参数(权重和连接)来减小神经网络规模和复杂度的技术。此过程构建的模型更高效,并具有以下优点:
- 减小模型体积,便于在资源受限的设备上部署
- 在对准确率影响极小的情况下提升推理速度
- 降低内存占用和能耗
- 提升实时应用的整体效率
剪枝的工作原理是识别并移除对模型性能贡献极小的参数,从而生成一个准确率相近但更轻量化的模型。
Link to this section开始之前#
Clone repo and install requirements.txt in a Python>=3.8.0 environment, including PyTorch>=1.8. Models and datasets download automatically from the latest YOLOv5 release.
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # installLink to this section测试基准性能#
在剪枝之前,先确定一个基准性能以进行对比。此命令用于测试图像尺寸为 640 像素的 COCO val2017 数据集上的 YOLOv5x。yolov5x.pt 是目前可用且最准确的模型。其他选择包括 yolov5s.pt、yolov5m.pt 和 yolov5l.pt,或者你自己训练自定义数据集得到的检查点 ./weights/best.pt。有关所有可用模型的详情,请参阅 README 表格。
python val.py --weights yolov5x.pt --data coco.yaml --img 640 --half输出:
val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-224-g4c40933 torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)
Fusing layers...
Model Summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100% 157/157 [01:12<00:00, 2.16it/s]
all 5000 36335 0.732 0.628 0.683 0.496
Speed: 0.1ms pre-process, 5.2ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640) # <--- base speed
Evaluating pycocotools mAP... saving runs/val/exp-2/yolov5x_predictions.json...
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.507 # <--- base mAP
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.689
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.552
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.345
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.652
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.381
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.630
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.682
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.526
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.731
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.829
Results saved to runs/val/expLink to this section对 YOLOv5x 应用剪枝(30% 稀疏度)#
We can apply pruning to the model using the torch_utils.prune() command defined in utils/torch_utils.py. To test a pruned model, we update val.py to prune YOLOv5x to 0.3 sparsity (30% of weights set to zero):
30% 剪枝后的输出:
val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-224-g4c40933 torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)
Fusing layers...
Model Summary: 444 layers, 86705005 parameters, 0 gradients
Pruning model... 0.3 global sparsity
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100% 157/157 [01:11<00:00, 2.19it/s]
all 5000 36335 0.724 0.614 0.671 0.478
Speed: 0.1ms pre-process, 5.2ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640) # <--- prune speed
Evaluating pycocotools mAP... saving runs/val/exp-3/yolov5x_predictions.json...
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.489 # <--- prune mAP
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.677
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.537
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.334
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.635
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.370
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.612
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.664
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.496
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.722
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.803
Results saved to runs/val/exp-3Link to this section结果分析#
从结果中我们可以观察到:
- 实现 30% 稀疏度:现在
nn.Conv2d层中 30% 的模型权重参数为零 - 推理时间保持不变:尽管进行了剪枝,处理速度基本相同
- 对性能影响极小:mAP 从 0.507 轻微下降至 0.489(仅降低了 3.6%)
- 模型体积减小:剪枝后的模型所需的存储内存更少
这表明剪枝可以在仅对性能产生微小影响的情况下显著降低模型复杂度,使其成为一种在资源受限环境中部署的有效优化技术。
Link to this section微调剪枝后的模型#
为了获得最佳结果,剪枝后的模型应在剪枝后进行微调以恢复准确率。可以通过以下步骤完成:
- 以期望的稀疏度级别应用剪枝
- 使用较低的学习率对剪枝后的模型进行几个 epoch 的训练
- 将微调后的剪枝模型与基准进行对比评估
这一过程有助于剩余参数进行自适应调整以弥补被移除的连接,通常能恢复大部分或全部的原始准确率。
Link to this section支持的环境#
Ultralytics 提供了一系列即用型环境,预装了 CUDA、CUDNN、Python 和 PyTorch 等关键依赖,助你快速启动项目。
- 免费 GPU 笔记本:
- Google Cloud: GCP 快速入门指南
- Amazon: AWS 快速入门指南
- Azure: AzureML 快速入门指南
- Docker: Docker 快速入门指南
Link to this section项目状态#
此徽章表示所有 YOLOv5 GitHub Actions 持续集成 (CI) 测试均已成功通过。这些 CI 测试严格检查 YOLOv5 在 训练、验证、推理、导出 和 基准测试 等各个关键方面的功能和性能。它们确保了在 macOS、Windows 和 Ubuntu 上的稳定可靠运行,测试每 24 小时进行一次,并会在每次新提交代码时自动触发。