Ultralytics YOLOv5 Architecture

YOLOv5 (v6.0/6.1) is a powerful object detection algorithm developed by Ultralytics. This article dives deep into the YOLOv5 architecture, data augmentation strategies, training methodologies, and loss computation techniques. This comprehensive understanding will help improve your practical application of object detection in various fields, including surveillance, autonomous vehicles, and image recognition.

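1. Model Structure

YOLOv5's architecture consists of three main parts:

Backbone: This is the main body of the network. For YOLOv5, the backbone is designed using the New CSP-Darknet53 structure, a modification of the Darknet architecture used in previous versions.
Neck: This part connects the backbone and the head. In YOLOv5, the SPPF and New CSP-PAN structures are used.
Head: This part is responsible for generating the final output. YOLOv5 uses the YOLOv3 Head for this purpose.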

The structure of the model is shown in the figure below. Details of the model structure can be found in yolov5l.yaml.

[Figure: YOLOv5 model structure]

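YOLOv5 introduces some minor changes compared to its predecessors:

The Focus structure, found in earlier versions, is replaced with a 6x6 Conv2d structure. This change boosts efficiency #4825.
The SPP structure is replaced with SPPF. This change more than doubles the processing speed.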

To compare the speed of SPP and SPPF, the following code can be used:

SPP vs SPPF speed profiling example

import time

import torch
import torch.nn as nn


class SPP(nn.Module):
    def __init__(self):
        """Initializes an SPP module with three different sizes of max pooling layers."""
        super().__init__()
        self.maxpool1 = nn.MaxPool2d(5, 1, padding=2)
        self.maxpool2 = nn.MaxPool2d(9, 1, padding=4)
        self.maxpool3 = nn.MaxPool2d(13, 1, padding=6)

    def forward(self, x):
        """Applies three max pooling layers on input `x` and concatenates results along channel dimension."""
        o1 = self.maxpool1(x)
        o2 = self.maxpool2(x)
        o3 = self.maxpool3(x)
        return torch.cat([x, o1, o2, o3], dim=1)


class SPPF(nn.Module):
    def __init__(self):
        """Initializes an SPPF module with a specific configuration of MaxPool2d layer."""
        super().__init__()
        self.maxpool = nn.MaxPool2d(5, 1, padding=2)

    def forward(self, x):
        """Applies sequential max pooling and concatenates results with input tensor."""
        o1 = self.maxpool(x)
        o2 = self.maxpool(o1)
        o3 = self.maxpool(o2)
        return torch.cat([x, o1, o2, o3], dim=1)


def main():
    """Compares outputs and performance of SPP and SPPF on a random tensor (8, 32, 16, 16)."""
    input_tensor = torch.rand(8, 32, 16, 16)
    spp = SPP()
    sppf = SPPF()
    output1 = spp(input_tensor)
    output2 = sppf(input_tensor)

    print(torch.equal(output1, output2))

    t_start = time.time()
    for _ in range(100):
        spp(input_tensor)
    print(f"SPP time: {time.time() - t_start}")

    t_start = time.time()
    for _ in range(100):
        sppf(input_tensor)
    print(f"SPPF time: {time.time() - t_start}")


if __name__ == "__main__":
    main()

Result:

True
SPP time: 0.5373051166534424
SPPF time: 0.20780706405639648

2. Data Augmentation Techniques

YOLOv5 employs various data augmentation techniques to improve the model's ability to generalize and reduce overfitting. These techniques include:

Mosaic Augmentation: An image processing technique that combines four training images into one in ways that encourage object detection models to better handle various object scales and translations.
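Copy-Paste Augmentation: A data augmentation method that copies random patches from one image and pastes them onto another randomly chosen image, effectively generating a new training sample.
Random Affine Transformations: Random rotation, scaling, translation, and shearing of the images.
MixUp Augmentation: A method that creates composite images by taking a linear combination of two images and their associated labels.
Albumentations: A powerful image augmentation library that supports a wide variety of augmentation techniques.
HSV Augmentation: Random changes to the Hue, Saturation, and Value of the images.
Random Horizontal Flip: An augmentation method that randomly flips images horizontally.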
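Two of the techniques listed above are simple enough to sketch in a few lines of NumPy. The example below is illustrative only and is not YOLOv5's own implementation: `horizontal_flip` and `mixup` are hypothetical helper names, boxes are assumed to be in normalized (x_center, y_center, width, height) format, the Beta(32, 32) mixing ratio is an assumed choice, and for detection boxes the sketch simply keeps the boxes of both images as one way of combining labels.

```python
import numpy as np

rng = np.random.default_rng(0)


def horizontal_flip(image, boxes):
    """Flip an HxWxC image left-right and mirror normalized (xc, yc, w, h) boxes."""
    flipped = image[:, ::-1, :].copy()
    boxes = boxes.copy()
    boxes[:, 0] = 1.0 - boxes[:, 0]  # only the x-center changes
    return flipped, boxes


def mixup(image_a, boxes_a, image_b, boxes_b, alpha=32.0):
    """Blend two same-sized images; the Beta(alpha, alpha) ratio is an assumed choice."""
    ratio = rng.beta(alpha, alpha)
    mixed = image_a * ratio + image_b * (1.0 - ratio)
    # Assumption for this sketch: keep the boxes of both source images.
    return mixed, np.concatenate([boxes_a, boxes_b], axis=0)


img_a = rng.random((64, 64, 3))
img_b = rng.random((64, 64, 3))
box_a = np.array([[0.25, 0.5, 0.2, 0.2]])
box_b = np.array([[0.6, 0.4, 0.3, 0.1]])

flipped_img, flipped_box = horizontal_flip(img_a, box_a)
mixed_img, mixed_box = mixup(img_a, box_a, img_b, box_b)
print(flipped_box[0, 0])  # 0.75
print(mixed_img.shape, mixed_box.shape)  # (64, 64, 3) (2, 4)
```

Note that geometric augmentations such as the flip must update the labels together with the pixels, whereas a purely photometric change such as HSV augmentation leaves the boxes untouched.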

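3. Training Strategies

YOLOv5 applies several training strategies to enhance the model's performance. They include:

Multiscale Training: The input images are randomly rescaled within a range of 0.5 to 1.5 times their original size during the training process.
AutoAnchor: This strategy optimizes the prior anchor boxes to match the statistical characteristics of the ground truth boxes in your custom data.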
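Warmup and Cosine LR Scheduler: The learning rate is ramped up during an initial warmup phase and then decayed along a cosine curve, which helps stabilize early training and enhance model performance.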
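Exponential Moving Average (EMA): A strategy that uses the average of parameters over past steps to stabilize the training process and reduce generalization error.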
Mixed Precision Training: A method to perform operations in half-precision format, reducing memory usage and enhancing computational speed.
Hyperparameter Evolution: A strategy that automatically tunes hyperparameters to achieve optimal performance.
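The learning-rate schedule and the EMA update from the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not YOLOv5's actual scheduler or EMA class: the function names, the warmup length, the base learning rate, the final factor, and the decay value are all hypothetical choices.

```python
import math


def learning_rate(step, total_steps, warmup_steps=100, base_lr=0.01, final_factor=0.01):
    """Linear warmup followed by cosine decay (all default values are illustrative)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    return base_lr * (final_factor + (1.0 - final_factor) * cosine)


def ema_update(ema_params, params, decay=0.999):
    """Move each EMA parameter a small step toward the current parameter value."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


total = 1000
print(learning_rate(0, total))      # small value at the start of warmup
print(learning_rate(100, total))    # peak, equal to base_lr
print(learning_rate(total, total))  # base_lr * final_factor

ema = [0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0], decay=0.9)
print(ema)  # approximately [0.271]
```

The EMA copy of the weights changes slowly, so it smooths out the step-to-step noise of the raw parameters; the smoothed copy is typically the one used for evaluation.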

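4. Additional Features

4.1 Compute Losses

The loss in YOLOv5 is computed as a combination of three individual loss components:

Classes Loss (BCE Loss): Binary Cross-Entropy loss, which measures the error for the classification task.
Objectness Loss (BCE Loss): Another Binary Cross-Entropy loss, which measures the error in detecting whether an object is present in a particular grid cell.
Location Loss (CIoU Loss): Complete IoU loss, which measures the error in localizing the object within the grid cell.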
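To make the location term more concrete, here is a minimal pure-Python sketch of the standard CIoU formulation (IoU minus a center-distance penalty and an aspect-ratio penalty). It is an illustration rather than YOLOv5's own code: `ciou_loss` is a hypothetical name, and boxes are assumed to be given in corner format (x1, y1, x2, y2).

```python
import math


def ciou_loss(box1, box2, eps=1e-7):
    """Return 1 - CIoU for two boxes given as (x1, y1, x2, y2)."""
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]

    # Plain IoU
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Squared center distance over squared diagonal of the smallest enclosing box
    cw = max(box1[2], box2[2]) - min(box1[0], box2[0])
    ch = max(box1[3], box2[3]) - min(box1[1], box2[1])
    c2 = cw**2 + ch**2 + eps
    dx = box1[0] + box1[2] - box2[0] - box2[2]
    dy = box1[1] + box1[3] - box2[1] - box2[3]
    rho2 = (dx**2 + dy**2) / 4

    # Aspect-ratio consistency term
    v = (4 / math.pi**2) * (math.atan(w2 / (h2 + eps)) - math.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)


print(ciou_loss((0, 0, 1, 1), (0, 0, 1, 1)))  # close to 0 for identical boxes
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # greater than 1 for distant boxes
```

Unlike plain IoU, the loss keeps growing as non-overlapping boxes move further apart, so it still provides a useful training signal when the prediction and the target do not intersect.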

The overall loss function is depicted by:

$Loss = \lambda_1 L_{cls} + \lambda_2 L_{obj} + \lambda_3 L_{loc}$

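4.2 Balance Losses

The objectness losses of the three prediction layers (P3, P4, P5) are weighted differently. The balance weights are [4.0, 1.0, 0.4], respectively. This approach ensures that the predictions at different scales contribute appropriately to the total loss.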

$L_{obj} = 4.0 \cdot L_{obj}^{small} + 1.0 \cdot L_{obj}^{medium} + 0.4 \cdot L_{obj}^{large}$
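As a small worked example, the sketch below applies the balance weights [4.0, 1.0, 0.4] to per-layer objectness losses and then combines the result with the other two components. The function names, the per-layer loss values, and the per-component gains are hypothetical placeholders chosen only for illustration.

```python
BALANCE = [4.0, 1.0, 0.4]  # weights for P3, P4, P5 as stated in this section


def balanced_objectness(per_layer_obj_losses, balance=BALANCE):
    """Weight the objectness loss of each prediction layer and sum the results."""
    return sum(w * loss for w, loss in zip(balance, per_layer_obj_losses))


def total_loss(cls_loss, obj_loss, loc_loss, gains=(1.0, 1.0, 1.0)):
    """Combine the three components; `gains` are placeholder per-component weights."""
    g_cls, g_obj, g_loc = gains
    return g_cls * cls_loss + g_obj * obj_loss + g_loc * loc_loss


obj = balanced_objectness([0.10, 0.20, 0.50])  # hypothetical per-layer values
print(obj)  # 4.0 * 0.10 + 1.0 * 0.20 + 0.4 * 0.50, about 0.8
print(total_loss(cls_loss=0.3, obj_loss=obj, loc_loss=0.05))  # about 1.15
```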

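4.3 Eliminate Grid Sensitivity

The YOLOv5 architecture makes some important changes to the box prediction strategy compared to earlier versions of YOLO. In YOLOv2 and YOLOv3, the box coordinates were predicted directly from the activation of the last layer.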

$b_x = \sigma(t_x) + c_x$
$b_y = \sigma(t_y) + c_y$
$b_w = p_w \cdot e^{t_w}$
$b_h = p_h \cdot e^{t_h}$

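[Figure: YOLOv5 grid computation]

However, in YOLOv5, the formula for predicting the box coordinates has been updated to reduce grid sensitivity and prevent the model from predicting unbounded box dimensions.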

The revised formulas for calculating the predicted bounding box are as follows:

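$b_x = (2 \cdot \sigma(t_x) - 0.5) + c_x$
$b_y = (2 \cdot \sigma(t_y) - 0.5) + c_y$
$b_w = p_w \cdot (2 \cdot \sigma(t_w))^2$
$b_h = p_h \cdot (2 \cdot \sigma(t_h))^2$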
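The revised decoding can be sketched in plain Python as follows, assuming the center is computed as 2·σ(t) − 0.5 plus the cell index and the size as the anchor dimension times (2·σ(t))². The names `sigmoid` and `decode_box` are hypothetical, and all quantities are expressed in grid units.

```python
import math


def sigmoid(t):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t, cell, anchor):
    """Decode raw outputs t=(tx, ty, tw, th) for grid cell (cx, cy) and anchor (pw, ph).

    Returns (bx, by, bw, bh) in grid units.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = 2.0 * sigmoid(tx) - 0.5 + cx  # center offset lies in (-0.5, 1.5)
    by = 2.0 * sigmoid(ty) - 0.5 + cy
    bw = pw * (2.0 * sigmoid(tw)) ** 2  # scale factor lies in (0, 4)
    bh = ph * (2.0 * sigmoid(th)) ** 2
    return bx, by, bw, bh


print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 5), anchor=(2.0, 4.0)))
# (3.5, 5.5, 2.0, 4.0)
```

With all raw outputs at zero, the box sits at the center of its cell and has exactly the anchor's size, which makes zero a natural starting point for the network's predictions.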

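Compare the center point offset before and after scaling. The center point offset range is adjusted from (0, 1) to (-0.5, 1.5). As a result, the offset can easily reach 0 or 1, values that a plain sigmoid only approaches asymptotically.

[Figure: YOLOv5 grid scaling]

Compare the height and width scaling ratio (relative to the anchor) before and after the adjustment. The original yolo/darknet box equations have a serious flaw. Width and height are completely unbounded because they are simply out=exp(in), which is dangerous: it can lead to runaway gradients, instabilities, NaN losses, and ultimately a complete failure of training.

[Figure: YOLOv5 unbounded scaling]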
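The difference between the two width/height scalings is easy to see numerically. The sketch below compares the exponential factor out=exp(in) with a bounded alternative of the form (2·σ(in))², which is assumed here to be the revised factor; both function names are hypothetical.

```python
import math


def wh_factor_exp(t):
    """Original-style scale factor exp(t): grows without bound."""
    return math.exp(t)


def wh_factor_bounded(t):
    """Scale factor (2 * sigmoid(t)) ** 2: stays within (0, 4)."""
    return (2.0 / (1.0 + math.exp(-t))) ** 2


for t in (-5.0, 0.0, 5.0, 20.0):
    print(f"t={t:5.1f}  exp={wh_factor_exp(t):.3e}  bounded={wh_factor_bounded(t):.4f}")
```

For a large raw output such as 20, the exponential factor is in the hundreds of millions, while the bounded factor saturates just below 4, so a single extreme activation can no longer produce an enormous box or an enormous gradient.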

4.4 Build Targets

The build target process in YOLOv5 is critical for training efficiency and model accuracy. It involves assigning ground truth boxes to the appropriate grid cells in the output map and matching them with the appropriate anchor boxes.

This process follows these steps: