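Customizing the Trainer
The Ultralytics training pipeline is built around BaseTrainer and task-specific trainers such as DetectionTrainer. These classes handle the training loop, validation, checkpointing, and logging out of the box. When you need more control (for example, tracking custom metrics, adjusting loss weighting, or implementing learning rate schedules), you can subclass a trainer and override specific methods.
This guide covers seven common customizations:
- Logging custom metrics (F1 score) at the end of each epoch
- Adding class weights to handle class imbalance
- Saving the best model based on a different metric
- Freezing the backbone for the first N epochs, then unfreezing it
- Specifying per-layer learning rates
- Synchronizing BatchNorm for multi-GPU training
- Configuring gradient clipping for stability tuning
Before reading this guide, make sure you are familiar with the basics of training YOLO models and with the Advanced Customization page, which covers the BaseTrainer architecture.
How custom trainers work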
The YOLO model class accepts a trainer parameter in the train() method. This allows you to pass your own trainer class that extends the default behavior:
from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer


class CustomTrainer(DetectionTrainer):
    """A custom trainer that extends DetectionTrainer with additional functionality."""

    pass  # Add your customizations here


model = YOLO("yolo26n.pt")
model.train(data="coco8.yaml", epochs=10, trainer=CustomTrainer)

Your custom trainer inherits everything from DetectionTrainer, so you only need to override the specific methods you want to customize.
Logging custom metrics
The validation step computes precision, recall, and mAP. If you need additional metrics such as per-class F1 score, override validate():
import numpy as np

from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import LOGGER


class MetricsTrainer(DetectionTrainer):
    """Custom trainer that computes and logs F1 score at the end of each epoch."""

    def validate(self):
        """Run validation and compute per-class F1 scores."""
        metrics, fitness = super().validate()
        if metrics is None:
            return metrics, fitness

        if hasattr(self.validator, "metrics") and hasattr(self.validator.metrics, "box"):
            box = self.validator.metrics.box
            f1_per_class = box.f1
            class_indices = box.ap_class_index
            names = self.validator.names

            # Average only over classes that produced a non-zero F1
            valid_f1 = f1_per_class[f1_per_class > 0]
            mean_f1 = np.mean(valid_f1) if len(valid_f1) > 0 else 0.0
            LOGGER.info(f"Mean F1 Score: {mean_f1:.4f}")

            per_class_str = [
                f"{names[i]}: {f1_per_class[j]:.3f}" for j, i in enumerate(class_indices) if f1_per_class[j] > 0
            ]
            LOGGER.info(f"Per-class F1: {per_class_str}")

        return metrics, fitness


model = YOLO("yolo26n.pt")
model.train(data="coco8.yaml", epochs=5, trainer=MetricsTrainer)

This logs the mean F1 score across all classes, along with the per-class breakdown, after every validation run.
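The validator exposes many metrics through self.validator.metrics.box:
| Attribute | Description |
|---|---|
| f1 | Per-class F1 score |
| image_metrics | Dictionary of per-image metrics containing precision, recall, F1, TP, FP, and FN |
| p | Per-class precision |
| r | Per-class recall |
| ap50 | Per-class AP at IoU 0.5 |
| ap | Per-class AP at IoU 0.5:0.95 |
| mp, mr | Mean precision and mean recall |
| map50, map | Mean AP metrics |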
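The f1 values in the table are the harmonic mean of the per-class precision and recall. The following is a minimal sketch of that relationship in plain Python. The helper name `f1_scores`, the sample values, and the small `eps` guard against division by zero are all assumptions made for illustration, not the library's exact implementation.

```python
def f1_scores(precision, recall, eps=1e-16):
    """Return per-class F1, the harmonic mean of precision and recall."""
    return [2 * p * r / (p + r + eps) for p, r in zip(precision, recall)]


# Illustrative per-class values, not real validation output
precision = [0.80, 0.50, 0.0]
recall = [0.60, 0.50, 0.0]

f1 = f1_scores(precision, recall)

# As in the trainer example above, average only over classes with a non-zero F1
valid = [x for x in f1 if x > 0]
mean_f1 = sum(valid) / len(valid) if valid else 0.0
print([round(x, 3) for x in f1], round(mean_f1, 3))
```

A class with no detections contributes an F1 of zero, which is why the trainer example filters those entries out before averaging.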
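Adding class weights
If your dataset has class imbalance (for example, rare defects in manufacturing inspection), you can upweight under-represented classes in the loss function. This makes the model penalize misclassifications of rare classes more heavily.
To customize the loss, subclass the loss class, the model, and the trainer: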
import torch
from torch import nn

from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.nn.tasks import DetectionModel
from ultralytics.utils import RANK
from ultralytics.utils.loss import E2ELoss, v8DetectionLoss


class WeightedDetectionLoss(v8DetectionLoss):
    """Detection loss with class weights applied to BCE classification loss."""

    def __init__(self, model, class_weights=None, tal_topk=10, tal_topk2=None):
        """Initialize loss with optional per-class weights for BCE."""
        super().__init__(model, tal_topk=tal_topk, tal_topk2=tal_topk2)
        if class_weights is not None:
            self.bce = nn.BCEWithLogitsLoss(
                pos_weight=class_weights.to(self.device),
                reduction="none",
            )


class WeightedE2ELoss(E2ELoss):
    """E2E Loss with class weights for YOLO26."""

    def __init__(self, model, class_weights=None):
        """Initialize E2E loss with weighted detection loss."""

        def weighted_loss_fn(model, tal_topk=10, tal_topk2=None):
            return WeightedDetectionLoss(model, class_weights=class_weights, tal_topk=tal_topk, tal_topk2=tal_topk2)

        super().__init__(model, loss_fn=weighted_loss_fn)


class WeightedDetectionModel(DetectionModel):
    """Detection model that uses class-weighted loss."""

    def init_criterion(self):
        """Initialize weighted loss criterion with per-class weights."""
        class_weights = torch.ones(self.nc)
        class_weights[0] = 2.0  # upweight class 0
        class_weights[1] = 3.0  # upweight rare class 1
        return WeightedE2ELoss(self, class_weights=class_weights)


class WeightedTrainer(DetectionTrainer):
    """Trainer that returns a WeightedDetectionModel."""

    def get_model(self, cfg=None, weights=None, verbose=True):
        """Return a WeightedDetectionModel."""
        model = WeightedDetectionModel(cfg, nc=self.data["nc"], verbose=verbose and RANK == -1)
        if weights:
            model.load(weights)
        return model


model = YOLO("yolo26n.pt")
model.train(data="coco8.yaml", epochs=10, trainer=WeightedTrainer)

You can compute class weights automatically from your dataset's label distribution. A common approach is inverse frequency weighting:
import numpy as np

# class_counts: number of instances per class
class_counts = np.array([5000, 200, 3000])

# Inverse frequency: rarer classes get higher weight
class_weights = class_counts.max() / class_counts
# Result: [1.0, 25.0, 1.67]

Saving the best model by a custom metric
The trainer saves best.pt based on fitness, which defaults to 0.9 × mAP@0.5:0.95 + 0.1 × mAP@0.5. To use a different metric such as mAP@0.5 or recall, override validate() and return your chosen metric as the fitness value. The built-in save_model() then uses it automatically:
from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer


class CustomSaveTrainer(DetectionTrainer):
    """Trainer that saves the best model based on mAP@0.5 instead of default fitness."""

    def validate(self):
        """Override fitness to use mAP@0.5 for best model selection."""
        metrics, fitness = super().validate()
        if metrics:
            # Fall back to the default fitness if the key is missing
            fitness = metrics.get("metrics/mAP50(B)", fitness)
            if self.best_fitness is None or fitness > self.best_fitness:
                self.best_fitness = fitness
        return metrics, fitness


model = YOLO("yolo26n.pt")
model.train(data="coco8.yaml", epochs=20, trainer=CustomSaveTrainer)

Common metrics available in self.metrics after validation include:
| Key | Description |
|---|---|
| metrics/precision(B) | Precision |
| metrics/recall(B) | Recall |
| metrics/mAP50(B) | mAP at IoU 0.5 |
| metrics/mAP50-95(B) | mAP at IoU 0.5:0.95 |
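The choice of fitness can change which epoch ends up in best.pt. The sketch below applies the default weighting stated in this section to a made-up validation history and compares the result with selecting purely by mAP@0.5. The `history` values and the helper name `default_fitness` are hypothetical.

```python
def default_fitness(map50, map50_95):
    """Weighted fitness as described above: 0.9 * mAP@0.5:0.95 + 0.1 * mAP@0.5."""
    return 0.9 * map50_95 + 0.1 * map50


# Hypothetical per-epoch validation results: (mAP@0.5, mAP@0.5:0.95)
history = [(0.60, 0.40), (0.70, 0.41), (0.66, 0.45)]

# Index of the epoch each criterion would keep as the best checkpoint
best_by_default = max(range(len(history)), key=lambda e: default_fitness(*history[e]))
best_by_map50 = max(range(len(history)), key=lambda e: history[e][0])
print(best_by_default, best_by_map50)
```

With these numbers the default fitness prefers the last epoch, which has the strongest mAP@0.5:0.95, while the mAP@0.5 criterion prefers the middle epoch.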
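Freezing and unfreezing the backbone
Transfer learning workflows often benefit from freezing the pretrained backbone for the first N epochs, which lets the detection head adapt before the whole network is fine-tuned. Ultralytics provides a freeze parameter for freezing layers at the start of training, and you can use a callback to unfreeze them after N epochs: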
from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import LOGGER

FREEZE_EPOCHS = 5


def unfreeze_backbone(trainer):
    """Callback to unfreeze all layers after FREEZE_EPOCHS."""
    if trainer.epoch == FREEZE_EPOCHS:
        LOGGER.info(f"Epoch {trainer.epoch}: Unfreezing all layers for fine-tuning")
        for name, param in trainer.model.named_parameters():
            if not param.requires_grad:
                param.requires_grad = True
                LOGGER.info(f"  Unfroze: {name}")
        trainer.freeze_layer_names = [".dfl"]


class FreezingTrainer(DetectionTrainer):
    """Trainer with backbone freezing for first N epochs."""

    def __init__(self, *args, **kwargs):
        """Initialize and register the unfreeze callback."""
        super().__init__(*args, **kwargs)
        self.add_callback("on_train_epoch_start", unfreeze_backbone)


model = YOLO("yolo26n.pt")
model.train(data="coco8.yaml", epochs=20, freeze=10, trainer=FreezingTrainer)

The freeze=10 parameter freezes the first 10 layers (the backbone) at the start of training. The on_train_epoch_start callback fires at the beginning of every epoch and unfreezes all parameters once the freeze period is over.
- freeze=10 freezes the first 10 layers (typically the backbone in YOLO architectures)
- freeze=[0, 1, 2, 3] freezes specific layers by index
- A higher FREEZE_EPOCHS value gives the detection head more time to adapt before the backbone changes
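To make the two forms of freeze concrete, here is a rough sketch of how a freeze setting could map onto parameter names. It assumes layers are matched by the substring `model.{i}.` and that `.dfl` always stays frozen, which is consistent with the `freeze_layer_names = [".dfl"]` reset in the callback above but is not a verified description of the trainer's internals. The helper `frozen_names` and the parameter names are hypothetical.

```python
def frozen_names(param_names, freeze):
    """Return the parameter names that a freeze setting would match.

    Assumption: matching is done on the substring "model.{i}." and ".dfl"
    is always frozen. This is an illustration, not the library's code.
    """
    if isinstance(freeze, int):
        layers = range(freeze)  # freeze=N means layers 0..N-1
    else:
        layers = freeze or []  # a list of explicit layer indices, or nothing
    patterns = [f"model.{i}." for i in layers] + [".dfl"]
    return [n for n in param_names if any(p in n for p in patterns)]


# Hypothetical parameter names for illustration
names = [
    "model.0.conv.weight",
    "model.1.conv.weight",
    "model.10.conv.weight",
    "model.23.dfl.conv.weight",
]
print(frozen_names(names, 2))
print(frozen_names(names, [10]))
```

The trailing dot in the pattern matters: without it, `model.1` would also match `model.10`.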
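Per-layer learning rates
Different parts of the network can benefit from different learning rates. A common strategy is to use a lower learning rate for the pretrained backbone to preserve learned features, while letting the detection head adapt faster with a higher learning rate: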
import torch

from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import LOGGER
from ultralytics.utils.torch_utils import unwrap_model


class PerLayerLRTrainer(DetectionTrainer):
    """Trainer with different learning rates for backbone and head."""

    def build_optimizer(self, model, name="auto", lr=0.001, momentum=0.9, decay=1e-5, iterations=1e5):
        """Build optimizer with separate learning rates for backbone and head."""
        backbone_params = []
        head_params = []

        for k, v in unwrap_model(model).named_parameters():
            if not v.requires_grad:
                continue
            # Layers 0-9 are treated as the backbone
            is_backbone = any(k.startswith(f"model.{i}.") for i in range(10))
            if is_backbone:
                backbone_params.append(v)
            else:
                head_params.append(v)

        backbone_lr = lr * 0.1

        optimizer = torch.optim.AdamW(
            [
                {"params": backbone_params, "lr": backbone_lr, "weight_decay": decay},
                {"params": head_params, "lr": lr, "weight_decay": decay},
            ],
        )

        LOGGER.info(
            f"PerLayerLR optimizer: backbone ({len(backbone_params)} params, lr={backbone_lr}) "
            f"| head ({len(head_params)} params, lr={lr})"
        )
        return optimizer


model = YOLO("yolo26n.pt")
model.train(data="coco8.yaml", epochs=20, trainer=PerLayerLRTrainer)

RT-DETR variant
For RT-DETR the pattern is the same, with two refinements. The backbone length is read from model.yaml["backbone"], so the same trainer works across RT-DETR variants (RT-DETR-L, RT-DETR-X, ResNet-50/101 backbones) without hard-coding a layer count. Within each section, parameters are also split into weight, BatchNorm, and bias groups so that BatchNorm parameters and biases are excluded from weight decay, consistent with the default trainer's policy. This is particularly useful for RT-DETR fine-tuning, where the decoder head is often randomly initialized while the backbone carries pretrained features that benefit from a lower learning rate:
import torch
from torch import nn

from ultralytics import RTDETR
from ultralytics.models.rtdetr.train import RTDETRTrainer
from ultralytics.utils import LOGGER, colorstr
from ultralytics.utils.torch_utils import unwrap_model


class RTDETRBackboneLRTrainer(RTDETRTrainer):
    """RT-DETR trainer with a lower learning rate for backbone parameters."""

    backbone_lr_ratio = 0.1  # backbone learning rate as a fraction of head learning rate

    def build_optimizer(self, model, name="auto", lr=0.001, momentum=0.9, decay=1e-5, iterations=1e5):
        """Build an AdamW optimizer with six param groups: head and backbone x {weight, bn, bias}."""
        # Resolve optimizer name; "auto" maps to AdamW with RT-DETR-style defaults
        canonical = {"Adam", "Adamax", "AdamW", "NAdam", "RAdam", "auto"}
        name = {x.lower(): x for x in canonical}.get(name.lower(), name)
        if name == "auto":
            name, lr, momentum = "AdamW", 1e-4, 0.9
            self.args.warmup_bias_lr = 0.0  # RT-DETR warms biases from 0, unlike YOLO's 0.1
        if name not in {"Adam", "Adamax", "AdamW", "NAdam", "RAdam"}:
            raise NotImplementedError(f"This trainer only supports AdamW-family optimizers; got {name}")

        # Identify backbone parameters from model.yaml and route each param into a (section, kind) group
        unwrapped = unwrap_model(model)
        backbone_len = len(unwrapped.yaml["backbone"])
        norm_types = tuple(v for k, v in nn.__dict__.items() if "Norm" in k)
        groups = {f"{s}_{k}": [] for s in ("head", "backbone") for k in ("weight", "bn", "bias")}
        for module_name, module in unwrapped.named_modules():
            for param_name, param in module.named_parameters(recurse=False):
                if not param.requires_grad:
                    continue
                fullname = f"{module_name}.{param_name}" if module_name else param_name
                parts = fullname.split(".")
                section = (
                    "backbone"
                    if len(parts) > 1 and parts[0] == "model" and parts[1].isdigit() and int(parts[1]) < backbone_len
                    else "head"
                )
                if "bias" in param_name:
                    kind = "bias"
                elif isinstance(module, norm_types) or "logit_scale" in fullname:
                    kind = "bn"
                else:
                    kind = "weight"
                groups[f"{section}_{kind}"].append(param)

        # Build the optimizer with per-group lr and weight decay; backbone groups use lr * backbone_lr_ratio
        backbone_lr = lr * self.backbone_lr_ratio
        param_groups = [
            {"params": groups["head_weight"], "lr": lr, "weight_decay": decay, "param_group": "weight"},
            {"params": groups["head_bn"], "lr": lr, "weight_decay": 0.0, "param_group": "bn"},
            {"params": groups["head_bias"], "lr": lr, "weight_decay": 0.0, "param_group": "bias"},
            {"params": groups["backbone_weight"], "lr": backbone_lr, "weight_decay": decay, "param_group": "weight"},
            {"params": groups["backbone_bn"], "lr": backbone_lr, "weight_decay": 0.0, "param_group": "bn"},
            {"params": groups["backbone_bias"], "lr": backbone_lr, "weight_decay": 0.0, "param_group": "bias"},
        ]
        param_groups = [pg for pg in param_groups if pg["params"]]  # drop empty groups
        optimizer = getattr(torch.optim, name)(param_groups, betas=(momentum, 0.999))

        LOGGER.info(
            f"{colorstr('optimizer:')} {name}(lr={lr}, backbone_lr={backbone_lr}) with parameter groups\n"
            f"  Head: {len(groups['head_bn'])} bn, {len(groups['head_weight'])} weight(decay={decay}), "
            f"{len(groups['head_bias'])} bias (lr={lr})\n"
            f"  Backbone: {len(groups['backbone_bn'])} bn, {len(groups['backbone_weight'])} weight(decay={decay}), "
            f"{len(groups['backbone_bias'])} bias (lr={backbone_lr})"
        )
        return optimizer


model = RTDETR("rtdetr-l.pt")
model.train(data="coco8.yaml", epochs=20, trainer=RTDETRBackboneLRTrainer)

A common starting point is backbone_lr_ratio = 0.1, which matches the original RT-DETR setup with the HGNetV2 backbone. The literature suggests scaling the ratio inversely with backbone size and pretraining data scale: large backbones pretrained on very large datasets (for example, ViT-L/H trained with DINO, CLIP, or MAE on hundreds of millions of images) typically use a ratio of 0.01 or smaller to preserve features, while smaller, more lightly pretrained backbones can tolerate ratios of 0.5 or higher.
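The built-in learning rate scheduler (cosine or linear) still applies on top of each group's base learning rate. The backbone and head learning rates follow the same decay schedule and keep their ratio throughout training.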
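The ratio is preserved because a multiplicative scheduler scales every group's base learning rate by the same factor at each epoch. The sketch below illustrates this with plain Python. The two factor functions are assumed schedule shapes that decay from 1.0 to a final fraction `lrf`, and the names `linear_factor`, `cosine_factor`, and `lrf` are illustrative rather than the trainer's actual code.

```python
import math


def linear_factor(epoch, epochs, lrf=0.01):
    """Linear decay multiplier from 1.0 down to lrf (assumed schedule shape)."""
    return max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf


def cosine_factor(epoch, epochs, lrf=0.01):
    """Cosine decay multiplier from 1.0 down to lrf (assumed schedule shape)."""
    return (1 + math.cos(math.pi * epoch / epochs)) / 2 * (1.0 - lrf) + lrf


head_lr, backbone_lr = 1e-4, 1e-5  # base learning rates of the two sections
epochs = 20

for schedule in (linear_factor, cosine_factor):
    for epoch in (0, 10, 19):
        f = schedule(epoch, epochs)
        # Both groups are multiplied by the same factor, so the ratio stays at 0.1
        print(schedule.__name__, epoch, head_lr * f, backbone_lr * f, (backbone_lr * f) / (head_lr * f))
```

Whatever shape the schedule takes, only the shared factor changes from epoch to epoch, so tuning backbone_lr_ratio and tuning the schedule are independent decisions.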
These customizations can be combined in a single trainer class by overriding multiple methods and adding callbacks as needed.
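Synchronized BatchNorm for multi-GPU training
When training on multiple GPUs with DistributedDataParallel, the default BatchNorm2d layers compute statistics independently on each GPU. For RT-DETR fine-tuning and other recipes that use a small per-GPU batch size, these per-GPU batch statistics can be noisy. PyTorch's SyncBatchNorm synchronizes the mean and variance across all ranks to obtain global batch statistics, which typically improves convergence at the cost of a small cross-GPU communication overhead.
The conversion must happen after the model is on the GPU but before DDP wraps it. The cleanest hook is set_model_attributes(), which BaseTrainer calls in exactly that window: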
from torch import nn

from ultralytics import RTDETR
from ultralytics.models.rtdetr.train import RTDETRTrainer


class SyncBNTrainer(RTDETRTrainer):
    """RT-DETR trainer that converts BatchNorm to SyncBatchNorm for multi-GPU training."""

    def set_model_attributes(self):
        """Run the parent setup, then convert BN to SyncBatchNorm when training on multiple GPUs."""
        super().set_model_attributes()
        if self.world_size > 1:
            self.model = nn.SyncBatchNorm.convert_sync_batchnorm(self.model)


model = RTDETR("rtdetr-l.pt")
model.train(data="coco8.yaml", epochs=20, device=[0, 1], trainer=SyncBNTrainer)

The world_size > 1 guard keeps the trainer safe for single-GPU runs: on a single GPU the conversion is skipped and training continues with regular BatchNorm2d. The same pattern works for YOLO; just switch the parent class to DetectionTrainer.
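| Scenario | Recommendation |
|---|---|
| Multi-GPU training, small per-GPU batch size (≤ 16) | Enable |
| Multi-GPU training, large per-GPU batch size (≥ 32) | Optional; marginal benefit |
| Single-GPU training | Not applicable (skipped) |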
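As a toy illustration of why small per-GPU batches give noisy statistics, the sketch below compares the mean and variance that each rank would compute on its own with the statistics of the pooled batch, which is what synchronization aims to recover. The activation values are made up, and the helper `mean_var` is hypothetical.

```python
from statistics import fmean

# Hypothetical activations for one channel, split across two GPUs (ranks)
rank0 = [0.2, 0.4, 0.1, 0.3]
rank1 = [1.8, 2.2, 2.0, 2.0]


def mean_var(values):
    """Return the mean and the biased variance of a batch of activations."""
    m = fmean(values)
    return m, fmean([(v - m) ** 2 for v in values])


print("rank 0:", mean_var(rank0))
print("rank 1:", mean_var(rank1))
print("synced:", mean_var(rank0 + rank1))
```

Each rank sees a very different mean and a small variance, while the pooled batch has a mean between the two and a much larger variance. Without synchronization, each replica normalizes with its own local view.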
Configurable gradient clipping
The default trainer clips gradients to max_norm=10.0 in optimizer_step(), a loose value tuned for YOLO models where gradients rarely exceed it. DETR-family detectors (RT-DETR, DEIM, DINO) typically use much tighter values such as 0.1 to stabilize the decoder's cross-attention layers, where gradient magnitudes can spike. To override the clip value, subclass the trainer and override optimizer_step():
import torch

from ultralytics import RTDETR
from ultralytics.models.rtdetr.train import RTDETRTrainer


class CustomClipTrainer(RTDETRTrainer):
    """RT-DETR trainer with configurable gradient clipping."""

    clip_grad_norm = 0.1  # max gradient norm; set to 0 to disable clipping

    def optimizer_step(self):
        """Run an optimizer step with a configurable gradient-norm clip."""
        self.scaler.unscale_(self.optimizer)  # unscale first so the clip sees true gradient magnitudes
        if self.clip_grad_norm > 0:
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=self.clip_grad_norm)
        self.scaler.step(self.optimizer)
        self.scaler.update()
        self.optimizer.zero_grad()
        if self.ema:
            self.ema.update(self.model)


model = RTDETR("rtdetr-l.pt")
model.train(data="coco8.yaml", epochs=20, trainer=CustomClipTrainer)

The same trainer works for YOLO: switch the parent class to DetectionTrainer (from ultralytics.models.yolo.detect import DetectionTrainer) and load a YOLO checkpoint with YOLO("yolo26n.pt"). The optimizer_step body stays the same.
| Architecture family | Typical max_norm |
|---|---|
| RT-DETR / DEIM / DETR family | 0.1 |
| YOLO (Ultralytics default) | 10.0 |
| Clipping disabled | 0 |
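To show what the max_norm values in the table do to a gradient, here is a plain-Python sketch of clipping by global L2 norm, the operation that clip_grad_norm_ performs on tensors. The helper `clip_by_global_norm` and the gradient values are hypothetical, and the `eps` term is an assumed numerical guard.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients so that their combined L2 norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for group in grads for g in group))
    scale = min(1.0, max_norm / (total_norm + eps))  # never scale gradients up
    return [[g * scale for g in group] for group in grads], total_norm


# Hypothetical gradients for one parameter tensor, flattened to a list
grads = [[3.0, 4.0]]  # global L2 norm = 5.0

loose, norm = clip_by_global_norm(grads, max_norm=10.0)
tight, _ = clip_by_global_norm(grads, max_norm=0.1)
print(norm, loose, tight)
```

With max_norm=10.0 the gradient passes through unchanged, while max_norm=0.1 shrinks it by roughly a factor of 50 and keeps its direction.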
Frequently asked questions (FAQ)
How do I pass a custom trainer to YOLO?
Pass your custom trainer class (not an instance) to the trainer parameter in model.train():
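from ultralytics import YOLO

# MyCustomTrainer is your own trainer subclass, defined elsewhere
model = YOLO("yolo26n.pt")
model.train(data="coco8.yaml", trainer=MyCustomTrainer)

The YOLO class handles trainer instantiation internally. For more details on the trainer architecture, see the Advanced Customization page.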
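Which BaseTrainer methods can I override?
Key methods available for customization:
| Method | Purpose |
|---|---|
| validate() | Run validation and return metrics |
| build_optimizer() | Build the optimizer |
| save_model() | Save training checkpoints |
| get_model() | Return the model instance |
| get_validator() | Return the validator instance |
| get_dataloader() | Build the dataloader |
| preprocess_batch() | Preprocess an input batch |
| label_loss_items() | Format loss items for logging |
For the full API reference, see the BaseTrainer documentation.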
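Can I use callbacks instead of subclassing the trainer?
Yes. For simple customizations, callbacks are often enough. Available callback events include on_train_start, on_train_epoch_start, on_train_epoch_end, on_fit_epoch_end, and on_model_save. They let you hook into the training loop without subclassing. The backbone freezing example above uses this approach.
How do I customize the loss function without subclassing the model?
If your change is simple, such as adjusting the loss gains, you can modify the hyperparameters directly:

model.train(data="coco8.yaml", box=10.0, cls=1.5, dfl=2.0)

For structural changes to the loss function, such as adding class weights, you need to subclass the loss and the model as described in the class weights section.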
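Loss gains act as multipliers on the individual loss components before they are summed. The sketch below shows how changing the gains shifts the balance between box, classification, and DFL terms. The component values are made up, the helper `weighted_total` is hypothetical, and the "default" gains of 7.5, 0.5, and 1.5 are assumed here for comparison.

```python
def weighted_total(components, gains):
    """Sum the loss components after multiplying each one by its gain."""
    return sum(gains[name] * value for name, value in components.items())


# Hypothetical raw loss components for one batch
components = {"box": 0.8, "cls": 1.2, "dfl": 1.0}

default_gains = {"box": 7.5, "cls": 0.5, "dfl": 1.5}  # assumed defaults
tuned_gains = {"box": 10.0, "cls": 1.5, "dfl": 2.0}  # values from the call above

for label, gains in (("default", default_gains), ("tuned", tuned_gains)):
    total = weighted_total(components, gains)
    cls_share = gains["cls"] * components["cls"] / total
    print(label, round(total, 3), round(cls_share, 3))
```

Raising cls from 0.5 to 1.5 roughly doubles the share of the total that comes from classification in this example, which is the kind of rebalancing that gains are meant for. Per-class weighting, by contrast, requires the subclassing approach.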