
Reference for ultralytics/models/rtdetr/train.py

Note

This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/rtdetr/train.py. If you spot a problem, please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!



ultralytics.models.rtdetr.train.RTDETRTrainer

Bases: DetectionTrainer

Trainer class for the RT-DETR model developed by Baidu for real-time object detection. It extends the DetectionTrainer class for YOLO to adapt to the specific features and architecture of RT-DETR. The model leverages Vision Transformers and has capabilities such as IoU-aware query selection and adaptable inference speed.

Notes
  • F.grid_sample used in RT-DETR does not support the deterministic=True argument.
  • AMP training can lead to NaN outputs and may produce errors during bipartite graph matching.
Example
from ultralytics.models.rtdetr.train import RTDETRTrainer

args = dict(model='rtdetr-l.yaml', data='coco8.yaml', imgsz=640, epochs=3)
trainer = RTDETRTrainer(overrides=args)
trainer.train()
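
The two notes above suggest running RT-DETR training with AMP and deterministic mode switched off. The sketch below builds such an overrides dictionary. `amp` and `deterministic` are standard Ultralytics training settings; the helper names `rtdetr_overrides` and `launch` are hypothetical and exist only for this illustration. Whether a given run actually needs both settings disabled is an assumption to verify on your setup.

```python
def rtdetr_overrides(model="rtdetr-l.yaml", data="coco8.yaml", imgsz=640, epochs=3):
    """Return trainer overrides with AMP and deterministic mode disabled (hypothetical helper)."""
    return dict(
        model=model,
        data=data,
        imgsz=imgsz,
        epochs=epochs,
        amp=False,            # mixed precision can lead to NaN outputs (see notes)
        deterministic=False,  # F.grid_sample does not support deterministic=True (see notes)
    )


def launch(overrides):
    """Start training; the import is lazy so the helper above works without ultralytics."""
    from ultralytics.models.rtdetr.train import RTDETRTrainer

    trainer = RTDETRTrainer(overrides=overrides)
    trainer.train()
    return trainer


args = rtdetr_overrides()
# launch(args)  # uncomment to start training
```

Keeping the dictionary construction separate from the launch step makes it possible to inspect or log the settings before any training starts.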
Source code in ultralytics/models/rtdetr/train.py
class RTDETRTrainer(DetectionTrainer):
    """
    Trainer class for the RT-DETR model developed by Baidu for real-time object detection. Extends the DetectionTrainer
    class for YOLO to adapt to the specific features and architecture of RT-DETR. This model leverages Vision
    Transformers and has capabilities like IoU-aware query selection and adaptable inference speed.

    Notes:
        - F.grid_sample used in RT-DETR does not support the `deterministic=True` argument.
        - AMP training can lead to NaN outputs and may produce errors during bipartite graph matching.

    Example:
        ```python
        from ultralytics.models.rtdetr.train import RTDETRTrainer

        args = dict(model='rtdetr-l.yaml', data='coco8.yaml', imgsz=640, epochs=3)
        trainer = RTDETRTrainer(overrides=args)
        trainer.train()
        ```
    """

    def get_model(self, cfg=None, weights=None, verbose=True):
        """
        Initialize and return an RT-DETR model for object detection tasks.

        Args:
            cfg (dict, optional): Model configuration. Defaults to None.
            weights (str, optional): Path to pre-trained model weights. Defaults to None.
            verbose (bool): Verbose logging if True. Defaults to True.

        Returns:
            (RTDETRDetectionModel): Initialized model.
        """
        model = RTDETRDetectionModel(cfg, nc=self.data["nc"], verbose=verbose and RANK == -1)
        if weights:
            model.load(weights)
        return model

    def build_dataset(self, img_path, mode="val", batch=None):
        """
        Build and return an RT-DETR dataset for training or validation.

        Args:
            img_path (str): Path to the folder containing images.
            mode (str): Dataset mode, either 'train' or 'val'.
            batch (int, optional): Batch size for rectangle training. Defaults to None.

        Returns:
            (RTDETRDataset): Dataset object for the specific mode.
        """
        return RTDETRDataset(
            img_path=img_path,
            imgsz=self.args.imgsz,
            batch_size=batch,
            augment=mode == "train",
            hyp=self.args,
            rect=False,
            cache=self.args.cache or None,
            prefix=colorstr(f"{mode}: "),
            data=self.data,
        )

    def get_validator(self):
        """
        Returns a DetectionValidator suitable for RT-DETR model validation.

        Returns:
            (RTDETRValidator): Validator object for model validation.
        """
        self.loss_names = "giou_loss", "cls_loss", "l1_loss"
        return RTDETRValidator(self.test_loader, save_dir=self.save_dir, args=copy(self.args))

    def preprocess_batch(self, batch):
        """
        Preprocess a batch of images. Scales and converts the images to float format.

        Args:
            batch (dict): Dictionary containing a batch of images, bboxes, and labels.

        Returns:
            (dict): Preprocessed batch.
        """
        batch = super().preprocess_batch(batch)
        bs = len(batch["img"])
        batch_idx = batch["batch_idx"]
        gt_bbox, gt_class = [], []
        for i in range(bs):
            gt_bbox.append(batch["bboxes"][batch_idx == i].to(batch_idx.device))
            gt_class.append(batch["cls"][batch_idx == i].to(device=batch_idx.device, dtype=torch.long))
        return batch

build_dataset(img_path, mode='val', batch=None)

Build and return an RT-DETR dataset for training or validation.

Parameters:

Name      Type  Description                                           Default
img_path  str   Path to the folder containing images.                 required
mode      str   Dataset mode, either 'train' or 'val'.                'val'
batch     int   Batch size for rectangle training. Defaults to None.  None

Returns:

Type           Description
RTDETRDataset  Dataset object for the specific mode.

Source code in ultralytics/models/rtdetr/train.py
def build_dataset(self, img_path, mode="val", batch=None):
    """
    Build and return an RT-DETR dataset for training or validation.

    Args:
        img_path (str): Path to the folder containing images.
        mode (str): Dataset mode, either 'train' or 'val'.
        batch (int, optional): Batch size for rectangle training. Defaults to None.

    Returns:
        (RTDETRDataset): Dataset object for the specific mode.
    """
    return RTDETRDataset(
        img_path=img_path,
        imgsz=self.args.imgsz,
        batch_size=batch,
        augment=mode == "train",
        hyp=self.args,
        rect=False,
        cache=self.args.cache or None,
        prefix=colorstr(f"{mode}: "),
        data=self.data,
    )
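
To make the keyword choices in the listing above easier to see, here is a minimal sketch that reproduces only the argument logic, without constructing a real dataset. The helper `dataset_kwargs` is hypothetical and not part of Ultralytics; it mirrors three details visible in the source: augmentation is enabled only for the training split, `rect` is always False, and a falsy cache setting collapses to None.

```python
def dataset_kwargs(mode, imgsz=640, batch=None, cache=False):
    """Mirror the keyword choices made in build_dataset (hypothetical helper)."""
    return dict(
        imgsz=imgsz,
        batch_size=batch,
        augment=mode == "train",  # augment only the training split
        rect=False,               # rectangular batching is always off here
        cache=cache or None,      # False, "" and None all become None
    )


train_kwargs = dataset_kwargs("train", batch=16)
val_kwargs = dataset_kwargs("val", batch=32, cache="")
```

Since `rect` is hard-coded to False in the source, the `batch` argument is passed through as `batch_size` but does not trigger rectangular batching.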

get_model(cfg=None, weights=None, verbose=True)

Initialize and return an RT-DETR model for object detection tasks.

Parameters:

Name     Type  Description                                                  Default
cfg      dict  Model configuration. Defaults to None.                       None
weights  str   Path to pre-trained model weights. Defaults to None.         None
verbose  bool  Verbose logging if True. Defaults to True.                   True

Returns:

Type                  Description
RTDETRDetectionModel  Initialized model.

Source code in ultralytics/models/rtdetr/train.py
def get_model(self, cfg=None, weights=None, verbose=True):
    """
    Initialize and return an RT-DETR model for object detection tasks.

    Args:
        cfg (dict, optional): Model configuration. Defaults to None.
        weights (str, optional): Path to pre-trained model weights. Defaults to None.
        verbose (bool): Verbose logging if True. Defaults to True.

    Returns:
        (RTDETRDetectionModel): Initialized model.
    """
    model = RTDETRDetectionModel(cfg, nc=self.data["nc"], verbose=verbose and RANK == -1)
    if weights:
        model.load(weights)
    return model
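
The expression `verbose and RANK == -1` in the listing above limits verbose model logging to one kind of process. The sketch below illustrates that gate in isolation. It assumes `RANK` is -1 for a single-process run and a non-negative integer for distributed workers, as the check in the source suggests. Reading the value from the `RANK` environment variable is an assumption about how it is obtained, and `should_log_verbosely` is a hypothetical name.

```python
import os


def should_log_verbosely(verbose=True, rank=None):
    """Return True only when verbose is requested and the process is not a distributed worker."""
    if rank is None:
        # Assumption: the rank comes from the RANK environment variable, defaulting to -1.
        rank = int(os.getenv("RANK", -1))
    return verbose and rank == -1


single_process = should_log_verbosely(True, rank=-1)
distributed_worker = should_log_verbosely(True, rank=0)
```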

get_validator()

Returns a DetectionValidator suitable for RT-DETR model validation.

Returns:

Type             Description
RTDETRValidator  Validator object for model validation.

Source code in ultralytics/models/rtdetr/train.py
def get_validator(self):
    """
    Returns a DetectionValidator suitable for RT-DETR model validation.

    Returns:
        (RTDETRValidator): Validator object for model validation.
    """
    self.loss_names = "giou_loss", "cls_loss", "l1_loss"
    return RTDETRValidator(self.test_loader, save_dir=self.save_dir, args=copy(self.args))
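
Besides creating the validator, the listing above sets `loss_names` to three entries. How the base trainer consumes these names is not shown in this file, so the following is only a sketch of one plausible use: pairing each name with a loss value under a prefix such as `train` or `val`. The function `label_loss_items` and the rounding to five decimals are assumptions made for illustration.

```python
LOSS_NAMES = ("giou_loss", "cls_loss", "l1_loss")  # same order as in get_validator


def label_loss_items(loss_items, prefix="train"):
    """Pair each loss value with a prefixed loss name (hypothetical helper)."""
    keys = [f"{prefix}/{name}" for name in LOSS_NAMES]
    if len(loss_items) != len(keys):
        raise ValueError(f"expected {len(keys)} loss values, got {len(loss_items)}")
    return {key: round(float(value), 5) for key, value in zip(keys, loss_items)}


labeled = label_loss_items([0.51234567, 1.2, 0.3], prefix="val")
```

The order of the tuple matters in a scheme like this, because values are matched to names by position.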

preprocess_batch(batch)

Preprocess a batch of images. Scales and converts the images to float format.

Parameters:

Name   Type  Description                                                   Default
batch  dict  Dictionary containing a batch of images, bboxes, and labels.  required

Returns:

Type  Description
dict  Preprocessed batch.

Source code in ultralytics/models/rtdetr/train.py
def preprocess_batch(self, batch):
    """
    Preprocess a batch of images. Scales and converts the images to float format.

    Args:
        batch (dict): Dictionary containing a batch of images, bboxes, and labels.

    Returns:
        (dict): Preprocessed batch.
    """
    batch = super().preprocess_batch(batch)
    bs = len(batch["img"])
    batch_idx = batch["batch_idx"]
    gt_bbox, gt_class = [], []
    for i in range(bs):
        gt_bbox.append(batch["bboxes"][batch_idx == i].to(batch_idx.device))
        gt_class.append(batch["cls"][batch_idx == i].to(device=batch_idx.device, dtype=torch.long))
    return batch
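
The loop in the listing above groups ground-truth boxes and classes per image by comparing `batch_idx` with each image index. In the source as shown, these per-image lists are built but the batch is returned as the parent class produced it. The sketch below reproduces the grouping with plain Python lists instead of tensors, so it runs without PyTorch. `split_targets_by_image` and the sample values are hypothetical.

```python
def split_targets_by_image(batch_idx, bboxes, classes, batch_size):
    """Group flat target lists into one list per image, keyed by batch_idx."""
    gt_bbox, gt_class = [], []
    for i in range(batch_size):
        # Positions of all targets that belong to image i (the tensor code uses a boolean mask).
        keep = [j for j, idx in enumerate(batch_idx) if idx == i]
        gt_bbox.append([bboxes[j] for j in keep])
        gt_class.append([int(classes[j]) for j in keep])  # class ids become integers
    return gt_bbox, gt_class


# Three targets spread over a batch of three images; image 1 has no targets.
batch_idx = [0, 0, 2]
bboxes = [[0.5, 0.5, 0.2, 0.2], [0.1, 0.1, 0.05, 0.05], [0.7, 0.3, 0.4, 0.4]]
classes = [3.0, 1.0, 7.0]
gt_bbox, gt_class = split_targets_by_image(batch_idx, bboxes, classes, batch_size=3)
```

An image without targets yields an empty list rather than being skipped, so the outer lists always have one entry per image in the batch.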





Created 2023-11-12, Updated 2023-11-25
Authors: glenn-jocher (3)