
Reference for ultralytics/models/rtdetr/predict.py

Note

This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/rtdetr/predict.py. If you spot a problem please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!


ultralytics.models.rtdetr.predict.RTDETRPredictor

RTDETRPredictor(cfg=DEFAULT_CFG, overrides=None, _callbacks=None)

Bases: BasePredictor

RT-DETR (Real-Time Detection Transformer) Predictor extending the BasePredictor class for making predictions.

This class leverages Vision Transformers to provide real-time object detection while maintaining high accuracy. It supports key features like efficient hybrid encoding and IoU-aware query selection.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| imgsz | int | Image size for inference (must be square and scale-filled). |
| args | dict | Argument overrides for the predictor. |
| model | Module | The loaded RT-DETR model. |
| batch | list | Current batch of processed inputs. |

Examples:

>>> from ultralytics.utils import ASSETS
>>> from ultralytics.models.rtdetr import RTDETRPredictor
>>> args = dict(model="rtdetr-l.pt", source=ASSETS)
>>> predictor = RTDETRPredictor(overrides=args)
>>> predictor.predict_cli()

Source code in ultralytics/engine/predictor.py (the constructor is inherited from BasePredictor)
def __init__(self, cfg=DEFAULT_CFG, overrides=None, _callbacks=None):
    """
    Initialize the BasePredictor class.

    Args:
        cfg (str | dict): Path to a configuration file or a configuration dictionary.
        overrides (dict | None): Configuration overrides.
        _callbacks (dict | None): Dictionary of callback functions.
    """
    self.args = get_cfg(cfg, overrides)
    self.save_dir = get_save_dir(self.args)
    if self.args.conf is None:
        self.args.conf = 0.25  # default conf=0.25
    self.done_warmup = False
    if self.args.show:
        self.args.show = check_imshow(warn=True)

    # Usable if setup is done
    self.model = None
    self.data = self.args.data  # data_dict
    self.imgsz = None
    self.device = None
    self.dataset = None
    self.vid_writer = {}  # dict of {save_path: video_writer, ...}
    self.plotted_img = None
    self.source_type = None
    self.seen = 0
    self.windows = []
    self.batch = None
    self.results = None
    self.transforms = None
    self.callbacks = _callbacks or callbacks.get_default_callbacks()
    self.txt_path = None
    self._lock = threading.Lock()  # for automatic thread-safe inference
    callbacks.add_integration_callbacks(self)
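
The constructor above resolves its settings before any model is loaded: overrides are merged onto a base configuration, and `conf` falls back to 0.25 when unset. The following is a minimal, dependency-free sketch of that pattern only. `ILLUSTRATIVE_DEFAULTS` and `build_args` are hypothetical names invented for this illustration, and the real `get_cfg` does considerably more (validation, a much larger key set).

```python
from types import SimpleNamespace

# Hypothetical defaults for illustration only; not the library's DEFAULT_CFG.
ILLUSTRATIVE_DEFAULTS = {"conf": None, "show": False, "classes": None}


def build_args(overrides=None):
    """Merge overrides onto the defaults, then apply the conf fallback."""
    merged = {**ILLUSTRATIVE_DEFAULTS, **(overrides or {})}
    args = SimpleNamespace(**merged)  # attribute access, like self.args.conf
    if args.conf is None:
        args.conf = 0.25  # same fallback value as in the constructor
    return args


default_args = build_args()
custom_args = build_args({"conf": 0.5, "classes": [0, 2]})
print(default_args.conf, custom_args.conf, custom_args.classes)
```

Attributes such as `model`, `imgsz`, and `batch` start as `None` in the constructor and are only usable once setup has run.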

postprocess

postprocess(preds, img, orig_imgs)

Postprocess the raw predictions from the model to generate bounding boxes and confidence scores.

The method filters detections based on confidence and class if specified in self.args. It converts model predictions to Results objects containing properly scaled bounding boxes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| preds | List \| Tuple | List of [predictions, extra] from the model, where predictions contain bounding boxes and scores. | required |
| img | Tensor | Processed input images with shape (N, 3, H, W). | required |
| orig_imgs | List \| Tensor | Original, unprocessed images. | required |

Returns:

| Type | Description |
| --- | --- |
| List[Results] | A list of Results objects containing the post-processed bounding boxes, confidence scores, and class labels. |
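
The confidence and class filtering described above can be sketched without any tensor library. This is an illustrative pure-Python version, not the library code: `filter_queries` is a hypothetical helper, and plain lists stand in for the per-query score tensor. For each query it takes the best class score, keeps the query only if that score is strictly above `conf`, and optionally restricts the result to the requested classes.

```python
def filter_queries(scores, conf=0.25, classes=None):
    """Return (query_index, class_id, score) for each query that survives filtering.

    scores: one list of per-class scores per query.
    """
    kept = []
    for i, row in enumerate(scores):
        best_score = max(row)
        best_cls = row.index(best_score)  # first index of the maximum
        if best_score <= conf:  # keep only scores strictly above the threshold
            continue
        if classes is not None and best_cls not in classes:
            continue
        kept.append((i, best_cls, best_score))
    return kept


scores = [
    [0.10, 0.80, 0.05],  # confident, class 1
    [0.20, 0.10, 0.15],  # below the threshold
    [0.60, 0.30, 0.10],  # confident, class 0
]
print(filter_queries(scores))               # queries 0 and 2 survive
print(filter_queries(scores, classes=[0]))  # only query 2 survives
```

In the method itself, this filter is the only selection step applied to the queries before the Results objects are built.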

Source code in ultralytics/models/rtdetr/predict.py
def postprocess(self, preds, img, orig_imgs):
    """
    Postprocess the raw predictions from the model to generate bounding boxes and confidence scores.

    The method filters detections based on confidence and class if specified in `self.args`. It converts
    model predictions to Results objects containing properly scaled bounding boxes.

    Args:
        preds (List | Tuple): List of [predictions, extra] from the model, where predictions contain
            bounding boxes and scores.
        img (torch.Tensor): Processed input images with shape (N, 3, H, W).
        orig_imgs (List | torch.Tensor): Original, unprocessed images.

    Returns:
        (List[Results]): A list of Results objects containing the post-processed bounding boxes, confidence scores,
            and class labels.
    """
    # PyTorch inference returns a list or tuple; export inference returns only the predictions Tensor
    if not isinstance(preds, (list, tuple)):
        preds = [preds, None]

    nd = preds[0].shape[-1]
    bboxes, scores = preds[0].split((4, nd - 4), dim=-1)

    if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
        orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)

    results = []
    for bbox, score, orig_img, img_path in zip(bboxes, scores, orig_imgs, self.batch[0]):  # (300, 4)
        bbox = ops.xywh2xyxy(bbox)
        max_score, cls = score.max(-1, keepdim=True)  # (300, 1)
        idx = max_score.squeeze(-1) > self.args.conf  # (300, )
        if self.args.classes is not None:
            idx = (cls == torch.tensor(self.args.classes, device=cls.device)).any(1) & idx
        pred = torch.cat([bbox, max_score, cls], dim=-1)[idx]  # filter
        oh, ow = orig_img.shape[:2]
        pred[..., [0, 2]] *= ow  # scale x coordinates to original width
        pred[..., [1, 3]] *= oh  # scale y coordinates to original height
        results.append(Results(orig_img, path=img_path, names=self.model.names, boxes=pred))
    return results
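
The box handling in the loop above comes down to two steps: convert each box from center format to corner format, then multiply x by the original width and y by the original height. The sketch below works through one box in plain Python. The helper names are hypothetical, and it assumes the model emits coordinates normalized to [0, 1], which is what the direct multiplication by `ow` and `oh` suggests.

```python
def xywh_to_xyxy(box):
    """Convert (center_x, center_y, width, height) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def scale_to_image(box_xyxy, orig_width, orig_height):
    """Scale normalized corner coordinates to pixels of the original image."""
    x1, y1, x2, y2 = box_xyxy
    return (x1 * orig_width, y1 * orig_height, x2 * orig_width, y2 * orig_height)


normalized = (0.5, 0.5, 0.25, 0.5)  # centered box, 25% wide and 50% tall
corners = xywh_to_xyxy(normalized)
pixels = scale_to_image(corners, orig_width=1280, orig_height=720)
print(corners)  # (0.375, 0.25, 0.625, 0.75)
print(pixels)   # (480.0, 180.0, 800.0, 540.0)
```

Because the input was stretched to a square without padding, no padding offset has to be removed when mapping back to the original image; independent scaling per axis is enough.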

pre_transform

pre_transform(im)

Pre-transform the input images before feeding them into the model for inference. Each image is resized with LetterBox using auto=False and scale_fill=True, so it is stretched to the square inference size (for example 640) with no padding, rather than resized with its aspect ratio preserved.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| im | list[ndarray] \| Tensor | Input images of shape (N,3,h,w) for tensor, [(h,w,3) x N] for list. | required |

Returns:

| Type | Description |
| --- | --- |
| list | List of pre-transformed images ready for model inference. |

Source code in ultralytics/models/rtdetr/predict.py
def pre_transform(self, im):
    """
    Pre-transform the input images before feeding them into the model for inference. Each image is resized with
    LetterBox using `auto=False` and `scale_fill=True`, so it is stretched to the square inference size (for
    example 640) with no padding.

    Args:
        im (list[np.ndarray] | torch.Tensor): Input images of shape (N,3,h,w) for tensor, [(h,w,3) x N] for list.

    Returns:
        (list): List of pre-transformed images ready for model inference.
    """
    letterbox = LetterBox(self.imgsz, auto=False, scale_fill=True)
    return [letterbox(image=x) for x in im]
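
To make the effect of `scale_fill=True` concrete, the sketch below compares the output geometry of a stretch resize with an aspect-preserving resize. It is a simplified illustration, not the `LetterBox` implementation: `resize_geometry` is a hypothetical helper that reports only the resized shape and the total padding per axis, and it ignores every other option the real transform supports.

```python
def resize_geometry(height, width, target=640, scale_fill=True):
    """Return ((new_height, new_width), (pad_height, pad_width)) for a square target."""
    if scale_fill:
        # Stretch both axes independently to the target; no padding is needed.
        return (target, target), (0, 0)
    # Aspect-preserving resize: one uniform ratio, the remainder becomes padding.
    ratio = min(target / height, target / width)
    new_h, new_w = round(height * ratio), round(width * ratio)
    return (new_h, new_w), (target - new_h, target - new_w)


print(resize_geometry(720, 1280))                    # ((640, 640), (0, 0))
print(resize_geometry(720, 1280, scale_fill=False))  # ((360, 640), (280, 0))
```

With the stretch variant, a 1280x720 frame fills the whole 640x640 input, so normalized predictions map back to the original image by plain multiplication with its width and height.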


