Reference for ultralytics/models/rtdetr/predict.py
This page is sourced from https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/rtdetr/predict.py.
ultralytics.models.rtdetr.predict.RTDETRPredictor
RTDETRPredictor()
Bases: BasePredictor
RT-DETR (Real-Time Detection Transformer) Predictor extending the BasePredictor class for making predictions.
This class leverages Vision Transformers to provide real-time object detection while maintaining high accuracy. It supports key features like efficient hybrid encoding and IoU-aware query selection.
Attributes
| Name | Type | Description |
|---|---|---|
| imgsz | int | Image size for inference (must be square and scale-filled). |
| args | dict | Argument overrides for the predictor. |
| model | torch.nn.Module | The loaded RT-DETR model. |
| batch | list | Current batch of processed inputs. |
Methods
| Name | Description |
|---|---|
| postprocess | Postprocess the raw predictions from the model to generate bounding boxes and confidence scores. |
| pre_transform | Pre-transform input images before feeding them into the model for inference. |
Examples
>>> from ultralytics.utils import ASSETS
>>> from ultralytics.models.rtdetr import RTDETRPredictor
>>> args = dict(model="rtdetr-l.pt", source=ASSETS)
>>> predictor = RTDETRPredictor(overrides=args)
>>> predictor.predict_cli()
Source code in ultralytics/models/rtdetr/predict.py
class RTDETRPredictor(BasePredictor):
ultralytics.models.rtdetr.predict.RTDETRPredictor.postprocess
def postprocess(self, preds, img, orig_imgs)
Postprocess the raw predictions from the model to generate bounding boxes and confidence scores.
The method filters detections based on confidence and class if specified in self.args. It converts model predictions (already top-k selected by the decoder head) to Results objects containing properly scaled bounding boxes.
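The filtering step can be illustrated without any model. The following is a minimal pure-Python sketch, not the library's implementation: `filter_detections` is a hypothetical helper, and its default values for `conf` and `max_det` are assumptions chosen for illustration. It mirrors the behavior visible in the source: a strict `score > conf` comparison, an optional class allow-list, and truncation to `max_det` without re-sorting.

```python
def filter_detections(dets, conf=0.25, classes=None, max_det=300):
    """Keep detections above `conf`, optionally restricted to `classes`, capped at `max_det`.

    Each detection is a (cx, cy, w, h, score, class_id) tuple, matching the
    [cx, cy, w, h, score, class] layout described for `preds`.
    NOTE: hypothetical helper; default values are illustrative assumptions.
    """
    kept = []
    for det in dets:
        score, class_id = det[4], int(det[5])
        if score <= conf:  # strict comparison: a score equal to the threshold is dropped
            continue
        if classes is not None and class_id not in classes:
            continue
        kept.append(det)
    return kept[:max_det]  # order is preserved; no re-sorting by score


dets = [
    (0.50, 0.50, 0.20, 0.40, 0.90, 0),
    (0.30, 0.30, 0.10, 0.10, 0.10, 0),  # below the confidence threshold
    (0.70, 0.20, 0.10, 0.20, 0.80, 2),  # removed when classes=[0]
]
print(filter_detections(dets, conf=0.25, classes=[0]))
```

Only the first detection survives here: the second fails the confidence test and the third fails the class test.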
Args
| Name | Type | Description | Default |
|---|---|---|---|
| preds | `list \| tuple` | List of [predictions, extra] from the model, where predictions have shape (bs, num_queries, 6) with format [cx, cy, w, h, score, class]. | required |
| img | torch.Tensor | Processed input images with shape (N, 3, H, W). | required |
| orig_imgs | `list \| torch.Tensor` | Original, unprocessed images. | required |
Returns
| Type | Description |
|---|---|
| list[Results] | A list of Results objects containing the post-processed bounding boxes, confidence scores, and class labels. |
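The box arithmetic behind these results can be sketched in a few lines. This is an illustrative pure-Python version, assuming the predicted boxes are normalized to [0, 1] (which the multiplication by the original width and height in the source implies). `to_pixel_xyxy` is a hypothetical name, not part of the library.

```python
def to_pixel_xyxy(box, orig_h, orig_w):
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2) coordinates.

    NOTE: hypothetical helper for illustration; assumes `box` values lie in [0, 1].
    """
    cx, cy, w, h = box
    # Center/size -> corner format.
    x1, y1 = cx - w / 2, cy - h / 2
    x2, y2 = cx + w / 2, cy + h / 2
    # Scale x by the original width and y by the original height.
    return (x1 * orig_w, y1 * orig_h, x2 * orig_w, y2 * orig_h)


print(to_pixel_xyxy((0.5, 0.5, 0.25, 0.5), orig_h=480, orig_w=640))
# → (240.0, 120.0, 400.0, 360.0)
```

Because the input was stretched to a square without padding, no offset needs to be subtracted before scaling; each axis is simply multiplied by the corresponding original dimension.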
Source code in ultralytics/models/rtdetr/predict.py
def postprocess(self, preds, img, orig_imgs):
    """Postprocess the raw predictions from the model to generate bounding boxes and confidence scores.

    The method filters detections based on confidence and class if specified in `self.args`. It converts model
    predictions (already top-k selected by the decoder head) to Results objects containing properly scaled bounding
    boxes.

    Args:
        preds (list | tuple): List of [predictions, extra] from the model, where predictions have shape (bs,
            num_queries, 6) with format [cx, cy, w, h, score, class].
        img (torch.Tensor): Processed input images with shape (N, 3, H, W).
        orig_imgs (list | torch.Tensor): Original, unprocessed images.

    Returns:
        (list[Results]): A list of Results objects containing the post-processed bounding boxes, confidence scores,
            and class labels.
    """
    if isinstance(preds, (list, tuple)):
        preds = preds[0]
    bboxes, scores, labels = preds.split((4, 1, 1), dim=-1)

    if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
        orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)[..., ::-1]

    results = []
    for bbox, score, label, orig_img, img_path in zip(bboxes, scores, labels, orig_imgs, self.batch[0]):
        bbox = ops.xywh2xyxy(bbox)
        idx = score.squeeze(-1) > self.args.conf
        if self.args.classes is not None:
            idx = (label == torch.tensor(self.args.classes, device=label.device)).any(1) & idx
        pred = torch.cat([bbox, score, label], dim=-1)[idx][: self.args.max_det]
        oh, ow = orig_img.shape[:2]
        pred[..., [0, 2]] *= ow  # scale x coordinates to original width
        pred[..., [1, 3]] *= oh  # scale y coordinates to original height
        results.append(Results(orig_img, path=img_path, names=self.model.names, boxes=pred))
    return results
ultralytics.models.rtdetr.predict.RTDETRPredictor.pre_transform
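def pre_transform(self, im)
Pre-transform input images before feeding them into the model for inference.
The input images are resized to a square of side imgsz using LetterBox with scale_fill=True, which stretches each image to fill the target shape rather than preserving its aspect ratio and padding the remainder.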
Args
| Name | Type | Description | Default |
|---|---|---|---|
| im | list[np.ndarray] | Input images of shape [(H, W, 3) x N]. | required |
Returns
| Type | Description |
|---|---|
| list | List of pre-transformed images ready for model inference. |
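To make the effect of scale-filling concrete, the sketch below compares the resize ratios of a scale-fill resize with those of an aspect-preserving letterbox. It is a conceptual illustration in pure Python and does not call `LetterBox`; both function names are hypothetical, and the behavior shown for each mode is an assumption about how such resizing generally works, not a copy of the library's code.

```python
def scale_fill_ratios(h, w, imgsz):
    """Per-axis resize ratios when an (h, w) image is stretched to (imgsz, imgsz).

    NOTE: hypothetical helper; each axis is scaled independently, so no padding is needed.
    """
    return imgsz / h, imgsz / w  # (ratio_h, ratio_w)


def letterbox_ratio_and_padding(h, w, imgsz):
    """Single resize ratio and total (pad_h, pad_w) for an aspect-preserving letterbox.

    NOTE: hypothetical helper, shown only for contrast with scale-filling.
    """
    r = min(imgsz / h, imgsz / w)
    new_h, new_w = round(h * r), round(w * r)
    return r, (imgsz - new_h, imgsz - new_w)


# A 480x640 image resized to a 640x640 model input:
print(scale_fill_ratios(480, 640, 640))            # height stretched, width unchanged
print(letterbox_ratio_and_padding(480, 640, 640))  # ratio 1.0 with 160 rows of padding
```

With scale-filling the two axes get different ratios and no padding is added, which is consistent with postprocess scaling x and y coordinates independently by the original width and height.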
Source code in ultralytics/models/rtdetr/predict.py
def pre_transform(self, im):
    """Pre-transform input images before feeding them into the model for inference.

    The input images are letterboxed to ensure a square aspect ratio and scale-filled.

    Args:
        im (list[np.ndarray]): Input images of shape [(H, W, 3) x N].

    Returns:
        (list): List of pre-transformed images ready for model inference.
    """
    letterbox = LetterBox(self.imgsz, auto=False, scale_fill=True)
    return [letterbox(image=x) for x in im]