
Reference for ultralytics/models/rtdetr/predict.py

Note

This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/rtdetr/predict.py. If you spot a problem, please help fix it by submitting a Pull Request đŸ› ïž. Thank you 🙏!



ultralytics.models.rtdetr.predict.RTDETRPredictor

Bases: BasePredictor

RT-DETR (Real-Time Detection Transformer) Predictor extending the BasePredictor class for making predictions using Baidu's RT-DETR model.

This class leverages the power of Vision Transformers to provide real-time object detection while maintaining high accuracy. It supports key features such as efficient hybrid encoding and IoU-aware query selection.

Example
from ultralytics.utils import ASSETS
from ultralytics.models.rtdetr import RTDETRPredictor

args = dict(model='rtdetr-l.pt', source=ASSETS)
predictor = RTDETRPredictor(overrides=args)
predictor.predict_cli()

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| imgsz | int | Image size for inference (must be square and scale-filled). |
| args | dict | Argument overrides for the predictor. |

Source code in ultralytics/models/rtdetr/predict.py
class RTDETRPredictor(BasePredictor):
    """
    RT-DETR (Real-Time Detection Transformer) Predictor extending the BasePredictor class for making predictions using
    Baidu's RT-DETR model.

    This class leverages the power of Vision Transformers to provide real-time object detection while maintaining
    high accuracy. It supports key features like efficient hybrid encoding and IoU-aware query selection.

    Example:
        ```python
        from ultralytics.utils import ASSETS
        from ultralytics.models.rtdetr import RTDETRPredictor

        args = dict(model='rtdetr-l.pt', source=ASSETS)
        predictor = RTDETRPredictor(overrides=args)
        predictor.predict_cli()
        ```

    Attributes:
        imgsz (int): Image size for inference (must be square and scale-filled).
        args (dict): Argument overrides for the predictor.
    """

    def postprocess(self, preds, img, orig_imgs):
        """
        Postprocess the raw predictions from the model to generate bounding boxes and confidence scores.

        The method filters detections based on confidence and class if specified in `self.args`.

        Args:
            preds (torch.Tensor): Raw predictions from the model.
            img (torch.Tensor): Processed input images.
            orig_imgs (list or torch.Tensor): Original, unprocessed images.

        Returns:
            (list[Results]): A list of Results objects containing the post-processed bounding boxes, confidence scores,
                and class labels.
        """
        nd = preds[0].shape[-1]
        bboxes, scores = preds[0].split((4, nd - 4), dim=-1)

        if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
            orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)

        results = []
        for i, bbox in enumerate(bboxes):  # (300, 4)
            bbox = ops.xywh2xyxy(bbox)
            score, cls = scores[i].max(-1, keepdim=True)  # (300, 1)
            idx = score.squeeze(-1) > self.args.conf  # (300, )
            if self.args.classes is not None:
                idx = (cls == torch.tensor(self.args.classes, device=cls.device)).any(1) & idx
            pred = torch.cat([bbox, score, cls], dim=-1)[idx]  # filter
            orig_img = orig_imgs[i]
            oh, ow = orig_img.shape[:2]
            pred[..., [0, 2]] *= ow
            pred[..., [1, 3]] *= oh
            img_path = self.batch[0][i]
            results.append(Results(orig_img, path=img_path, names=self.model.names, boxes=pred))
        return results

    def pre_transform(self, im):
        """
        Pre-transforms the input images before feeding them into the model for inference. The input images are
        letterboxed to ensure a square aspect ratio and scale-filled. The size must be square(640) and scaleFilled.

        Args:
            im (list[np.ndarray] |torch.Tensor): Input images of shape (N,3,h,w) for tensor, [(h,w,3) x N] for list.

        Returns:
            (list): List of pre-transformed images ready for model inference.
        """
        letterbox = LetterBox(self.imgsz, auto=False, scaleFill=True)
        return [letterbox(image=x) for x in im]

postprocess(preds, img, orig_imgs)

Postprocess the raw predictions from the model to generate bounding boxes and confidence scores.

The method filters detections based on confidence and class if specified in `self.args`.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| preds | Tensor | Raw predictions from the model. | required |
| img | Tensor | Processed input images. | required |
| orig_imgs | list or Tensor | Original, unprocessed images. | required |

Returns:

| Type | Description |
| --- | --- |
| list[Results] | A list of Results objects containing the post-processed bounding boxes, confidence scores, and class labels. |
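The confidence/class filtering step described above can be sketched on a toy tensor. This is an illustrative assumption, not the model's real output: the values, the 0.25 threshold, and the two-class layout are made up.

```python
import torch

# Toy predictions: 4 queries, each [x, y, w, h, score_cls0, score_cls1] (normalized).
preds = torch.tensor([
    [0.5, 0.5, 0.2, 0.2, 0.9, 0.1],
    [0.3, 0.3, 0.1, 0.1, 0.2, 0.1],
    [0.7, 0.2, 0.3, 0.2, 0.1, 0.8],
    [0.1, 0.9, 0.1, 0.1, 0.4, 0.3],
])

# Split into box coordinates and per-class scores, as in postprocess().
bboxes, scores = preds.split((4, 2), dim=-1)

# Best class and its score for each query.
score, cls = scores.max(-1, keepdim=True)

# Keep only queries above an assumed confidence threshold of 0.25.
idx = score.squeeze(-1) > 0.25
keep = torch.cat([bboxes, score, cls.float()], dim=-1)[idx]
print(keep.shape)  # rows that survive the filter, 6 columns each
```

The same boolean mask is combined with a per-class mask in the real method when `self.args.classes` is set.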

Source code in ultralytics/models/rtdetr/predict.py
def postprocess(self, preds, img, orig_imgs):
    """
    Postprocess the raw predictions from the model to generate bounding boxes and confidence scores.

    The method filters detections based on confidence and class if specified in `self.args`.

    Args:
        preds (torch.Tensor): Raw predictions from the model.
        img (torch.Tensor): Processed input images.
        orig_imgs (list or torch.Tensor): Original, unprocessed images.

    Returns:
        (list[Results]): A list of Results objects containing the post-processed bounding boxes, confidence scores,
            and class labels.
    """
    nd = preds[0].shape[-1]
    bboxes, scores = preds[0].split((4, nd - 4), dim=-1)

    if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
        orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)

    results = []
    for i, bbox in enumerate(bboxes):  # (300, 4)
        bbox = ops.xywh2xyxy(bbox)
        score, cls = scores[i].max(-1, keepdim=True)  # (300, 1)
        idx = score.squeeze(-1) > self.args.conf  # (300, )
        if self.args.classes is not None:
            idx = (cls == torch.tensor(self.args.classes, device=cls.device)).any(1) & idx
        pred = torch.cat([bbox, score, cls], dim=-1)[idx]  # filter
        orig_img = orig_imgs[i]
        oh, ow = orig_img.shape[:2]
        pred[..., [0, 2]] *= ow
        pred[..., [1, 3]] *= oh
        img_path = self.batch[0][i]
        results.append(Results(orig_img, path=img_path, names=self.model.names, boxes=pred))
    return results

pre_transform(im)

Pre-transforms the input images before feeding them into the model for inference. The input images are letterboxed to ensure a square aspect ratio and are scale-filled. The size must be square (640) and scale-filled.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| im | list[ndarray] \| Tensor | Input images of shape (N,3,h,w) for tensor, [(h,w,3) x N] for list. | required |

Returns:

| Type | Description |
| --- | --- |
| list | List of pre-transformed images ready for model inference. |
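The scale-fill behaviour can be illustrated with a minimal nearest-neighbour stand-in. This is a sketch of the concept only, not the actual LetterBox implementation: scale-fill stretches the image to the target square without preserving aspect ratio.

```python
import numpy as np

def scale_fill(img: np.ndarray, size: int = 640) -> np.ndarray:
    """Stretch an HWC image to (size, size) without preserving aspect ratio,
    mimicking LetterBox(auto=False, scaleFill=True) via nearest-neighbour sampling."""
    h, w = img.shape[:2]
    ys = (np.arange(size) * h / size).astype(int)  # source row for each output row
    xs = (np.arange(size) * w / size).astype(int)  # source column for each output column
    return img[ys][:, xs]

img = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy non-square image
out = scale_fill(img, 640)
print(out.shape)  # square output ready for the model
```

The real implementation uses interpolated resizing; the point here is that, unlike plain letterboxing, no padding is added and the content is distorted to fill the square.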

Source code in ultralytics/models/rtdetr/predict.py
def pre_transform(self, im):
    """
    Pre-transforms the input images before feeding them into the model for inference. The input images are
    letterboxed to ensure a square aspect ratio and scale-filled. The size must be square(640) and scaleFilled.

    Args:
        im (list[np.ndarray] |torch.Tensor): Input images of shape (N,3,h,w) for tensor, [(h,w,3) x N] for list.

    Returns:
        (list): List of pre-transformed images ready for model inference.
    """
    letterbox = LetterBox(self.imgsz, auto=False, scaleFill=True)
    return [letterbox(image=x) for x in im]





Created 2023-11-12, Updated 2023-11-25
Authors: glenn-jocher (3)