
Reference for ultralytics/models/utils/ops.py

Note

This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/utils/ops.py. If you spot a problem, please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!



ultralytics.models.utils.ops.HungarianMatcher

Bases: Module

A module implementing the HungarianMatcher, which is a differentiable module to solve the assignment problem in an end-to-end fashion.

HungarianMatcher performs optimal assignment over the predicted and ground truth bounding boxes using a cost function that considers classification scores, bounding box coordinates, and optionally, mask predictions.
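To make "optimal assignment" concrete, here is a minimal sketch that brute-forces the lowest-cost one-to-one matching on a toy cost matrix. The source listing on this page delegates this step to scipy's linear_sum_assignment; the exhaustive search, the helper name best_assignment, and the cost values are illustrative assumptions, chosen so the example runs with the standard library only.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (pred_indices, gt_indices) minimising the total matching cost.

    cost[i][j] is the cost of matching prediction i to ground truth j.
    Assumes at least as many predictions as ground truths.
    Brute force: only suitable for tiny illustrative matrices.
    """
    num_preds, num_gts = len(cost), len(cost[0])
    best_total, best_rows = float("inf"), ()
    # Try every ordered choice of num_gts distinct predictions.
    for rows in permutations(range(num_preds), num_gts):
        total = sum(cost[r][j] for j, r in enumerate(rows))
        if total < best_total:
            best_total, best_rows = total, rows
    # Sort the pairs by prediction index so the output is ordered by row.
    pairs = sorted(zip(best_rows, range(num_gts)))
    return [p for p, _ in pairs], [g for _, g in pairs]


# 3 predictions (rows) x 2 ground truths (columns), hypothetical costs.
toy_cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.5],
]
pred_idx, gt_idx = best_assignment(toy_cost)
```

In this toy case prediction 0 is matched to ground truth 1 and prediction 1 to ground truth 0, while prediction 2 stays unmatched, so the number of matches equals min(num_queries, num_gts).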

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| cost_gain | dict | Dictionary of cost coefficients: 'class', 'bbox', 'giou', 'mask', and 'dice'. |
| use_fl | bool | Indicates whether to use Focal Loss for the classification cost calculation. |
| with_mask | bool | Indicates whether the model makes mask predictions. |
| num_sample_points | int | The number of sample points used in mask cost calculation. |
| alpha | float | The alpha factor in Focal Loss calculation. |
| gamma | float | The gamma factor in Focal Loss calculation. |

Methods:

| Name | Description |
| --- | --- |
| forward | Computes the assignment between predictions and ground truths for a batch. |
| _cost_mask | Computes the mask cost and dice cost if masks are predicted. |
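The roles of alpha and gamma are easier to see on a single number. The sketch below restates the focal-style classification cost from the source listing for one scalar probability, using only the math module; the function name focal_class_cost and the sample probabilities are assumptions made for illustration, not part of the library.

```python
import math


def focal_class_cost(p, alpha=0.25, gamma=2.0, eps=1e-8):
    """Classification cost for one (prediction, ground-truth class) pair.

    p is the sigmoid probability that the prediction assigns to the
    ground-truth class. Lower cost means a more attractive match.
    """
    # Penalty term that grows as p approaches 1 (treated as the negative part).
    neg_cost = (1 - alpha) * (p**gamma) * (-math.log(1 - p + eps))
    # Penalty term that grows as p approaches 0 (treated as the positive part).
    pos_cost = alpha * ((1 - p) ** gamma) * (-math.log(p + eps))
    return pos_cost - neg_cost


confident = focal_class_cost(0.9)  # prediction strongly supports the class
unsure = focal_class_cost(0.1)  # prediction barely supports the class
```

With the default alpha and gamma, the confident prediction gets a negative cost and the unsure one a positive cost, so the matcher prefers to pair a ground truth with the prediction that already scores its class highly.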

Source code in ultralytics/models/utils/ops.py
class HungarianMatcher(nn.Module):
    """
    A module implementing the HungarianMatcher, which is a differentiable module to solve the assignment problem in an
    end-to-end fashion.

    HungarianMatcher performs optimal assignment over the predicted and ground truth bounding boxes using a cost
    function that considers classification scores, bounding box coordinates, and optionally, mask predictions.

    Attributes:
        cost_gain (dict): Dictionary of cost coefficients: 'class', 'bbox', 'giou', 'mask', and 'dice'.
        use_fl (bool): Indicates whether to use Focal Loss for the classification cost calculation.
        with_mask (bool): Indicates whether the model makes mask predictions.
        num_sample_points (int): The number of sample points used in mask cost calculation.
        alpha (float): The alpha factor in Focal Loss calculation.
        gamma (float): The gamma factor in Focal Loss calculation.

    Methods:
        forward(pred_bboxes, pred_scores, gt_bboxes, gt_cls, gt_groups, masks=None, gt_mask=None): Computes the
            assignment between predictions and ground truths for a batch.
        _cost_mask(bs, num_gts, masks=None, gt_mask=None): Computes the mask cost and dice cost if masks are predicted.
    """

    def __init__(self, cost_gain=None, use_fl=True, with_mask=False, num_sample_points=12544, alpha=0.25, gamma=2.0):
        """Initializes HungarianMatcher with cost coefficients, Focal Loss, mask prediction, sample points, and alpha
        gamma factors.
        """
        super().__init__()
        if cost_gain is None:
            cost_gain = {"class": 1, "bbox": 5, "giou": 2, "mask": 1, "dice": 1}
        self.cost_gain = cost_gain
        self.use_fl = use_fl
        self.with_mask = with_mask
        self.num_sample_points = num_sample_points
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, pred_bboxes, pred_scores, gt_bboxes, gt_cls, gt_groups, masks=None, gt_mask=None):
        """
        Forward pass for HungarianMatcher. This function computes costs based on prediction and ground truth
        (classification cost, L1 cost between boxes and GIoU cost between boxes) and finds the optimal matching between
        predictions and ground truth based on these costs.

        Args:
            pred_bboxes (Tensor): Predicted bounding boxes with shape [batch_size, num_queries, 4].
            pred_scores (Tensor): Predicted scores with shape [batch_size, num_queries, num_classes].
            gt_cls (torch.Tensor): Ground truth classes with shape [num_gts, ].
            gt_bboxes (torch.Tensor): Ground truth bounding boxes with shape [num_gts, 4].
            gt_groups (List[int]): List of length equal to batch size, containing the number of ground truths for
                each image.
            masks (Tensor, optional): Predicted masks with shape [batch_size, num_queries, height, width].
                Defaults to None.
            gt_mask (List[Tensor], optional): List of ground truth masks, each with shape [num_masks, Height, Width].
                Defaults to None.

        Returns:
            (List[Tuple[Tensor, Tensor]]): A list of size batch_size, each element is a tuple (index_i, index_j), where:
                - index_i is the tensor of indices of the selected predictions (in order)
                - index_j is the tensor of indices of the corresponding selected ground truth targets (in order)
                For each batch element, it holds:
                    len(index_i) = len(index_j) = min(num_queries, num_target_boxes)
        """

        bs, nq, nc = pred_scores.shape

        if sum(gt_groups) == 0:
            return [(torch.tensor([], dtype=torch.long), torch.tensor([], dtype=torch.long)) for _ in range(bs)]

        # We flatten to compute the cost matrices in a batch
        # [batch_size * num_queries, num_classes]
        pred_scores = pred_scores.detach().view(-1, nc)
        pred_scores = F.sigmoid(pred_scores) if self.use_fl else F.softmax(pred_scores, dim=-1)
        # [batch_size * num_queries, 4]
        pred_bboxes = pred_bboxes.detach().view(-1, 4)

        # Compute the classification cost
        pred_scores = pred_scores[:, gt_cls]
        if self.use_fl:
            neg_cost_class = (1 - self.alpha) * (pred_scores**self.gamma) * (-(1 - pred_scores + 1e-8).log())
            pos_cost_class = self.alpha * ((1 - pred_scores) ** self.gamma) * (-(pred_scores + 1e-8).log())
            cost_class = pos_cost_class - neg_cost_class
        else:
            cost_class = -pred_scores

        # Compute the L1 cost between boxes
        cost_bbox = (pred_bboxes.unsqueeze(1) - gt_bboxes.unsqueeze(0)).abs().sum(-1)  # (bs*num_queries, num_gt)

        # Compute the GIoU cost between boxes, (bs*num_queries, num_gt)
        cost_giou = 1.0 - bbox_iou(pred_bboxes.unsqueeze(1), gt_bboxes.unsqueeze(0), xywh=True, GIoU=True).squeeze(-1)

        # Final cost matrix
        C = (
            self.cost_gain["class"] * cost_class
            + self.cost_gain["bbox"] * cost_bbox
            + self.cost_gain["giou"] * cost_giou
        )
        # Compute the mask cost and dice cost
        if self.with_mask:
            C += self._cost_mask(bs, gt_groups, masks, gt_mask)

        # Set invalid values (NaNs and infinities) to 0 (fixes ValueError: matrix contains invalid numeric entries)
        C[C.isnan() | C.isinf()] = 0.0

        C = C.view(bs, nq, -1).cpu()
        indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(gt_groups, -1))]
        gt_groups = torch.as_tensor([0, *gt_groups[:-1]]).cumsum_(0)  # (idx for queries, idx for gt)
        return [
            (torch.tensor(i, dtype=torch.long), torch.tensor(j, dtype=torch.long) + gt_groups[k])
            for k, (i, j) in enumerate(indices)
        ]

__init__(cost_gain=None, use_fl=True, with_mask=False, num_sample_points=12544, alpha=0.25, gamma=2.0)

Initializes HungarianMatcher with cost coefficients, Focal Loss, mask prediction, sample points, and the alpha and gamma factors.

Source code in ultralytics/models/utils/ops.py
def __init__(self, cost_gain=None, use_fl=True, with_mask=False, num_sample_points=12544, alpha=0.25, gamma=2.0):
    """Initializes HungarianMatcher with cost coefficients, Focal Loss, mask prediction, sample points, and alpha
    gamma factors.
    """
    super().__init__()
    if cost_gain is None:
        cost_gain = {"class": 1, "bbox": 5, "giou": 2, "mask": 1, "dice": 1}
    self.cost_gain = cost_gain
    self.use_fl = use_fl
    self.with_mask = with_mask
    self.num_sample_points = num_sample_points
    self.alpha = alpha
    self.gamma = gamma

forward(pred_bboxes, pred_scores, gt_bboxes, gt_cls, gt_groups, masks=None, gt_mask=None)

Forward pass for HungarianMatcher. This function computes costs based on prediction and ground truth (classification cost, L1 cost between boxes and GIoU cost between boxes) and finds the optimal matching between predictions and ground truth based on these costs.
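As a rough illustration of the two box terms, the following sketch computes an L1 cost and a GIoU cost for a single pair of boxes in (cx, cy, w, h) format and combines them with the default 'bbox' and 'giou' gains. It is a plain-Python restatement of the textbook GIoU definition; the helper names are hypothetical, and the library's own bbox_iou may differ in numerical details.

```python
def xywh_to_xyxy(box):
    """Convert (cx, cy, w, h) to corner format (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def l1_cost(pred, gt):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - g) for p, g in zip(pred, gt))


def giou(pred, gt, eps=1e-7):
    """Generalized IoU of two (cx, cy, w, h) boxes."""
    px1, py1, px2, py2 = xywh_to_xyxy(pred)
    gx1, gy1, gx2, gy2 = xywh_to_xyxy(gt)
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter + eps
    # Area of the smallest box enclosing both inputs.
    hull = (max(px2, gx2) - min(px1, gx1)) * (max(py2, gy2) - min(py1, gy1)) + eps
    return inter / union - (hull - union) / hull


def box_cost(pred, gt, gain_bbox=5.0, gain_giou=2.0):
    """Weighted sum of the L1 cost and the GIoU cost (1 - GIoU)."""
    return gain_bbox * l1_cost(pred, gt) + gain_giou * (1.0 - giou(pred, gt))


target = (0.5, 0.5, 0.2, 0.2)
near_cost = box_cost((0.52, 0.5, 0.2, 0.2), target)
far_cost = box_cost((0.9, 0.9, 0.1, 0.1), target)
```

A prediction that nearly overlaps the target gets a much lower cost than a distant one, and an exact match costs approximately zero.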

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pred_bboxes | Tensor | Predicted bounding boxes with shape [batch_size, num_queries, 4]. | required |
| pred_scores | Tensor | Predicted scores with shape [batch_size, num_queries, num_classes]. | required |
| gt_cls | Tensor | Ground truth classes with shape [num_gts, ]. | required |
| gt_bboxes | Tensor | Ground truth bounding boxes with shape [num_gts, 4]. | required |
| gt_groups | List[int] | List of length equal to batch size, containing the number of ground truths for each image. | required |
| masks | Tensor | Predicted masks with shape [batch_size, num_queries, height, width]. Defaults to None. | None |
| gt_mask | List[Tensor] | List of ground truth masks, each with shape [num_masks, Height, Width]. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| List[Tuple[Tensor, Tensor]] | A list of size batch_size, each element is a tuple (index_i, index_j), where index_i is the tensor of indices of the selected predictions (in order) and index_j is the tensor of indices of the corresponding selected ground truth targets (in order). For each batch element, it holds: len(index_i) = len(index_j) = min(num_queries, num_target_boxes). |
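One detail of the returned indices is worth spelling out: in the source listing, the ground-truth indices of each image are shifted by the number of ground truths in all preceding images, so index_j appears to address the flattened [num_gts] tensors rather than a per-image list. The sketch below reproduces that offset arithmetic with the standard library; the per-image matches are made-up values, and gt_offsets is a hypothetical helper name.

```python
from itertools import accumulate


def gt_offsets(gt_groups):
    """Start position of each image's ground truths in the flattened list."""
    # Same arithmetic as a cumulative sum over [0, *gt_groups[:-1]].
    return list(accumulate([0, *gt_groups[:-1]]))


gt_groups = [2, 0, 3]  # ground truths per image in a batch of 3
offsets = gt_offsets(gt_groups)

# Hypothetical per-image matches: (prediction indices, local gt indices).
local_matches = [([4, 7], [1, 0]), ([], []), ([0, 3, 9], [2, 0, 1])]

# Shift local gt indices so they point into the flattened gt tensors.
global_matches = [
    (pred, [j + offsets[k] for j in gt]) for k, (pred, gt) in enumerate(local_matches)
]
```

Here the third image's ground truths occupy positions 2 to 4 of the flattened list, so its local indices 2, 0, 1 become 4, 2, 3.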

Source code in ultralytics/models/utils/ops.py
def forward(self, pred_bboxes, pred_scores, gt_bboxes, gt_cls, gt_groups, masks=None, gt_mask=None):
    """
    Forward pass for HungarianMatcher. This function computes costs based on prediction and ground truth
    (classification cost, L1 cost between boxes and GIoU cost between boxes) and finds the optimal matching between
    predictions and ground truth based on these costs.

    Args:
        pred_bboxes (Tensor): Predicted bounding boxes with shape [batch_size, num_queries, 4].
        pred_scores (Tensor): Predicted scores with shape [batch_size, num_queries, num_classes].
        gt_cls (torch.Tensor): Ground truth classes with shape [num_gts, ].
        gt_bboxes (torch.Tensor): Ground truth bounding boxes with shape [num_gts, 4].
        gt_groups (List[int]): List of length equal to batch size, containing the number of ground truths for
            each image.
        masks (Tensor, optional): Predicted masks with shape [batch_size, num_queries, height, width].
            Defaults to None.
        gt_mask (List[Tensor], optional): List of ground truth masks, each with shape [num_masks, Height, Width].
            Defaults to None.

    Returns:
        (List[Tuple[Tensor, Tensor]]): A list of size batch_size, each element is a tuple (index_i, index_j), where:
            - index_i is the tensor of indices of the selected predictions (in order)
            - index_j is the tensor of indices of the corresponding selected ground truth targets (in order)
            For each batch element, it holds:
                len(index_i) = len(index_j) = min(num_queries, num_target_boxes)
    """

    bs, nq, nc = pred_scores.shape

    if sum(gt_groups) == 0:
        return [(torch.tensor([], dtype=torch.long), torch.tensor([], dtype=torch.long)) for _ in range(bs)]

    # We flatten to compute the cost matrices in a batch
    # [batch_size * num_queries, num_classes]
    pred_scores = pred_scores.detach().view(-1, nc)
    pred_scores = F.sigmoid(pred_scores) if self.use_fl else F.softmax(pred_scores, dim=-1)
    # [batch_size * num_queries, 4]
    pred_bboxes = pred_bboxes.detach().view(-1, 4)

    # Compute the classification cost
    pred_scores = pred_scores[:, gt_cls]
    if self.use_fl:
        neg_cost_class = (1 - self.alpha) * (pred_scores**self.gamma) * (-(1 - pred_scores + 1e-8).log())
        pos_cost_class = self.alpha * ((1 - pred_scores) ** self.gamma) * (-(pred_scores + 1e-8).log())
        cost_class = pos_cost_class - neg_cost_class
    else:
        cost_class = -pred_scores

    # Compute the L1 cost between boxes
    cost_bbox = (pred_bboxes.unsqueeze(1) - gt_bboxes.unsqueeze(0)).abs().sum(-1)  # (bs*num_queries, num_gt)

    # Compute the GIoU cost between boxes, (bs*num_queries, num_gt)
    cost_giou = 1.0 - bbox_iou(pred_bboxes.unsqueeze(1), gt_bboxes.unsqueeze(0), xywh=True, GIoU=True).squeeze(-1)

    # Final cost matrix
    C = (
        self.cost_gain["class"] * cost_class
        + self.cost_gain["bbox"] * cost_bbox
        + self.cost_gain["giou"] * cost_giou
    )
    # Compute the mask cost and dice cost
    if self.with_mask:
        C += self._cost_mask(bs, gt_groups, masks, gt_mask)

    # Set invalid values (NaNs and infinities) to 0 (fixes ValueError: matrix contains invalid numeric entries)
    C[C.isnan() | C.isinf()] = 0.0

    C = C.view(bs, nq, -1).cpu()
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(gt_groups, -1))]
    gt_groups = torch.as_tensor([0, *gt_groups[:-1]]).cumsum_(0)  # (idx for queries, idx for gt)
    return [
        (torch.tensor(i, dtype=torch.long), torch.tensor(j, dtype=torch.long) + gt_groups[k])
        for k, (i, j) in enumerate(indices)
    ]



ultralytics.models.utils.ops.get_cdn_group(batch, num_classes, num_queries, class_embed, num_dn=100, cls_noise_ratio=0.5, box_noise_scale=1.0, training=False)

Get contrastive denoising training group. This function creates a contrastive denoising training group with positive and negative samples from the ground truths (gt). It applies noise to the class labels and bounding box coordinates, and returns the modified labels, bounding boxes, attention mask and meta information.
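The box-noise step can be sketched for a single box as follows. This mirrors the arithmetic in the source listing (shift each corner by a random fraction of half the box size, with a larger shift for negative samples, then clip to [0, 1]), but it is a simplified standard-library version: the function name noisy_box and the sample box are assumptions, and the final logit (inverse sigmoid) step of the source is omitted.

```python
import random


def noisy_box(box_xywh, negative, box_noise_scale=1.0, rng=random):
    """Jitter one normalised (cx, cy, w, h) box and return it in the same format.

    Positive samples move each corner by less than half the box size;
    negative samples move each corner by one to two half-sizes.
    """
    cx, cy, w, h = box_xywh
    corners = [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
    # Maximum unit shift per corner coordinate: half the width or height.
    diff = [w / 2 * box_noise_scale, h / 2 * box_noise_scale] * 2
    noisy = []
    for coord, d in zip(corners, diff):
        sign = rng.choice((-1.0, 1.0))
        magnitude = rng.random() + (1.0 if negative else 0.0)
        # Clip to the valid normalised range, as the source does.
        noisy.append(min(1.0, max(0.0, coord + sign * magnitude * d)))
    x1, y1, x2, y2 = noisy
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]


rng = random.Random(0)  # seeded only to make the sketch repeatable
positive_sample = noisy_box([0.5, 0.5, 0.2, 0.4], negative=False, rng=rng)
negative_sample = noisy_box([0.5, 0.5, 0.2, 0.4], negative=True, rng=rng)
```

Positive samples stay close to the ground truth box, while negative samples are pushed far enough away that they should not be reconstructed as the same object. Note that for negative samples the jittered corners can cross, which yields a negative width or height before any further processing.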

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| batch | dict | A dict that includes 'gt_cls' (torch.Tensor with shape [num_gts, ]), 'gt_bboxes' (torch.Tensor with shape [num_gts, 4]), 'gt_groups' (List(int)) which is a list of batch size length indicating the number of gts of each image. Note that the source listing on this page reads the keys 'cls', 'bboxes', 'batch_idx' and 'gt_groups'. | required |
| num_classes | int | Number of classes. | required |
| num_queries | int | Number of queries. | required |
| class_embed | Tensor | Embedding weights to map class labels to embedding space. | required |
| num_dn | int | Number of denoising queries. Defaults to 100. | 100 |
| cls_noise_ratio | float | Noise ratio for class labels. Defaults to 0.5. | 0.5 |
| box_noise_scale | float | Noise scale for bounding box coordinates. Defaults to 1.0. | 1.0 |
| training | bool | Whether the model is in training mode. Defaults to False. | False |

Returns:

| Type | Description |
| --- | --- |
| Tuple[Optional[Tensor], Optional[Tensor], Optional[Tensor], Optional[Dict]] | The modified class embeddings, bounding boxes, attention mask and meta information for denoising. If not in training mode or 'num_dn' is less than or equal to 0, the function returns None for all elements in the tuple. |
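The attention mask is the least obvious of the returned values, so here is a small sketch of its layout using nested lists instead of tensors. It follows the rules stated in the source comments (matching queries cannot see denoising queries, and denoising groups cannot see each other), with True meaning "blocked"; the function name denoising_layout and the toy sizes are assumptions for illustration.

```python
def denoising_layout(gt_groups, num_dn=100, num_queries=300):
    """Return (num_group, total_dn, mask) for a batch.

    Requires max(gt_groups) > 0; the source returns None values otherwise.
    mask[row][col] is True when query `row` may NOT attend to query `col`.
    """
    max_nums = max(gt_groups)
    num_group = max(num_dn // max_nums, 1)
    # Each group holds a positive and a negative copy of every padded gt slot.
    group_size = 2 * max_nums
    total_dn = group_size * num_group
    tgt_size = total_dn + num_queries
    mask = [[False] * tgt_size for _ in range(tgt_size)]
    # Matching queries cannot see any denoising query.
    for row in range(total_dn, tgt_size):
        for col in range(total_dn):
            mask[row][col] = True
    # Denoising queries cannot see denoising queries of other groups.
    for row in range(total_dn):
        for col in range(total_dn):
            if row // group_size != col // group_size:
                mask[row][col] = True
    return num_group, total_dn, mask


num_group, total_dn, mask = denoising_layout([2, 1], num_dn=4, num_queries=3)
```

With at most 2 ground truths per image and num_dn=4, this produces 2 groups of 4 denoising queries each, followed by the 3 matching queries, for a mask of size 11 x 11.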

Source code in ultralytics/models/utils/ops.py
def get_cdn_group(
    batch, num_classes, num_queries, class_embed, num_dn=100, cls_noise_ratio=0.5, box_noise_scale=1.0, training=False
):
    """
    Get contrastive denoising training group. This function creates a contrastive denoising training group with positive
    and negative samples from the ground truths (gt). It applies noise to the class labels and bounding box coordinates,
    and returns the modified labels, bounding boxes, attention mask and meta information.

    Args:
        batch (dict): A dict that includes 'gt_cls' (torch.Tensor with shape [num_gts, ]), 'gt_bboxes'
            (torch.Tensor with shape [num_gts, 4]), 'gt_groups' (List(int)) which is a list of batch size length
            indicating the number of gts of each image.
        num_classes (int): Number of classes.
        num_queries (int): Number of queries.
        class_embed (torch.Tensor): Embedding weights to map class labels to embedding space.
        num_dn (int, optional): Number of denoising. Defaults to 100.
        cls_noise_ratio (float, optional): Noise ratio for class labels. Defaults to 0.5.
        box_noise_scale (float, optional): Noise scale for bounding box coordinates. Defaults to 1.0.
        training (bool, optional): If it's in training mode. Defaults to False.

    Returns:
        (Tuple[Optional[Tensor], Optional[Tensor], Optional[Tensor], Optional[Dict]]): The modified class embeddings,
            bounding boxes, attention mask and meta information for denoising. If not in training mode or 'num_dn'
            is less than or equal to 0, the function returns None for all elements in the tuple.
    """

    if (not training) or num_dn <= 0:
        return None, None, None, None
    gt_groups = batch["gt_groups"]
    total_num = sum(gt_groups)
    max_nums = max(gt_groups)
    if max_nums == 0:
        return None, None, None, None

    num_group = num_dn // max_nums
    num_group = 1 if num_group == 0 else num_group
    # Pad gt to max_num of a batch
    bs = len(gt_groups)
    gt_cls = batch["cls"]  # (bs*num, )
    gt_bbox = batch["bboxes"]  # bs*num, 4
    b_idx = batch["batch_idx"]

    # Each group has positive and negative queries.
    dn_cls = gt_cls.repeat(2 * num_group)  # (2*num_group*bs*num, )
    dn_bbox = gt_bbox.repeat(2 * num_group, 1)  # 2*num_group*bs*num, 4
    dn_b_idx = b_idx.repeat(2 * num_group).view(-1)  # (2*num_group*bs*num, )

    # Positive and negative mask
    # (bs*num*num_group, ), the second total_num*num_group part as negative samples
    neg_idx = torch.arange(total_num * num_group, dtype=torch.long, device=gt_bbox.device) + num_group * total_num

    if cls_noise_ratio > 0:
        # Half of bbox prob
        mask = torch.rand(dn_cls.shape) < (cls_noise_ratio * 0.5)
        idx = torch.nonzero(mask).squeeze(-1)
        # Randomly put a new one here
        new_label = torch.randint_like(idx, 0, num_classes, dtype=dn_cls.dtype, device=dn_cls.device)
        dn_cls[idx] = new_label

    if box_noise_scale > 0:
        known_bbox = xywh2xyxy(dn_bbox)

        diff = (dn_bbox[..., 2:] * 0.5).repeat(1, 2) * box_noise_scale  # 2*num_group*bs*num, 4

        rand_sign = torch.randint_like(dn_bbox, 0, 2) * 2.0 - 1.0
        rand_part = torch.rand_like(dn_bbox)
        rand_part[neg_idx] += 1.0
        rand_part *= rand_sign
        known_bbox += rand_part * diff
        known_bbox.clip_(min=0.0, max=1.0)
        dn_bbox = xyxy2xywh(known_bbox)
        dn_bbox = torch.logit(dn_bbox, eps=1e-6)  # inverse sigmoid

    num_dn = int(max_nums * 2 * num_group)  # total denoising queries
    # class_embed = torch.cat([class_embed, torch.zeros([1, class_embed.shape[-1]], device=class_embed.device)])
    dn_cls_embed = class_embed[dn_cls]  # bs*num * 2 * num_group, 256
    padding_cls = torch.zeros(bs, num_dn, dn_cls_embed.shape[-1], device=gt_cls.device)
    padding_bbox = torch.zeros(bs, num_dn, 4, device=gt_bbox.device)

    map_indices = torch.cat([torch.tensor(range(num), dtype=torch.long) for num in gt_groups])
    pos_idx = torch.stack([map_indices + max_nums * i for i in range(num_group)], dim=0)

    map_indices = torch.cat([map_indices + max_nums * i for i in range(2 * num_group)])
    padding_cls[(dn_b_idx, map_indices)] = dn_cls_embed
    padding_bbox[(dn_b_idx, map_indices)] = dn_bbox

    tgt_size = num_dn + num_queries
    attn_mask = torch.zeros([tgt_size, tgt_size], dtype=torch.bool)
    # Match query cannot see the reconstruct
    attn_mask[num_dn:, :num_dn] = True
    # Reconstruct cannot see each other
    for i in range(num_group):
        if i == 0:
            attn_mask[max_nums * 2 * i : max_nums * 2 * (i + 1), max_nums * 2 * (i + 1) : num_dn] = True
        if i == num_group - 1:
            attn_mask[max_nums * 2 * i : max_nums * 2 * (i + 1), : max_nums * i * 2] = True
        else:
            attn_mask[max_nums * 2 * i : max_nums * 2 * (i + 1), max_nums * 2 * (i + 1) : num_dn] = True
            attn_mask[max_nums * 2 * i : max_nums * 2 * (i + 1), : max_nums * 2 * i] = True
    dn_meta = {
        "dn_pos_idx": [p.reshape(-1) for p in pos_idx.cpu().split(list(gt_groups), dim=1)],
        "dn_num_group": num_group,
        "dn_num_split": [num_dn, num_queries],
    }

    return (
        padding_cls.to(class_embed.device),
        padding_bbox.to(class_embed.device),
        attn_mask.to(class_embed.device),
        dn_meta,
    )





Created 2023-11-12, Updated 2024-06-02
Authors: glenn-jocher (5), Burhan-Q (1), Laughing-q (1)