Reference for ultralytics/nn/modules/head.py
Note
This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/modules/head.py. If you spot a problem please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!
ultralytics.nn.modules.head.Detect
Detect(nc=80, ch=())
Bases: Module
YOLO Detect head for detection models.
Source code in ultralytics/nn/modules/head.py
bias_init
bias_init()
Initialize Detect() biases. Warning: the strides must already be available.
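The stride requirement can be illustrated with a short sketch. It assumes the class-bias prior has the form log(5 / nc / (640 / s)^2), that is, roughly five objects per 640-pixel image spread over the grid cells at stride s. The formula, the constants, and the helper name `class_bias_prior` are assumptions for illustration and are not stated in the text above.

```python
import math


def class_bias_prior(nc: int, stride: float, img_size: int = 640, objects_per_image: float = 5.0) -> float:
    """Hypothetical helper: initial class-logit bias for one detection level."""
    cells = (img_size / stride) ** 2  # number of grid cells at this stride
    return math.log(objects_per_image / nc / cells)


# Coarser strides have fewer cells, so each cell gets a less negative prior.
for s in (8, 16, 32):
    print(s, round(class_bias_prior(80, s), 3))
```

Under this assumption the bias cannot be computed until each level's stride is known, which is why the initialization has to run after the strides are set.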
decode_bboxes
decode_bboxes(bboxes, anchors, xywh=True)
Decode bounding boxes.
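As a rough illustration of what decoding can involve, the sketch below assumes the head predicts distances (left, top, right, bottom) from an anchor point and converts them to either corner or center format. The distance encoding and the function name `decode_ltrb` are assumptions, not something the description above confirms.

```python
def decode_ltrb(dist, anchor, xywh=True):
    """Turn (left, top, right, bottom) distances around an anchor point into a box."""
    left, top, right, bottom = dist
    ax, ay = anchor
    x1, y1 = ax - left, ay - top        # top-left corner
    x2, y2 = ax + right, ay + bottom    # bottom-right corner
    if xywh:
        # Center coordinates plus width and height.
        return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)
    return (x1, y1, x2, y2)


print(decode_ltrb((1, 2, 3, 4), (10, 10), xywh=False))
print(decode_ltrb((1, 2, 3, 4), (10, 10), xywh=True))
```

The `xywh` flag mirrors the parameter in the signature: the same decoded corners are returned either as corners or as a center with a size.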
forward
forward(x)
Concatenates and returns predicted bounding boxes and class probabilities.
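To give a feel for the size of the concatenated tensor, the following sketch computes its shape under assumptions that are not in the text above: a 640x640 input, three detection levels with strides 8, 16, and 32, one anchor per grid cell, and a channel-first layout of (batch, 4 + nc, num_anchors). The helper name `detect_output_shape` is hypothetical.

```python
def detect_output_shape(batch, nc, img_size=640, strides=(8, 16, 32)):
    """Hypothetical helper: shape of the concatenated inference tensor."""
    # One anchor per grid cell, summed over all detection levels.
    num_anchors = sum((img_size // s) ** 2 for s in strides)
    # 4 box values plus nc class probabilities per anchor.
    return (batch, 4 + nc, num_anchors)


print(detect_output_shape(1, 80))  # (1, 84, 8400)
```

Note that `postprocess` documents its input as (batch_size, num_anchors, 4 + nc), so a tensor in the layout assumed here would need its last two dimensions swapped first.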
forward_end2end
forward_end2end(x)
Performs forward pass of the v10Detect module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[Tensor] | Input feature maps from different levels. | required |
Returns:
Type | Description |
---|---|
dict or tuple | |
postprocess
staticmethod
postprocess(preds: Tensor, max_det: int, nc: int = 80)
Post-processes YOLO model predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preds | Tensor | Raw predictions with shape (batch_size, num_anchors, 4 + nc) with last dimension format [x, y, w, h, class_probs]. | required |
max_det | int | Maximum detections per image. | required |
nc | int | Number of classes. Default: 80. | 80 |
Returns:
Type | Description |
---|---|
Tensor | Processed predictions with shape (batch_size, min(max_det, num_anchors), 6) and last dimension format [x, y, w, h, max_class_prob, class_index]. |
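The input and output formats described above can be sketched for a single image in plain Python. This is a simplified illustration, not the library's implementation: it keeps the single best class per anchor and then the `max_det` most confident anchors, whereas the actual method may select candidates differently. The function name `postprocess_one` is hypothetical.

```python
def postprocess_one(preds, max_det, nc=80):
    """Sketch for one image: preds is a list of rows [x, y, w, h, p_0, ..., p_{nc-1}]."""
    rows = []
    for row in preds:
        box, scores = row[:4], row[4:4 + nc]
        best = max(range(nc), key=lambda c: scores[c])  # index of the most probable class
        rows.append(list(box) + [scores[best], float(best)])
    rows.sort(key=lambda r: r[4], reverse=True)  # highest confidence first
    return rows[:max_det]  # at most min(max_det, num_anchors) rows of 6 values


preds = [
    [0, 0, 1, 1, 0.1, 0.9],
    [5, 5, 2, 2, 0.8, 0.2],
    [9, 9, 3, 3, 0.3, 0.4],
]
print(postprocess_one(preds, max_det=2, nc=2))
```

Each output row has the documented layout [x, y, w, h, max_class_prob, class_index], and the row count is capped by both `max_det` and the number of anchors.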
ultralytics.nn.modules.head.Segment
Segment(nc=80, nm=32, npr=256, ch=())
Bases: Detect
YOLO Segment head for segmentation models.
forward
forward(x)
Return the detection outputs, mask coefficients, and mask prototypes when training; otherwise return the detections concatenated with the mask coefficients, together with the prototypes.
ultralytics.nn.modules.head.OBB
OBB(nc=80, ne=1, ch=())
Bases: Detect
YOLO OBB detection head for detection with rotation models.
decode_bboxes
decode_bboxes(bboxes, anchors)
Decode rotated bounding boxes.
forward
forward(x)
Concatenates and returns predicted bounding boxes and class probabilities.
ultralytics.nn.modules.head.Pose
Pose(nc=80, kpt_shape=(17, 3), ch=())
Bases: Detect
YOLO Pose head for keypoints models.
forward
forward(x)
Perform forward pass through YOLO model and return predictions.
kpts_decode
kpts_decode(bs, kpts)
Decodes keypoints.
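A minimal sketch of keypoint decoding follows. It assumes each keypoint is predicted as an offset relative to an anchor point on the feature grid, scaled back to input pixels by the level's stride, with an optional visibility logit passed through a sigmoid. The exact formula and the name `decode_keypoint` are assumptions for illustration only.

```python
import math


def decode_keypoint(raw, anchor, stride):
    """Hypothetical per-keypoint decode: raw = (x, y, v) predicted relative to an anchor."""
    x, y, v = raw
    ax, ay = anchor
    px = (x * 2.0 + (ax - 0.5)) * stride  # grid units -> input-image pixels
    py = (y * 2.0 + (ay - 0.5)) * stride
    vis = 1.0 / (1.0 + math.exp(-v))      # sigmoid maps the logit to (0, 1)
    return (px, py, vis)


print(decode_keypoint((0.25, 0.5, 0.0), (1.5, 2.5), 16))
```

With `kpt_shape=(17, 3)` this would be applied to 17 keypoints per detection, each carrying x, y, and a visibility value.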
ultralytics.nn.modules.head.Classify
Classify(c1, c2, k=1, s=1, p=None, g=1)
Bases: Module
YOLO classification head, i.e. x(b,c1,20,20) to x(b,c2).
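The reduction from a (b, c1, 20, 20) feature map to (b, c2) class scores can be sketched for a single sample without any framework. The sketch assumes global average pooling followed by a linear layer and a softmax; any convolution or dropout the real head applies is omitted, and the name `classify_sketch` is hypothetical.

```python
import math


def classify_sketch(feature_map, weight, bias):
    """feature_map: [c][h][w] nested lists; weight: [c2][c]; bias: [c2]."""
    # Global average pooling collapses each channel's h x w grid to one number.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # The linear layer maps the c pooled values to c2 class logits.
    logits = [sum(w * p for w, p in zip(w_row, pooled)) + b for w_row, b in zip(weight, bias)]
    # A numerically stable softmax turns logits into probabilities.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


fm = [[[1, 1], [1, 1]], [[0, 0], [0, 0]]]  # 2 channels, 2x2 spatial grid
print(classify_sketch(fm, weight=[[1, 0], [0, 1]], bias=[0, 0]))
```

The spatial dimensions disappear at the pooling step, which is what makes the output shape independent of the 20x20 grid size.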
forward
forward(x)
Performs a forward pass of the YOLO model on input image data.
ultralytics.nn.modules.head.WorldDetect
WorldDetect(nc=80, embed=512, with_bn=False, ch=())
Bases: Detect
Head for integrating YOLO detection models with semantic understanding from text embeddings.
bias_init
bias_init()
Initialize Detect() biases. Warning: the strides must already be available.
forward
forward(x, text)
Concatenates and returns predicted bounding boxes and class probabilities.
ultralytics.nn.modules.head.LRPCHead
LRPCHead(vocab, pf, loc, enabled=True)
Bases: Module
Lightweight Region Proposal and Classification Head for efficient object detection.
conv2linear
conv2linear(conv)
Convert a 1x1 convolutional layer to a linear layer.
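The conversion works because a 1x1 convolution applies the same matrix to the channel vector at every pixel. The sketch below shows this equivalence with nested lists, assuming a convolution weight of shape [out][in][1][1]; the helper names are hypothetical and do not correspond to the library's code.

```python
def conv1x1_to_linear(conv_weight):
    """Drop the two trailing size-1 dimensions: [out][in][1][1] -> [out][in]."""
    return [[w_in[0][0] for w_in in w_out] for w_out in conv_weight]


def linear(weight, bias, x):
    """Apply y = W x + b to a single feature vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def conv1x1_at_pixel(conv_weight, bias, x):
    """Evaluate the 1x1 convolution at one pixel whose channel vector is x."""
    return [sum(w_in[0][0] * v for w_in, v in zip(w_out, x)) + b
            for w_out, b in zip(conv_weight, bias)]


conv_w = [[[[1.0]], [[2.0]]], [[[3.0]], [[4.0]]]]  # 2 output channels, 2 input channels
b = [0.5, -0.5]
x = [1.0, 1.0]
print(conv1x1_at_pixel(conv_w, b, x), linear(conv1x1_to_linear(conv_w), b, x))
```

Both calls produce the same vector, so the linear layer can replace the convolution whenever the input has already been flattened to per-location feature vectors.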
forward
forward(cls_feat, loc_feat, conf)
Process classification and localization features to generate detection proposals.
ultralytics.nn.modules.head.YOLOEDetect
YOLOEDetect(nc=80, embed=512, with_bn=False, ch=())
Bases: Detect
Head for integrating YOLO detection models with semantic understanding from text embeddings.
bias_init
bias_init()
Initialize biases for detection heads.
forward
forward(x, cls_pe, return_mask=False)
Process features with class prompt embeddings to generate detections.
forward_lrpc
forward_lrpc(x, return_mask=False)
Process features with fused text embeddings to generate detections for prompt-free model.
fuse
fuse(txt_feats)
Fuse text features with model weights for efficient inference.
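The general idea behind this kind of fusion can be sketched as follows. Assume the class score is a dot product between a fixed text embedding and a linearly projected image feature, score_k = t_k . (W x + b). Once the text embeddings are fixed, T W and T b can be precomputed so inference needs a single linear map. This is only an illustration under that assumption; the actual method may also fold in normalization layers, scales, or biases, and the function names here are hypothetical.

```python
def matmul(a, b):
    """Multiply matrices given as nested lists: [n][k] @ [k][m] -> [n][m]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def fold_text_into_weights(text, weight, bias):
    """Precompute (T W, T b) so that scores = (T W) x + T b."""
    fused_w = matmul(text, weight)  # [nc][embed] @ [embed][c] -> [nc][c]
    fused_b = [sum(t * v for t, v in zip(t_row, bias)) for t_row in text]
    return fused_w, fused_b


def affine(weight, bias, x):
    """Apply y = W x + b to a single feature vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


text = [[1, 0], [1, 1]]    # 2 classes, embedding size 2
weight = [[1, 2], [3, 4]]  # projects a 2-channel feature to the embedding space
bias = [1, -1]
x = [2, 1]

two_step = affine(text, [0, 0], affine(weight, bias, x))  # project, then compare to text
fused_w, fused_b = fold_text_into_weights(text, weight, bias)
one_step = affine(fused_w, fused_b, x)                    # single fused layer
print(two_step, one_step)
```

The two results match, which is the property that lets a fused head skip the text branch entirely at inference time.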
get_tpe
get_tpe(tpe)
Get text prompt embeddings with normalization.
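The normalization step can be illustrated with a small sketch, assuming it is an L2 normalization along the embedding dimension so that every prompt embedding has unit length. Any learned projection applied before normalizing is left out, and the name `l2_normalize` is hypothetical.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length; eps guards against division by zero."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


print(l2_normalize([3.0, 4.0]))
```

With unit-length embeddings, a dot product between an image feature and a text embedding depends on direction rather than magnitude, which is the usual motivation for normalizing prompt embeddings.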
get_vpe
get_vpe(x, vpe)
Get visual prompt embeddings with spatial awareness.
ultralytics.nn.modules.head.YOLOESegment
YOLOESegment(nc=80, nm=32, npr=256, embed=512, with_bn=False, ch=())
Bases: YOLOEDetect
YOLO segmentation head with text embedding capabilities.
forward
forward(x, text)
Return the detection outputs, mask coefficients, and mask prototypes when training; otherwise return the detections concatenated with the mask coefficients, together with the prototypes.
ultralytics.nn.modules.head.RTDETRDecoder
RTDETRDecoder(
nc=80,
ch=(512, 1024, 2048),
hd=256,
nq=300,
ndp=4,
nh=8,
ndl=6,
d_ffn=1024,
dropout=0.0,
act=nn.ReLU(),
eval_idx=-1,
nd=100,
label_noise_ratio=0.5,
box_noise_scale=1.0,
learnt_init_query=False,
)
Bases: Module
Real-Time Deformable Transformer Decoder (RTDETRDecoder) module for object detection.
This decoder module uses a Transformer architecture with deformable attention to predict bounding boxes and class labels for objects in an image. It integrates features from multiple layers and passes them through a series of Transformer decoder layers to produce the final predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
nc | int | Number of classes. Default is 80. | 80 |
ch | tuple | Channels in the backbone feature maps. Default is (512, 1024, 2048). | (512, 1024, 2048) |
hd | int | Dimension of hidden layers. Default is 256. | 256 |
nq | int | Number of query points. Default is 300. | 300 |
ndp | int | Number of decoder points. Default is 4. | 4 |
nh | int | Number of heads in multi-head attention. Default is 8. | 8 |
ndl | int | Number of decoder layers. Default is 6. | 6 |
d_ffn | int | Dimension of the feed-forward networks. Default is 1024. | 1024 |
dropout | float | Dropout rate. Default is 0.0. | 0.0 |
act | Module | Activation function. Default is nn.ReLU. | ReLU() |
eval_idx | int | Evaluation index. Default is -1. | -1 |
nd | int | Number of denoising queries. Default is 100. | 100 |
label_noise_ratio | float | Label noise ratio. Default is 0.5. | 0.5 |
box_noise_scale | float | Box noise scale. Default is 1.0. | 1.0 |
learnt_init_query | bool | Whether to learn initial query embeddings. Default is False. | False |
forward
forward(x, batch=None)
Runs the forward pass of the module, returning bounding box and classification scores for the input.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[Tensor] | List of feature maps from the backbone. | required |
batch | dict | Batch information for training. | None |
Returns:
Type | Description |
---|---|
tuple or Tensor | During training, returns a tuple of bounding boxes, scores, and other metadata. During inference, returns a tensor of shape (bs, 300, 4+nc) containing bounding boxes and class scores. |
ultralytics.nn.modules.head.v10Detect
v10Detect(nc=80, ch=())
Bases: Detect
v10 Detection head from https://arxiv.org/pdf/2405.14458.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
nc | int | Number of classes. | 80 |
ch | tuple | Tuple of channel sizes. | () |
Attributes:
Name | Type | Description |
---|---|---|
max_det | int | Maximum number of detections. |
Methods:
Name | Description |
---|---|
forward | Performs forward pass of the v10Detect module. |
bias_init | Initializes biases of the Detect module. |
fuse
fuse()
Removes the one2many head.