Reference for ultralytics/nn/modules/block.py
Note
This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/modules/block.py. If you spot a problem please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!
ultralytics.nn.modules.block.DFL
Bases: Module
Integral module of Distribution Focal Loss (DFL).
Proposed in Generalized Focal Loss https://ieeexplore.ieee.org/document/9792391
Source code in ultralytics/nn/modules/block.py
forward
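Applies the DFL integral to input tensor 'x': a softmax over the distribution bins followed by a fixed-weight convolution that returns the expected value for each box side.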
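The "integral" in DFL can be read as an expectation over discrete bins. The sketch below is a dependency-free illustration, not the module's code: it assumes each box is encoded as four consecutive groups of `reg_max` logits (one group per side), and the helper names `dfl_expectation` and `dfl_decode` are hypothetical.

```python
import math


def dfl_expectation(logits):
    """Return the expected bin index under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))


def dfl_decode(box_logits, reg_max=16):
    """Decode 4 * reg_max logits into four side distances (assumed layout)."""
    assert len(box_logits) == 4 * reg_max
    return [
        dfl_expectation(box_logits[s * reg_max:(s + 1) * reg_max])
        for s in range(4)
    ]


# A flat distribution decodes to the middle of the bin range,
# a sharply peaked one decodes to (almost exactly) the peak index.
flat = [0.0] * 16
peaked = [0.0] * 16
peaked[3] = 50.0
sides = dfl_decode(flat + peaked + flat + peaked)
```

In the real module the weighted sum is presumably carried out by a convolution whose weights are the bin indices, which is equivalent to the explicit sum used here.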
ultralytics.nn.modules.block.Proto
Bases: Module
YOLOv8 mask Proto module for segmentation models.
Input arguments are ch_in, number of protos, number of masks.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.HGStem
Bases: Module
StemBlock of PPHGNetV2 with 5 convolutions and one maxpool2d.
https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
Source code in ultralytics/nn/modules/block.py
forward
Forward pass of a PPHGNetV2 backbone layer.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.HGBlock
Bases: Module
HG_Block of PPHGNetV2 with 2 convolutions and LightConv.
https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
Source code in ultralytics/nn/modules/block.py
forward
ultralytics.nn.modules.block.SPP
Bases: Module
Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729.
Source code in ultralytics/nn/modules/block.py
forward
ultralytics.nn.modules.block.SPPF
Bases: Module
Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher.
This module is equivalent to SPP(k=(5, 9, 13)).
Source code in ultralytics/nn/modules/block.py
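The stated equivalence with SPP(k=(5, 9, 13)) rests on the fact that chaining stride-1 max pools widens the window: two 5-wide pools cover 9 cells, three cover 13. The following is a minimal 1-D check in plain Python, assuming stride 1 and "same" padding in which padded cells never win the max (as with PyTorch's implicit negative-infinity padding). The helper `max_pool_1d` is hypothetical.

```python
def max_pool_1d(xs, k):
    """Stride-1 max pool with 'same' padding; padded cells are ignored."""
    r = k // 2
    n = len(xs)
    return [max(xs[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]

# SPPF-style: reuse one 5-wide pool three times in a chain.
y1 = max_pool_1d(signal, 5)
y2 = max_pool_1d(y1, 5)
y3 = max_pool_1d(y2, 5)

# SPP-style: three independent pools with growing kernels.
spp = [max_pool_1d(signal, k) for k in (5, 9, 13)]

assert [y1, y2, y3] == spp
```

Max pooling is separable across axes, so the same argument carries over to the 2-D feature maps the module operates on.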
ultralytics.nn.modules.block.C1
Bases: Module
CSP Bottleneck with 1 convolution.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C2
Bases: Module
CSP Bottleneck with 2 convolutions.
Source code in ultralytics/nn/modules/block.py
forward
ultralytics.nn.modules.block.C2f
Bases: Module
Faster Implementation of CSP Bottleneck with 2 convolutions.
Source code in ultralytics/nn/modules/block.py
forward
forward_split
Forward pass using split() instead of chunk().
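The difference between the two forward variants is only in how the channel dimension is cut. The sketch below mimics the semantics of `Tensor.chunk` and `Tensor.split` on a plain list of channel labels; the helpers `chunk` and `split` are hypothetical stand-ins, and the assumption that the first convolution emits `2 * c` channels is mine, not stated in this section.

```python
import math


def chunk(channels, n):
    """Mimic Tensor.chunk: pieces of size ceil(len / n), last may be smaller."""
    size = math.ceil(len(channels) / n)
    return [channels[i:i + size] for i in range(0, len(channels), size)]


def split(channels, sizes):
    """Mimic Tensor.split with explicit section sizes."""
    assert sum(sizes) == len(channels)
    out, start = [], 0
    for s in sizes:
        out.append(channels[start:start + s])
        start += s
    return out


c = 4  # assumed hidden channel count
features = [f"ch{i}" for i in range(2 * c)]

halves_chunk = chunk(features, 2)
halves_split = split(features, (c, c))
```

With an even channel count both calls return the same two halves, so the two forward paths should produce the same result.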
ultralytics.nn.modules.block.C3
Bases: Module
CSP Bottleneck with 3 convolutions.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3x
ultralytics.nn.modules.block.RepC3
Bases: Module
Rep C3.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3TR
ultralytics.nn.modules.block.C3Ghost
ultralytics.nn.modules.block.GhostBottleneck
Bases: Module
Ghost Bottleneck https://github.com/huawei-noah/ghostnet.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.Bottleneck
Bases: Module
Standard bottleneck.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.BottleneckCSP
Bases: Module
CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks.
Source code in ultralytics/nn/modules/block.py
forward
ultralytics.nn.modules.block.ResNetBlock
Bases: Module
ResNet block with standard convolution layers.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.ResNetLayer
Bases: Module
ResNet layer with multiple ResNet blocks.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.MaxSigmoidAttnBlock
Bases: Module
Max Sigmoid attention block.
Source code in ultralytics/nn/modules/block.py
forward
Forward process.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C2fAttn
Bases: Module
C2f module with an additional attn module.
Source code in ultralytics/nn/modules/block.py
forward
Forward pass through C2f layer.
forward_split
Forward pass using split() instead of chunk().
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.ImagePoolingAttn
Bases: Module
ImagePoolingAttn: Enhances the text embeddings with image-aware information.
Source code in ultralytics/nn/modules/block.py
forward
Executes attention mechanism on input tensor x and guide tensor.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.ContrastiveHead
Bases: Module
Implements contrastive learning head for region-text similarity in vision-language models.
Source code in ultralytics/nn/modules/block.py
forward
Forward function of contrastive learning.
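Region-text similarity with L2 normalization amounts to a cosine similarity between every region embedding and every text embedding. This is a rough, dependency-free sketch of that idea only. The `logit_scale` and `bias` arguments are illustrative placeholders for whatever scaling the real head applies, and both function names are hypothetical.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def region_text_similarity(regions, texts, logit_scale=1.0, bias=0.0):
    """Return a regions x texts matrix of scaled cosine similarities."""
    regions = [l2_normalize(r) for r in regions]
    texts = [l2_normalize(t) for t in texts]
    return [
        [logit_scale * sum(a * b for a, b in zip(r, t)) + bias for t in texts]
        for r in regions
    ]


sims = region_text_similarity(
    regions=[[1.0, 0.0], [0.0, 2.0]],
    texts=[[3.0, 0.0], [0.0, -1.0]],
)
```

Because both sides are normalized, vector length does not matter: only the angle between a region and a text embedding affects the score.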
ultralytics.nn.modules.block.BNContrastiveHead
Bases: Module
Batch Norm Contrastive Head for YOLO-World using batch norm instead of l2-normalization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embed_dims | int | Embed dimensions of text and image features. | required |
Source code in ultralytics/nn/modules/block.py
forward
Forward function of contrastive learning.
ultralytics.nn.modules.block.RepBottleneck
ultralytics.nn.modules.block.RepCSP
Bases: C3
Repeatable Cross Stage Partial Network (RepCSP) module for efficient feature extraction.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.RepNCSPELAN4
Bases: Module
CSP-ELAN.
Source code in ultralytics/nn/modules/block.py
forward
forward_split
Forward pass using split() instead of chunk().
ultralytics.nn.modules.block.ELAN1
Bases: RepNCSPELAN4
ELAN1 module with 4 convolutions.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.AConv
Bases: Module
AConv.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.ADown
Bases: Module
ADown.
Source code in ultralytics/nn/modules/block.py
forward
Forward pass through ADown layer.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.SPPELAN
Bases: Module
SPP-ELAN.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.CBLinear
Bases: Module
CBLinear.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.CBFuse
Bases: Module
CBFuse.
Source code in ultralytics/nn/modules/block.py
forward
Forward pass through CBFuse layer.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3f
Bases: Module
Faster Implementation of CSP Bottleneck with 3 convolutions.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3k2
Bases: C2f
Faster Implementation of CSP Bottleneck with 2 convolutions.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3k
Bases: C3
C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.RepVGGDW
Bases: Module
RepVGGDW represents a depthwise separable convolutional block in the RepVGG architecture.
Source code in ultralytics/nn/modules/block.py
forward
Performs a forward pass of the RepVGGDW block.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after applying the depthwise separable convolution. |
Source code in ultralytics/nn/modules/block.py
forward_fuse
Performs a forward pass of the RepVGGDW block using the fused convolution (intended for use after fuse() has been called).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after applying the depthwise separable convolution. |
Source code in ultralytics/nn/modules/block.py
fuse
Fuses the convolutional layers in the RepVGGDW block.
This method fuses the convolutional layers and updates the weights and biases accordingly.
Source code in ultralytics/nn/modules/block.py
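Fusing parallel convolution branches works because convolution is linear: two branches applied to the same input and summed can be replaced by one convolution whose kernel is the sum of the (zero-padded) kernels. The sketch below shows this in 1-D with plain Python. The two-branch layout with one larger and one smaller kernel is an assumption for illustration, and the helpers `conv1d_same` and `fuse_branches` are hypothetical.

```python
def conv1d_same(xs, kernel, bias=0.0):
    """Cross-correlation with zero 'same' padding (odd kernel length)."""
    r = len(kernel) // 2
    n = len(xs)
    out = []
    for i in range(n):
        acc = bias
        for j, w in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < n:  # positions outside the signal count as zero
                acc += w * xs[idx]
        out.append(acc)
    return out


def fuse_branches(k_large, b_large, k_small, b_small):
    """Zero-pad the small kernel to the large size, then add weights and biases."""
    pad = (len(k_large) - len(k_small)) // 2
    padded = [0.0] * pad + list(k_small) + [0.0] * pad
    return [a + b for a, b in zip(k_large, padded)], b_large + b_small


xs = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5, 2.0, 4.0]
k7, b7 = [0.1, -0.2, 0.3, 0.4, -0.5, 0.6, 0.7], 0.25
k3, b3 = [1.0, 0.5, -1.0], -0.75

# Unfused: run both branches and add the results.
two_branch = [a + b for a, b in zip(conv1d_same(xs, k7, b7), conv1d_same(xs, k3, b3))]

# Fused: one convolution with the merged kernel and bias.
k_fused, b_fused = fuse_branches(k7, b7, k3, b3)
fused = conv1d_same(xs, k_fused, b_fused)
```

If the branches carry batch normalization, its affine transform would first have to be folded into each branch's weights and bias before the kernels are added; that step is omitted here.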
ultralytics.nn.modules.block.CIB
Bases: Module
Compact Inverted Block (CIB) module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Number of input channels. | required |
c2 | int | Number of output channels. | required |
shortcut | bool | Whether to add a shortcut connection. Defaults to True. | True |
e | float | Scaling factor for the hidden channels. Defaults to 0.5. | 0.5 |
lk | bool | Whether to use RepVGGDW for the third convolutional layer. Defaults to False. | False |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass of the CIB module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor. |
ultralytics.nn.modules.block.C2fCIB
Bases: C2f
C2fCIB class represents a convolutional block with C2f and CIB modules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Number of input channels. | required |
c2 | int | Number of output channels. | required |
n | int | Number of CIB modules to stack. Defaults to 1. | 1 |
shortcut | bool | Whether to use shortcut connection. Defaults to False. | False |
lk | bool | Whether the stacked CIB modules use RepVGGDW (see the lk parameter of CIB). Defaults to False. | False |
g | int | Number of groups for grouped convolution. Defaults to 1. | 1 |
e | float | Expansion ratio for CIB modules. Defaults to 0.5. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.Attention
Bases: Module
Attention module that performs self-attention on the input tensor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dim | int | The input tensor dimension. | required |
num_heads | int | The number of attention heads. | 8 |
attn_ratio | float | The ratio of the attention key dimension to the head dimension. | 0.5 |
Attributes:
Name | Type | Description |
---|---|---|
num_heads | int | The number of attention heads. |
head_dim | int | The dimension of each attention head. |
key_dim | int | The dimension of the attention key. |
scale | float | The scaling factor for the attention scores. |
qkv | Conv | Convolutional layer for computing the query, key, and value. |
proj | Conv | Convolutional layer for projecting the attended values. |
pe | Conv | Convolutional layer for positional encoding. |
Source code in ultralytics/nn/modules/block.py
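The attributes listed for Attention are all derived from `dim`, `num_heads`, and `attn_ratio`. The sketch below spells out one plausible derivation. The formulas (integer division for the head size, the usual 1/sqrt(key_dim) score scaling, and a single projection that emits queries, keys, and values together) are assumptions consistent with the attribute descriptions, not a copy of the module, and `attention_dims` is a hypothetical helper.

```python
def attention_dims(dim, num_heads=8, attn_ratio=0.5):
    """Derive per-head sizes from the constructor arguments (assumed formulas)."""
    head_dim = dim // num_heads             # channels handled by each head
    key_dim = int(head_dim * attn_ratio)    # query/key size per head
    scale = key_dim ** -0.5                 # standard dot-product scaling
    # Values keep head_dim channels per head; queries and keys use key_dim each.
    qkv_channels = dim + 2 * key_dim * num_heads
    return {
        "head_dim": head_dim,
        "key_dim": key_dim,
        "scale": scale,
        "qkv_channels": qkv_channels,
    }


dims = attention_dims(dim=128, num_heads=4, attn_ratio=0.5)
```

A smaller `attn_ratio` shrinks only the query and key projections, which reduces the cost of the score computation while leaving the value width unchanged.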
forward
Forward pass of the Attention module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | The output tensor after self-attention. |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.PSABlock
Bases: Module
PSABlock class implementing a Position-Sensitive Attention block for neural networks.
This class encapsulates the functionality for applying multi-head attention and feed-forward neural network layers with optional shortcut connections.
Attributes:
Name | Type | Description |
---|---|---|
attn | Attention | Multi-head attention module. |
ffn | Sequential | Feed-forward neural network module. |
add | bool | Flag indicating whether to add shortcut connections. |
Methods:
Name | Description |
---|---|
forward | Performs a forward pass through the PSABlock, applying attention and feed-forward layers. |
Examples:
Create a PSABlock and perform a forward pass
>>> psablock = PSABlock(c=128, attn_ratio=0.5, num_heads=4, shortcut=True)
>>> input_tensor = torch.randn(1, 128, 32, 32)
>>> output_tensor = psablock(input_tensor)
Source code in ultralytics/nn/modules/block.py
forward
Executes a forward pass through PSABlock, applying attention and feed-forward layers to the input tensor.
Source code in ultralytics/nn/modules/block.py
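The role of the `add` flag can be illustrated without any tensors: each sub-layer either adds its output to its input (shortcut) or replaces it. This is a schematic sketch under that reading of the attribute table; `psa_block`, `toy_attn`, and `toy_ffn` are hypothetical stand-ins operating on plain lists.

```python
def psa_block(x, attn, ffn, add=True):
    """Apply attention then feed-forward, each with an optional shortcut."""
    a = attn(x)
    x = [xi + ai for xi, ai in zip(x, a)] if add else a
    f = ffn(x)
    x = [xi + fi for xi, fi in zip(x, f)] if add else f
    return x


def toy_attn(v):
    """Stand-in for the attention module: halves every element."""
    return [0.5 * e for e in v]


def toy_ffn(v):
    """Stand-in for the feed-forward module: constant output."""
    return [1.0 for _ in v]


with_shortcut = psa_block([2.0, 4.0], toy_attn, toy_ffn, add=True)
without_shortcut = psa_block([2.0, 4.0], toy_attn, toy_ffn, add=False)
```

With shortcuts enabled the input signal survives both sub-layers, whereas without them the output depends only on the last sub-layer's result.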
ultralytics.nn.modules.block.PSA
Bases: Module
PSA class for implementing Position-Sensitive Attention in neural networks.
This class encapsulates the functionality for applying position-sensitive attention and feed-forward networks to input tensors, enhancing feature extraction and processing capabilities.
Attributes:
Name | Type | Description |
---|---|---|
c | int | Number of hidden channels after applying the initial convolution. |
cv1 | Conv | 1x1 convolution layer to reduce the number of input channels to 2*c. |
cv2 | Conv | 1x1 convolution layer to reduce the number of output channels to c. |
attn | Attention | Attention module for position-sensitive attention. |
ffn | Sequential | Feed-forward network for further processing. |
Methods:
Name | Description |
---|---|
forward | Applies position-sensitive attention and feed-forward network to the input tensor. |
Examples:
Create a PSA module and apply it to an input tensor
>>> psa = PSA(c1=128, c2=128, e=0.5)
>>> input_tensor = torch.randn(1, 128, 64, 64)
>>> output_tensor = psa.forward(input_tensor)
Source code in ultralytics/nn/modules/block.py
forward
Executes forward pass in PSA module, applying attention and feed-forward layers to the input tensor.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C2PSA
Bases: Module
C2PSA module with attention mechanism for enhanced feature extraction and processing.
This module implements a convolutional block with attention mechanisms to enhance feature extraction and processing capabilities. It includes a series of PSABlock modules for self-attention and feed-forward operations.
Attributes:
Name | Type | Description |
---|---|---|
c | int | Number of hidden channels. |
cv1 | Conv | 1x1 convolution layer to reduce the number of input channels to 2*c. |
cv2 | Conv | 1x1 convolution layer to reduce the number of output channels to c. |
m | Sequential | Sequential container of PSABlock modules for attention and feed-forward operations. |
Methods:
Name | Description |
---|---|
forward | Performs a forward pass through the C2PSA module, applying attention and feed-forward operations. |
Notes
This module is essentially the same as the PSA module, but refactored to allow stacking more PSABlock modules.
Examples:
>>> c2psa = C2PSA(c1=256, c2=256, n=3, e=0.5)
>>> input_tensor = torch.randn(1, 256, 64, 64)
>>> output_tensor = c2psa(input_tensor)
Source code in ultralytics/nn/modules/block.py
forward
Processes the input tensor 'x' through a series of PSA blocks and returns the transformed tensor.
ultralytics.nn.modules.block.C2fPSA
Bases: C2f
C2fPSA module with enhanced feature extraction using PSA blocks.
This class extends the C2f module by incorporating PSA blocks for improved attention mechanisms and feature extraction.
Attributes:
Name | Type | Description |
---|---|---|
c | int | Number of hidden channels. |
cv1 | Conv | 1x1 convolution layer to reduce the number of input channels to 2*c. |
cv2 | Conv | 1x1 convolution layer to reduce the number of output channels to c. |
m | ModuleList | List of PSA blocks for feature extraction. |
Methods:
Name | Description |
---|---|
forward | Performs a forward pass through the C2fPSA module. |
forward_split | Performs a forward pass using split() instead of chunk(). |
Examples:
>>> import torch
>>> from ultralytics.nn.modules.block import C2fPSA
>>> model = C2fPSA(c1=64, c2=64, n=3, e=0.5)
>>> x = torch.randn(1, 64, 128, 128)
>>> output = model(x)
>>> print(output.shape)
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.SCDown
Bases: Module
SCDown module for downsampling with separable convolutions.
This module performs downsampling using a combination of pointwise and depthwise convolutions, which helps in efficiently reducing the spatial dimensions of the input tensor while maintaining the channel information.
Attributes:
Name | Type | Description |
---|---|---|
cv1 | Conv | Pointwise convolution layer that reduces the number of channels. |
cv2 | Conv | Depthwise convolution layer that performs spatial downsampling. |
Methods:
Name | Description |
---|---|
forward | Applies the SCDown module to the input tensor. |
Examples:
>>> import torch
>>> from ultralytics.nn.modules.block import SCDown
>>> model = SCDown(c1=64, c2=128, k=3, s=2)
>>> x = torch.randn(1, 64, 128, 128)
>>> y = model(x)
>>> print(y.shape)
torch.Size([1, 128, 64, 64])
Source code in ultralytics/nn/modules/block.py
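The spatial size in the SCDown example follows from the standard convolution output-size formula. The sketch below reproduces the 128 to 64 reduction for `k=3, s=2`, assuming "same"-style padding of `k // 2` (an assumption about how the convolution is padded, not something stated in this section). `conv_out_size` is a hypothetical helper.

```python
def conv_out_size(size, k, s, p=None):
    """Output length of a convolution along one spatial axis."""
    if p is None:
        p = k // 2  # assumed 'same'-style padding
    return (size + 2 * p - k) // s + 1


h = 128
h = conv_out_size(h, k=1, s=1)  # pointwise convolution keeps the size
h = conv_out_size(h, k=3, s=2)  # depthwise convolution with stride 2 halves it
```

Only the strided depthwise step changes the spatial size; the pointwise step changes the channel count alone.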
ultralytics.nn.modules.block.TorchVision
Bases: Module
TorchVision module to allow loading any torchvision model.
This class provides a way to load a model from the torchvision library, optionally load pre-trained weights, and customize the model by truncating or unwrapping layers.
Attributes:
Name | Type | Description |
---|---|---|
m | Module | The loaded torchvision model, possibly truncated and unwrapped. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
model | str | Name of the torchvision model to load. | required |
weights | str | Pre-trained weights to load. Default is "DEFAULT". | 'DEFAULT' |
unwrap | bool | If True, unwraps the model to a sequential containing all but the last | True |
truncate | int | Number of layers to truncate from the end if | 2 |
split | bool | Returns output from intermediate child modules as list. Default is False. | False |
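The interplay of `unwrap` and `truncate` can be pictured as list slicing over a model's top-level children. The parameter descriptions above are cut off, so the sketch below is an assumption about the intended behavior: truncation is applied only when unwrapping, and it drops layers from the end. `select_layers` is a hypothetical helper, and the layer names simply mirror the top-level children of a torchvision ResNet.

```python
def select_layers(children, unwrap=True, truncate=2):
    """Pick the child layers to keep (assumed semantics, see lead-in)."""
    children = list(children)
    if not unwrap:
        return children  # keep the model as-is
    # Guard truncate == 0, because children[:-0] would be an empty list.
    return children[:-truncate] if truncate else children


resnet_children = [
    "conv1", "bn1", "relu", "maxpool",
    "layer1", "layer2", "layer3", "layer4",
    "avgpool", "fc",
]

backbone = select_layers(resnet_children, unwrap=True, truncate=2)
```

With the default `truncate=2` this removes the pooling and classifier layers, leaving a feature extractor whose last stage is `layer4`.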