Reference for ultralytics/nn/modules/block.py
Note
This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/modules/block.py. If you spot a problem please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!
ultralytics.nn.modules.block.DFL
Bases: Module
Integral module of Distribution Focal Loss (DFL).
Proposed in Generalized Focal Loss https://ieeexplore.ieee.org/document/9792391
Source code in ultralytics/nn/modules/block.py
forward
Apply the DFL module to input tensor and return transformed output.
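The integral step can be illustrated without PyTorch. The sketch below assumes, following the Generalized Focal Loss paper, that each box side is predicted as logits over a fixed set of bins and that the result is the softmax-weighted mean of the bin indices. The function name `dfl_expected_offset` and the 16-bin count are illustrative choices, not taken from block.py.

```python
import math


def dfl_expected_offset(logits):
    """Softmax over bin logits, then the expectation over bin indices 0..len-1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(i * p for i, p in enumerate(probs))


# A flat distribution over 16 bins averages to the middle of the range.
uniform = dfl_expected_offset([0.0] * 16)

# A distribution sharply peaked at bin 3 yields an offset close to 3.
peaked = dfl_expected_offset([-100.0] * 3 + [100.0] + [-100.0] * 12)
```

Because the output is an expectation rather than an argmax, the predicted offset can fall between bins.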
ultralytics.nn.modules.block.Proto
Bases: Module
YOLOv8 mask Proto module for segmentation models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c_ | int | Intermediate channels. | 256 |
c2 | int | Output channels (number of protos). | 32 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.HGStem
Bases: Module
StemBlock of PPHGNetV2 with 5 convolutions and one maxpool2d.
https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
cm | int | Middle channels. | required |
c2 | int | Output channels. | required |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass of a PPHGNetV2 backbone layer.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.HGBlock
Bases: Module
HG_Block of PPHGNetV2 with 2 convolutions and LightConv.
https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
cm | int | Middle channels. | required |
c2 | int | Output channels. | required |
k | int | Kernel size. | 3 |
n | int | Number of LightConv or Conv blocks. | 6 |
lightconv | bool | Whether to use LightConv. | False |
shortcut | bool | Whether to use shortcut connection. | False |
act | Module | Activation function. | ReLU() |
Source code in ultralytics/nn/modules/block.py
forward
ultralytics.nn.modules.block.SPP
Bases: Module
Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
k | Tuple[int, int, int] | Kernel sizes for max pooling. | (5, 9, 13) |
Source code in ultralytics/nn/modules/block.py
forward
ultralytics.nn.modules.block.SPPF
Bases: Module
Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
k | int | Kernel size. | 5 |
Notes
With the default k=5, this module is equivalent to SPP(k=(5, 9, 13)).
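The equivalence with SPP can be checked on a small 1D analogue: applying a stride-1 max pool of size 5 twice covers the same window as a single pool of size 9, and three times the same window as size 13. The sketch below is plain Python rather than the module's code, and `max_pool_1d` is a hypothetical helper. It assumes stride 1 and padding of k // 2, with padded positions never winning the max.

```python
def max_pool_1d(values, k):
    """Stride-1 max pooling; windows are clipped at the borders (like -inf padding)."""
    pad = k // 2
    n = len(values)
    return [max(values[max(0, i - pad): i + pad + 1]) for i in range(n)]


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]

once = max_pool_1d(signal, 5)
twice = max_pool_1d(once, 5)
thrice = max_pool_1d(twice, 5)

# Sequential k=5 pools reproduce the larger single-pass windows.
same_as_k9 = twice == max_pool_1d(signal, 9)
same_as_k13 = thrice == max_pool_1d(signal, 13)
```

Reusing the intermediate results is what makes the sequential form cheaper than three independent pools over large windows.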
Source code in ultralytics/nn/modules/block.py
forward
Apply sequential pooling operations to input and return concatenated feature maps.
ultralytics.nn.modules.block.C1
Bases: Module
CSP Bottleneck with 1 convolution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of convolutions. | 1 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C2
Bases: Module
CSP Bottleneck with 2 convolutions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of Bottleneck blocks. | 1 |
shortcut | bool | Whether to use shortcut connections. | True |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
forward
ultralytics.nn.modules.block.C2f
Bases: Module
Faster Implementation of CSP Bottleneck with 2 convolutions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of Bottleneck blocks. | 1 |
shortcut | bool | Whether to use shortcut connections. | False |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
forward
forward_split
Forward pass using split() instead of chunk().
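The channel bookkeeping behind a C2f-style block can be sketched in plain Python. The formulas here are assumptions about the implementation and should be checked against block.py: a hidden width of `int(c2 * e)`, a first convolution that emits two hidden-width halves, one extra hidden-width feature map per bottleneck, and a final convolution over the concatenation. The helper names are hypothetical.

```python
def c2f_channels(c1, c2, n=1, e=0.5):
    """Channel counts at each stage of a C2f-style block (assumed layout)."""
    c = int(c2 * e)  # hidden width
    return {
        "cv1_in": c1,
        "cv1_out": 2 * c,       # split into two halves of width c
        "hidden": c,
        "cv2_in": (2 + n) * c,  # two halves plus one output per bottleneck
        "cv2_out": c2,
    }


def chunk_halves(channels):
    """Mimic chunk(2): two equal parts when the length is even."""
    half = len(channels) // 2
    return [channels[:half], channels[half:]]


def split_by_sizes(channels, sizes):
    """Mimic split(sizes): consecutive parts with explicit lengths."""
    parts, start = [], 0
    for size in sizes:
        parts.append(channels[start:start + size])
        start += size
    return parts


plan = c2f_channels(64, 128, n=3)
channel_ids = list(range(plan["cv1_out"]))
halves_match = chunk_halves(channel_ids) == split_by_sizes(
    channel_ids, [plan["hidden"], plan["hidden"]]
)
```

With an even channel count both strategies yield the same halves, so the two forward variants differ only in which tensor operation performs the split.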
ultralytics.nn.modules.block.C3
Bases: Module
CSP Bottleneck with 3 convolutions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of Bottleneck blocks. | 1 |
shortcut | bool | Whether to use shortcut connections. | True |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3x
Bases: C3
C3 module with cross-convolutions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of Bottleneck blocks. | 1 |
shortcut | bool | Whether to use shortcut connections. | True |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.RepC3
Bases: Module
Rep C3.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of RepConv blocks. | 3 |
e | float | Expansion ratio. | 1.0 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3TR
Bases: C3
C3 module with TransformerBlock().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of Transformer blocks. | 1 |
shortcut | bool | Whether to use shortcut connections. | True |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3Ghost
Bases: C3
C3 module with GhostBottleneck().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of Ghost bottleneck blocks. | 1 |
shortcut | bool | Whether to use shortcut connections. | True |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.GhostBottleneck
Bases: Module
Ghost Bottleneck https://github.com/huawei-noah/ghostnet.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
k | int | Kernel size. | 3 |
s | int | Stride. | 1 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.Bottleneck
Bases: Module
Standard bottleneck.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
shortcut | bool | Whether to use shortcut connection. | True |
g | int | Groups for convolutions. | 1 |
k | Tuple[int, int] | Kernel sizes for convolutions. | (3, 3) |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.BottleneckCSP
Bases: Module
CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of Bottleneck blocks. | 1 |
shortcut | bool | Whether to use shortcut connections. | True |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
forward
ultralytics.nn.modules.block.ResNetBlock
Bases: Module
ResNet block with standard convolution layers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
s | int | Stride. | 1 |
e | int | Expansion ratio. | 4 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.ResNetLayer
Bases: Module
ResNet layer with multiple ResNet blocks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
s | int | Stride. | 1 |
is_first | bool | Whether this is the first layer. | False |
n | int | Number of ResNet blocks. | 1 |
e | int | Expansion ratio. | 4 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.MaxSigmoidAttnBlock
Bases: Module
Max Sigmoid attention block.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
nh | int | Number of heads. | 1 |
ec | int | Embedding channels. | 128 |
gc | int | Guide channels. | 512 |
scale | bool | Whether to use learnable scale parameter. | False |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass of MaxSigmoidAttnBlock.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
guide | Tensor | Guide tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after attention. |
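As a rough illustration of the "max sigmoid" idea for a single spatial position and a single head: the position's embedding is compared with every guide embedding, the best match is kept, and a sigmoid turns it into a gate in (0, 1). This is a sketch under assumptions, not the module's code. The division by the square root of the channel count, the `bias` and `scale` arguments, and the function name are illustrative.

```python
import math


def max_sigmoid_weight(embed, guides, bias=0.0, scale=1.0):
    """Gate for one position: sigmoid of the best embedding/guide dot product."""
    channels = len(embed)
    best = max(sum(e * g for e, g in zip(embed, guide)) for guide in guides)
    logit = best / math.sqrt(channels) + bias
    return scale / (1.0 + math.exp(-logit))


# The first guide matches the embedding exactly; the second is orthogonal.
matched = max_sigmoid_weight([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

# An embedding with no response to any guide gets the neutral gate 0.5.
neutral = max_sigmoid_weight([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The resulting gate would then multiply the projected feature at that position, so positions that match some guide are emphasized.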
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C2fAttn
Bases: Module
C2f module with an additional attn module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of Bottleneck blocks. | 1 |
ec | int | Embedding channels for attention. | 128 |
nh | int | Number of heads for attention. | 1 |
gc | int | Guide channels for attention. | 512 |
shortcut | bool | Whether to use shortcut connections. | False |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass through C2f layer with attention.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
guide | Tensor | Guide tensor for attention. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after processing. |
Source code in ultralytics/nn/modules/block.py
forward_split
Forward pass using split() instead of chunk().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
guide | Tensor | Guide tensor for attention. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after processing. |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.ImagePoolingAttn
Bases: Module
ImagePoolingAttn: Enhance the text embeddings with image-aware information.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ec | int | Embedding channels. | 256 |
ch | tuple | Channel dimensions for feature maps. | () |
ct | int | Channel dimension for text embeddings. | 512 |
nh | int | Number of attention heads. | 8 |
k | int | Kernel size for pooling. | 3 |
scale | bool | Whether to use learnable scale parameter. | False |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass of ImagePoolingAttn.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[Tensor] | List of input feature maps. | required |
text | Tensor | Text embeddings. | required |
Returns:
Type | Description |
---|---|
Tensor | Enhanced text embeddings. |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.ContrastiveHead
Bases: Module
Implements contrastive learning head for region-text similarity in vision-language models.
Source code in ultralytics/nn/modules/block.py
forward
Forward function of contrastive learning.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Image features. | required |
w | Tensor | Text features. | required |
Returns:
Type | Description |
---|---|
Tensor | Similarity scores. |
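Region-text similarity of this kind is commonly computed as a cosine similarity. The sketch below shows that idea for one region vector against several text vectors: both sides are l2-normalized before the dot product, and the result is scaled and shifted. The `logit_scale` and `bias` arguments stand in for learnable parameters and are assumptions, as are the function names.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def region_text_similarity(region, texts, logit_scale=0.0, bias=0.0):
    """Cosine similarity of one region to each text, times exp(logit_scale), plus bias."""
    r = l2_normalize(region)
    return [
        math.exp(logit_scale) * sum(a * b for a, b in zip(r, l2_normalize(t))) + bias
        for t in texts
    ]


# The first text points the same way as the region; the second is orthogonal.
scores = region_text_similarity([3.0, 4.0], [[6.0, 8.0], [-4.0, 3.0]])
```

Normalizing both sides makes the score depend only on direction, so vector magnitude does not dominate the match.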
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.BNContrastiveHead
Bases: Module
Batch Norm Contrastive Head for YOLO-World using batch norm instead of l2-normalization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embed_dims | int | Embedding dimensions of text and image features. | required |
Source code in ultralytics/nn/modules/block.py
forward
Forward function of contrastive learning with batch normalization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Image features. | required |
w | Tensor | Text features. | required |
Returns:
Type | Description |
---|---|
Tensor | Similarity scores. |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.RepBottleneck
Bases: Bottleneck
Rep bottleneck.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
shortcut | bool | Whether to use shortcut connection. | True |
g | int | Groups for convolutions. | 1 |
k | Tuple[int, int] | Kernel sizes for convolutions. | (3, 3) |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.RepCSP
Bases: C3
Repeatable Cross Stage Partial Network (RepCSP) module for efficient feature extraction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of RepBottleneck blocks. | 1 |
shortcut | bool | Whether to use shortcut connections. | True |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.RepNCSPELAN4
Bases: Module
CSP-ELAN.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
c3 | int | Intermediate channels. | required |
c4 | int | Intermediate channels for RepCSP. | required |
n | int | Number of RepCSP blocks. | 1 |
Source code in ultralytics/nn/modules/block.py
forward
forward_split
Forward pass using split() instead of chunk().
ultralytics.nn.modules.block.ELAN1
Bases: RepNCSPELAN4
ELAN1 module with 4 convolutions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
c3 | int | Intermediate channels. | required |
c4 | int | Intermediate channels for convolutions. | required |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.AConv
Bases: Module
AConv.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.ADown
Bases: Module
ADown.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass through ADown layer.
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.SPPELAN
Bases: Module
SPP-ELAN.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
c3 | int | Intermediate channels. | required |
k | int | Kernel size for max pooling. | 5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.CBLinear
Bases: Module
CBLinear.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2s | List[int] | List of output channel sizes. | required |
k | int | Kernel size. | 1 |
s | int | Stride. | 1 |
p | int \| None | Padding. | None |
g | int | Groups. | 1 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.CBFuse
Bases: Module
CBFuse.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | List[int] | Indices for feature selection. | required |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass through CBFuse layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
xs | List[Tensor] | List of input tensors. | required |
Returns:
Type | Description |
---|---|
Tensor | Fused output tensor. |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3f
Bases: Module
Faster Implementation of CSP Bottleneck with 2 convolutions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of Bottleneck blocks. | 1 |
shortcut | bool | Whether to use shortcut connections. | False |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3k2
Bases: C2f
Faster Implementation of CSP Bottleneck with 2 convolutions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of blocks. | 1 |
c3k | bool | Whether to use C3k blocks. | False |
e | float | Expansion ratio. | 0.5 |
g | int | Groups for convolutions. | 1 |
shortcut | bool | Whether to use shortcut connections. | True |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C3k
Bases: C3
C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of Bottleneck blocks. | 1 |
shortcut | bool | Whether to use shortcut connections. | True |
g | int | Groups for convolutions. | 1 |
e | float | Expansion ratio. | 0.5 |
k | int | Kernel size. | 3 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.RepVGGDW
Bases: Module
RepVGGDW is a class that represents a depthwise separable convolutional block in the RepVGG architecture.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ed | int | Input and output channels. | required |
Source code in ultralytics/nn/modules/block.py
forward
Perform a forward pass of the RepVGGDW block.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after applying the depthwise separable convolution. |
Source code in ultralytics/nn/modules/block.py
forward_fuse
Perform a forward pass of the RepVGGDW block using the fused convolution, for use after fuse has been called.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after applying the depthwise separable convolution. |
Source code in ultralytics/nn/modules/block.py
fuse
Fuse the convolutional layers in the RepVGGDW block.
This method fuses the convolutional layers and updates the weights and biases accordingly.
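Fusing parallel convolution branches relies on the linearity of convolution: a small kernel can be zero-padded to the size of a large one and the two kernels added, after which a single convolution gives the same result as the sum of both branches. The 1D sketch below illustrates this. It assumes a 7-tap and a 3-tap branch, ignores batch normalization and bias, and uses hypothetical helper names.

```python
def conv1d_same(signal, kernel):
    """Zero-padded, stride-1 correlation that preserves the input length."""
    pad = len(kernel) // 2
    padded = [0] * pad + list(signal) + [0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def pad_kernel(kernel, size):
    """Centre a small odd-sized kernel inside a larger odd-sized one."""
    extra = (size - len(kernel)) // 2
    return [0] * extra + list(kernel) + [0] * extra


signal = [2, -1, 0, 3, 5, -2, 4, 1, 0, 7]
large = [1, 0, -2, 3, -2, 0, 1]  # 7-tap branch
small = [2, -1, 4]               # 3-tap branch

# Two-branch forward: sum of both convolutions.
two_branch = [a + b for a, b in zip(conv1d_same(signal, large), conv1d_same(signal, small))]

# Fused forward: one convolution with the merged kernel.
fused_kernel = [a + b for a, b in zip(large, pad_kernel(small, len(large)))]
fused = conv1d_same(signal, fused_kernel)
```

After fusion only one convolution runs at inference time, while the output is unchanged.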
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.CIB
Bases: Module
Compact Inverted Block (CIB) module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Number of input channels. | required |
c2 | int | Number of output channels. | required |
shortcut | bool | Whether to add a shortcut connection. | True |
e | float | Scaling factor for the hidden channels. | 0.5 |
lk | bool | Whether to use RepVGGDW for the third convolutional layer. | False |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass of the CIB module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor. |
ultralytics.nn.modules.block.C2fCIB
Bases: C2f
C2fCIB class represents a convolutional block with C2f and CIB modules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Number of input channels. | required |
c2 | int | Number of output channels. | required |
n | int | Number of CIB modules to stack. | 1 |
shortcut | bool | Whether to use shortcut connection. | False |
lk | bool | Whether the CIB modules use RepVGGDW. | False |
g | int | Number of groups for grouped convolution. | 1 |
e | float | Expansion ratio for CIB modules. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.Attention
Bases: Module
Attention module that performs self-attention on the input tensor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dim | int | The input tensor dimension. | required |
num_heads | int | The number of attention heads. | 8 |
attn_ratio | float | The ratio of the attention key dimension to the head dimension. | 0.5 |
Attributes:
Name | Type | Description |
---|---|---|
num_heads | int | The number of attention heads. |
head_dim | int | The dimension of each attention head. |
key_dim | int | The dimension of the attention key. |
scale | float | The scaling factor for the attention scores. |
qkv | Conv | Convolutional layer for computing the query, key, and value. |
proj | Conv | Convolutional layer for projecting the attended values. |
pe | Conv | Convolutional layer for positional encoding. |
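The derived attributes follow from the constructor arguments by simple arithmetic. The sketch below assumes the usual conventions: the head dimension is the input dimension divided by the number of heads, the key dimension is the head dimension times `attn_ratio`, and the score scale is the inverse square root of the key dimension. The helper name is hypothetical and the exact formulas should be checked against block.py.

```python
def attention_dims(dim, num_heads=8, attn_ratio=0.5):
    """Return (head_dim, key_dim, scale) under the assumed conventions."""
    head_dim = dim // num_heads
    key_dim = int(head_dim * attn_ratio)
    scale = key_dim ** -0.5
    return head_dim, key_dim, scale


head_dim, key_dim, scale = attention_dims(256)
```

For `dim=256` with the defaults, each of the 8 heads is 32 channels wide and uses 16-channel keys, so attention scores are scaled by 0.25.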
Source code in ultralytics/nn/modules/block.py
forward
Forward pass of the Attention module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | The output tensor after self-attention. |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.PSABlock
Bases: Module
PSABlock class implementing a Position-Sensitive Attention block for neural networks.
This class encapsulates the functionality for applying multi-head attention and feed-forward neural network layers with optional shortcut connections.
Attributes:
Name | Type | Description |
---|---|---|
attn | Attention | Multi-head attention module. |
ffn | Sequential | Feed-forward neural network module. |
add | bool | Flag indicating whether to add shortcut connections. |
Methods:
Name | Description |
---|---|
forward | Performs a forward pass through the PSABlock, applying attention and feed-forward layers. |
Examples:
Create a PSABlock and perform a forward pass
>>> import torch
>>> from ultralytics.nn.modules.block import PSABlock
>>> psablock = PSABlock(c=128, attn_ratio=0.5, num_heads=4, shortcut=True)
>>> input_tensor = torch.randn(1, 128, 32, 32)
>>> output_tensor = psablock(input_tensor)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c | int | Input and output channels. | required |
attn_ratio | float | Attention ratio for key dimension. | 0.5 |
num_heads | int | Number of attention heads. | 4 |
shortcut | bool | Whether to use shortcut connections. | True |
Source code in ultralytics/nn/modules/block.py
forward
Execute a forward pass through PSABlock.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after attention and feed-forward processing. |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.PSA
Bases: Module
PSA class for implementing Position-Sensitive Attention in neural networks.
This class encapsulates the functionality for applying position-sensitive attention and feed-forward networks to input tensors, enhancing feature extraction and processing capabilities.
Attributes:
Name | Type | Description |
---|---|---|
c | int | Number of hidden channels after applying the initial convolution. |
cv1 | Conv | 1x1 convolution layer to reduce the number of input channels to 2*c. |
cv2 | Conv | 1x1 convolution layer to reduce the number of output channels to c. |
attn | Attention | Attention module for position-sensitive attention. |
ffn | Sequential | Feed-forward network for further processing. |
Methods:
Name | Description |
---|---|
forward | Applies position-sensitive attention and feed-forward network to the input tensor. |
Examples:
Create a PSA module and apply it to an input tensor
>>> import torch
>>> from ultralytics.nn.modules.block import PSA
>>> psa = PSA(c1=128, c2=128, e=0.5)
>>> input_tensor = torch.randn(1, 128, 64, 64)
>>> output_tensor = psa(input_tensor)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
forward
Execute forward pass in PSA module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after attention and feed-forward processing. |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C2PSA
Bases: Module
C2PSA module with attention mechanism for enhanced feature extraction and processing.
This module implements a convolutional block with attention mechanisms to enhance feature extraction and processing capabilities. It includes a series of PSABlock modules for self-attention and feed-forward operations.
Attributes:
Name | Type | Description |
---|---|---|
c | int | Number of hidden channels. |
cv1 | Conv | 1x1 convolution layer to reduce the number of input channels to 2*c. |
cv2 | Conv | 1x1 convolution layer to reduce the number of output channels to c. |
m | Sequential | Sequential container of PSABlock modules for attention and feed-forward operations. |
Methods:
Name | Description |
---|---|
forward | Performs a forward pass through the C2PSA module, applying attention and feed-forward operations. |
Notes
This module is essentially the same as the PSA module, but refactored to allow stacking more PSABlock modules.
Examples:
>>> import torch
>>> from ultralytics.nn.modules.block import C2PSA
>>> c2psa = C2PSA(c1=256, c2=256, n=3, e=0.5)
>>> input_tensor = torch.randn(1, 256, 64, 64)
>>> output_tensor = c2psa(input_tensor)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of PSABlock modules. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
forward
Process the input tensor through a series of PSA blocks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after processing. |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.C2fPSA
Bases: C2f
C2fPSA module with enhanced feature extraction using PSA blocks.
This class extends the C2f module by incorporating PSA blocks for improved attention mechanisms and feature extraction.
Attributes:
Name | Type | Description |
---|---|---|
c | int | Number of hidden channels. |
cv1 | Conv | 1x1 convolution layer to reduce the number of input channels to 2*c. |
cv2 | Conv | 1x1 convolution layer to reduce the number of output channels to c. |
m | ModuleList | List of PSA blocks for feature extraction. |
Methods:
Name | Description |
---|---|
forward | Performs a forward pass through the C2fPSA module. |
forward_split | Performs a forward pass using split() instead of chunk(). |
Examples:
>>> import torch
>>> from ultralytics.nn.modules.block import C2fPSA
>>> model = C2fPSA(c1=64, c2=64, n=3, e=0.5)
>>> x = torch.randn(1, 64, 128, 128)
>>> output = model(x)
>>> print(output.shape)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
n | int | Number of PSABlock modules. | 1 |
e | float | Expansion ratio. | 0.5 |
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.SCDown
Bases: Module
SCDown module for downsampling with separable convolutions.
This module performs downsampling using a combination of pointwise and depthwise convolutions, which helps in efficiently reducing the spatial dimensions of the input tensor while maintaining the channel information.
Attributes:
Name | Type | Description |
---|---|---|
cv1 | Conv | Pointwise convolution layer that reduces the number of channels. |
cv2 | Conv | Depthwise convolution layer that performs spatial downsampling. |
Methods:
Name | Description |
---|---|
forward | Applies the SCDown module to the input tensor. |
Examples:
>>> import torch
>>> from ultralytics.nn.modules.block import SCDown
>>> model = SCDown(c1=64, c2=128, k=3, s=2)
>>> x = torch.randn(1, 64, 128, 128)
>>> y = model(x)
>>> print(y.shape)
torch.Size([1, 128, 64, 64])
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Input channels. | required |
c2 | int | Output channels. | required |
k | int | Kernel size. | required |
s | int | Stride. | required |
Source code in ultralytics/nn/modules/block.py
forward
Apply convolution and downsampling to the input tensor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Downsampled output tensor. |
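The spatial size after downsampling follows the standard convolution output formula. The sketch below assumes padding of `k // 2` when none is given, which is a common "same"-style default and an assumption here rather than something stated in this reference. The helper name is hypothetical.

```python
def conv_output_size(size, k, s, p=None):
    """Output length along one spatial axis: floor((size + 2p - k) / s) + 1."""
    if p is None:
        p = k // 2  # assumed "same"-style padding
    return (size + 2 * p - k) // s + 1


# Pointwise convolution (k=1, s=1) keeps the resolution.
after_pointwise = conv_output_size(128, k=1, s=1)

# Depthwise convolution with k=3, s=2 halves it.
after_depthwise = conv_output_size(after_pointwise, k=3, s=2)
```

With a 128x128 input, `k=3` and `s=2`, this gives 64 along each axis, matching the shape shown in the SCDown example.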
Source code in ultralytics/nn/modules/block.py
ultralytics.nn.modules.block.TorchVision
Bases: Module
TorchVision module to allow loading any torchvision model.
This class provides a way to load a model from the torchvision library, optionally load pre-trained weights, and customize the model by truncating or unwrapping layers.
Attributes:
Name | Type | Description |
---|---|---|
m | Module | The loaded torchvision model, possibly truncated and unwrapped. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Name of the torchvision model to load. | required |
weights | str | Pre-trained weights to load. | 'DEFAULT' |
unwrap | bool | If True, unwraps the model to a sequential container holding all but the last truncate layers. | True |
truncate | int | Number of layers to truncate from the end if unwrap is True. | 2 |
split | bool | Whether to return the outputs of intermediate child modules as a list. | False |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass through the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor \| List[Tensor] | Output tensor or list of tensors. |
Source code in ultralytics/nn/modules/block.py
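The `truncate` and `split` options can be pictured with plain Python callables standing in for child modules. This is a minimal sketch of the described behaviour, not the class's implementation: `unwrap_and_truncate` and `run` are hypothetical helpers, and including the input as the first element of the split list is an assumption.

```python
from typing import Callable, List, Sequence, Union

Layer = Callable[[float], float]  # stand-in for a child module


def unwrap_and_truncate(children: Sequence[Layer], truncate: int = 2) -> List[Layer]:
    """Keep every child layer except the last `truncate` ones."""
    layers = list(children)
    return layers[:-truncate] if truncate else layers


def run(layers: Sequence[Layer], x: float, split: bool = False) -> Union[float, List[float]]:
    """Chain the layers; with split=True, collect every intermediate output."""
    if split:
        outputs = [x]  # assumption: the input itself is the first list entry
        for layer in layers:
            outputs.append(layer(outputs[-1]))
        return outputs
    for layer in layers:
        x = layer(x)
    return x


# Toy "model": three feature stages, then stand-ins for pooling and a classifier.
model_children = [
    lambda v: v + 1,
    lambda v: v * 2,
    lambda v: v - 3,
    lambda v: v / 2,
    lambda v: 0.0,
]
backbone = unwrap_and_truncate(model_children, truncate=2)

print(len(backbone))                   # 3
print(run(backbone, 1.0))              # 1.0
print(run(backbone, 1.0, split=True))  # [1.0, 2.0, 4.0, 1.0]
```

Dropping the trailing layers is what turns a classifier into a feature extractor; the split list is useful when later layers need features from several depths.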
ultralytics.nn.modules.block.AAttn
Bases: Module
Area-attention module for YOLO models, providing efficient attention mechanisms.
This module implements an area-based attention mechanism that processes input features in a spatially-aware manner, making it particularly effective for object detection tasks.
Attributes:
Name | Type | Description |
---|---|---|
area | int | Number of areas into which the feature map is divided. |
num_heads | int | Number of heads into which the attention mechanism is divided. |
head_dim | int | Dimension of each attention head. |
qkv | Conv | Convolution layer for computing query, key and value tensors. |
proj | Conv | Projection convolution layer. |
pe | Conv | Position encoding convolution layer. |
Methods:
Name | Description |
---|---|
forward | Applies area-attention to the input tensor. |
Examples:
>>> attn = AAttn(dim=256, num_heads=8, area=4)
>>> x = torch.randn(1, 256, 32, 32)
>>> output = attn(x)
>>> print(output.shape)
torch.Size([1, 256, 32, 32])
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dim | int | Number of hidden channels. | required |
num_heads | int | Number of heads into which the attention mechanism is divided. | required |
area | int | Number of areas into which the feature map is divided. | 1 |
Source code in ultralytics/nn/modules/block.py
forward
Process the input tensor through the area-attention.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after area-attention. |
Source code in ultralytics/nn/modules/block.py
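The effect of `area` and `num_heads` can be sketched as simple arithmetic. The helper `area_attention_stats` below is hypothetical, and it assumes that `head_dim` is `dim // num_heads` and that the flattened feature-map positions are split into `area` equal groups, each attending only within itself.

```python
from typing import Dict


def area_attention_stats(dim: int, num_heads: int, height: int, width: int, area: int = 1) -> Dict[str, int]:
    """Token and attention-score counts when attention is restricted to equal areas."""
    tokens = height * width
    if dim % num_heads or tokens % area:
        raise ValueError("dim must be divisible by num_heads and tokens by area")
    tokens_per_area = tokens // area
    return {
        "head_dim": dim // num_heads,
        "tokens_per_area": tokens_per_area,
        # Each token is compared only with the tokens of its own area.
        "scores_per_head": area * tokens_per_area ** 2,
    }


full = area_attention_stats(256, 8, 32, 32, area=1)
split = area_attention_stats(256, 8, 32, 32, area=4)

print(split["head_dim"])                                    # 32
print(split["tokens_per_area"])                             # 256
print(full["scores_per_head"] // split["scores_per_head"])  # 4
```

Under these assumptions, dividing a 32x32 map into 4 areas cuts the number of pairwise attention scores by a factor of 4 while the output shape stays the same.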
ultralytics.nn.modules.block.ABlock
Bases: Module
Area-attention block module for efficient feature extraction in YOLO models.
This module combines an area-attention mechanism with a feed-forward network for processing feature maps. Restricting attention to areas of the feature map makes it cheaper than full self-attention over all positions.
Attributes:
Name | Type | Description |
---|---|---|
attn | AAttn | Area-attention module for processing spatial features. |
mlp | Sequential | Multi-layer perceptron for feature transformation. |
Methods:
Name | Description |
---|---|
_init_weights | Initializes module weights using a truncated normal distribution. |
forward | Applies area-attention and feed-forward processing to the input tensor. |
Examples:
>>> block = ABlock(dim=256, num_heads=8, mlp_ratio=1.2, area=1)
>>> x = torch.randn(1, 256, 32, 32)
>>> output = block(x)
>>> print(output.shape)
torch.Size([1, 256, 32, 32])
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dim | int | Number of input channels. | required |
num_heads | int | Number of heads into which the attention mechanism is divided. | required |
mlp_ratio | float | Expansion ratio for MLP hidden dimension. | 1.2 |
area | int | Number of areas into which the feature map is divided. | 1 |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass through ABlock.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after area-attention and feed-forward processing. |
Source code in ultralytics/nn/modules/block.py
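A rough sketch of how `mlp_ratio` and the two sub-modules fit together, using scalars and plain callables instead of tensors and layers. Both helpers are hypothetical; the hidden width `int(dim * mlp_ratio)` and the two-step residual structure are assumptions about a typical attention block, not a statement of the actual source.

```python
from typing import Callable


def mlp_hidden_dim(dim: int, mlp_ratio: float = 1.2) -> int:
    """Hidden width of the feed-forward part, assumed to be int(dim * mlp_ratio)."""
    return int(dim * mlp_ratio)


def residual_block(x: float, attn: Callable[[float], float], mlp: Callable[[float], float]) -> float:
    """Assumed structure: add the attention output, then add the MLP output."""
    x = x + attn(x)
    return x + mlp(x)


print(mlp_hidden_dim(256, 1.2))  # 307
print(residual_block(1.0, attn=lambda v: 0.5 * v, mlp=lambda v: 2.0 * v))  # 4.5
```

Because both steps are residual, the output keeps the input's shape, which matches the unchanged `torch.Size` in the example.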
ultralytics.nn.modules.block.A2C2f
Bases: Module
Area-Attention C2f module for enhanced feature extraction with area-based attention mechanisms.
This module extends the C2f architecture by incorporating area-attention and ABlock layers for improved feature processing. It supports both area-attention and standard convolution modes.
Attributes:
Name | Type | Description |
---|---|---|
cv1 | Conv | Initial 1x1 convolution layer that reduces input channels to hidden channels. |
cv2 | Conv | Final 1x1 convolution layer that processes concatenated features. |
gamma | Parameter \| None | Learnable parameter for residual scaling when using area attention. |
m | ModuleList | List of either ABlock or C3k modules for feature processing. |
Methods:
Name | Description |
---|---|
forward | Processes input through the area-attention or standard convolution pathway. |
Examples:
>>> m = A2C2f(512, 512, n=1, a2=True, area=1)
>>> x = torch.randn(1, 512, 32, 32)
>>> output = m(x)
>>> print(output.shape)
torch.Size([1, 512, 32, 32])
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c1 | int | Number of input channels. | required |
c2 | int | Number of output channels. | required |
n | int | Number of ABlock or C3k modules to stack. | 1 |
a2 | bool | Whether to use area attention blocks. If False, uses C3k blocks instead. | True |
area | int | Number of areas into which the feature map is divided. | 1 |
residual | bool | Whether to use residual connections with learnable gamma parameter. | False |
mlp_ratio | float | Expansion ratio for MLP hidden dimension. | 2.0 |
e | float | Channel expansion ratio for hidden channels. | 0.5 |
g | int | Number of groups for grouped convolutions. | 1 |
shortcut | bool | Whether to use shortcut connections in C3k blocks. | True |
Source code in ultralytics/nn/modules/block.py
forward
Forward pass through A2C2f layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor after processing. |
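The channel bookkeeping implied by `e`, `n` and the `gamma` residual can be sketched as follows. The helpers `a2c2f_channels` and `scaled_residual` are hypothetical, and three points are assumptions rather than documented facts: 32 channels per attention head, a C2f-style concatenation of the input branch with one output per stacked stage, and a residual of the form `x + gamma * y`.

```python
from typing import Dict


def a2c2f_channels(c2: int, n: int = 1, e: float = 0.5, channels_per_head: int = 32) -> Dict[str, int]:
    """Channel counts for a C2f-style block with n stacked stages."""
    hidden = int(c2 * e)  # hidden channels produced by the first 1x1 convolution
    if hidden % channels_per_head:
        raise ValueError("hidden channels must be a multiple of channels_per_head")
    return {
        "hidden": hidden,
        "num_heads": hidden // channels_per_head,  # assumed head sizing
        "concat": (1 + n) * hidden,  # input branch plus one output per stage
    }


def scaled_residual(x: float, y: float, gamma: float) -> float:
    """Residual connection with a learnable scale: x + gamma * y."""
    return x + gamma * y


info = a2c2f_channels(512, n=1, e=0.5)
print(info)  # {'hidden': 256, 'num_heads': 8, 'concat': 512}
print(scaled_residual(1.0, 2.0, gamma=0.5))  # 2.0
```

For the example configuration (`c2=512`, `n=1`, `e=0.5`) the concatenated width equals `c2`, so the final 1x1 convolution maps 512 channels to 512.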