Reference for ultralytics/models/sam/modules/tiny_encoder.py
Note
This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/sam/modules/tiny_encoder.py. If you spot a problem please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!
ultralytics.models.sam.modules.tiny_encoder.Conv2d_BN
Bases: Sequential
A sequential container that performs 2D convolution followed by batch normalization.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
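The idea behind such a container can be sketched in a few lines: a bias-free convolution followed by batch normalization (the bias is redundant because BN re-centers the output). This is an illustrative sketch, not the exact upstream implementation, and the class name here is hypothetical.

```python
import torch
import torch.nn as nn

# Minimal sketch of a conv + batch-norm sequential container.
# Names are illustrative; the upstream Conv2d_BN also fuses BN at export time.
class Conv2dBNSketch(nn.Sequential):
    def __init__(self, in_ch, out_ch, ks=1, stride=1, pad=0):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, ks, stride, pad, bias=False),  # bias folded into BN
            nn.BatchNorm2d(out_ch),
        )

x = torch.randn(1, 3, 32, 32)
y = Conv2dBNSketch(3, 16, ks=3, pad=1)(x)
print(tuple(y.shape))  # (1, 16, 32, 32)
```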
ultralytics.models.sam.modules.tiny_encoder.PatchEmbed
Bases: Module
Embeds images into patches and projects them into a specified embedding dimension.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
ultralytics.models.sam.modules.tiny_encoder.MBConv
Bases: Module
Mobile Inverted Bottleneck Conv (MBConv) layer, part of the EfficientNet architecture.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
forward
Implements the forward pass for the model architecture.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
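The inverted-bottleneck pattern named above can be sketched as a 1x1 expansion, a depthwise 3x3 convolution, and a 1x1 projection with a residual connection. This is a simplified illustration of the MBConv idea, not the upstream class; the expansion ratio and activation here are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative inverted bottleneck: 1x1 expand -> 3x3 depthwise -> 1x1 project,
# plus a residual connection (a sketch, not the upstream MBConv).
class MBConvSketch(nn.Module):
    def __init__(self, ch, expand_ratio=4.0):
        super().__init__()
        hidden = int(ch * expand_ratio)
        self.expand = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.GELU()
        )
        self.depthwise = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.GELU(),
        )
        self.project = nn.Sequential(
            nn.Conv2d(hidden, ch, 1, bias=False), nn.BatchNorm2d(ch)
        )
        self.act = nn.GELU()

    def forward(self, x):
        return self.act(x + self.project(self.depthwise(self.expand(x))))

x = torch.randn(2, 32, 16, 16)
y = MBConvSketch(32)(x)
print(tuple(y.shape))  # (2, 32, 16, 16)
```

Because the depthwise convolution uses `groups=hidden`, the spatial mixing costs only one filter per channel, which is what makes the block cheap despite the 4x channel expansion.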
ultralytics.models.sam.modules.tiny_encoder.PatchMerging
Bases: Module
Merges neighboring patches in the feature map and projects to a new dimension.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
forward
Applies forward pass on the input utilizing convolution and activation layers, and returns the result.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
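The merge-and-project behavior can be illustrated with a single stride-2 convolution: spatial resolution halves while the channel count is projected to the new dimension. This is a conceptual sketch only; the upstream PatchMerging is built from several convolutions.

```python
import torch
import torch.nn as nn

# Sketch: a stride-2 convolution merges 2x2 neighborhoods of patches and
# projects channels to a new dimension (illustrative, not the upstream code).
merge = nn.Conv2d(64, 128, kernel_size=2, stride=2)
x = torch.randn(1, 64, 14, 14)
y = merge(x)
print(tuple(y.shape))  # (1, 128, 7, 7)
```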
ultralytics.models.sam.modules.tiny_encoder.ConvLayer
ConvLayer(dim, input_resolution, depth, activation, drop_path=0.0, downsample=None, use_checkpoint=False, out_dim=None, conv_expand_ratio=4.0)
Bases: Module
Convolutional Layer featuring multiple MobileNetV3-style inverted bottleneck convolutions (MBConv).
Optionally applies downsample operations to the output, and provides support for gradient checkpointing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dim | int | The dimensionality of the input and output. | required |
input_resolution | Tuple[int, int] | The resolution of the input image. | required |
depth | int | The number of MBConv layers in the block. | required |
activation | Callable | Activation function applied after each convolution. | required |
drop_path | Union[float, List[float]] | Drop path rate. Single float or a list of floats for each MBConv. | 0.0 |
downsample | Optional[Callable] | Function for downsampling the output. None to skip downsampling. | None |
use_checkpoint | bool | Whether to use gradient checkpointing to save memory. | False |
out_dim | Optional[int] | The dimensionality of the output. None means it will be the same as dim. | None |
conv_expand_ratio | float | Expansion ratio for the MBConv layers. | 4.0 |
Source code in ultralytics/models/sam/modules/tiny_encoder.py
forward
Processes the input through a series of convolutional layers and returns the activated output.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
ultralytics.models.sam.modules.tiny_encoder.Mlp
Bases: Module
Multi-layer Perceptron (MLP) for transformer architectures.
This layer takes an input with in_features, applies layer normalization and two fully-connected layers.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
forward
Applies operations on input x and returns the modified x, running downsample if it is not None.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
ultralytics.models.sam.modules.tiny_encoder.Attention
Bases: Module
Multi-head attention module with support for spatial awareness, applying attention biases based on spatial resolution. Implements trainable attention biases for each unique offset between spatial positions in the resolution grid.
Attributes:
Name | Type | Description |
---|---|---|
ab | Tensor | Cached attention biases for inference, deleted during training. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dim | int | The dimensionality of the input and output. | required |
key_dim | int | The dimensionality of the keys and queries. | required |
num_heads | int | Number of attention heads. Default is 8. | 8 |
attn_ratio | float | Attention ratio, affecting the dimensions of the value vectors. Default is 4. | 4 |
resolution | Tuple[int, int] | Spatial resolution of the input feature map. Default is (14, 14). | (14, 14) |
Raises:
Type | Description |
---|---|
AssertionError | If |
Source code in ultralytics/models/sam/modules/tiny_encoder.py
forward
Performs forward pass over the input tensor 'x' by applying normalization and querying keys/values.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
train
Sets the module in training mode and handles attribute 'ab' based on the mode.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
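The "trainable attention biases for each unique offset" idea described above keeps the bias table small: all position pairs with the same spatial offset share one learned bias per head. A quick count for the default 14x14 resolution shows why.

```python
import itertools

# For an H x W grid, a bias is shared by every pair of positions with the
# same absolute spatial offset; count the unique offsets for a 14 x 14 grid.
H = W = 14
points = list(itertools.product(range(H), range(W)))
offsets = {(abs(p[0] - q[0]), abs(p[1] - q[1])) for p in points for q in points}
print(len(points) ** 2, len(offsets))  # 38416 position pairs, 196 unique offsets
```

So instead of one bias per position pair (38,416 per head), only 196 biases per head are trained and then indexed per pair.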
ultralytics.models.sam.modules.tiny_encoder.TinyViTBlock
TinyViTBlock(dim, input_resolution, num_heads, window_size=7, mlp_ratio=4.0, drop=0.0, drop_path=0.0, local_conv_size=3, activation=nn.GELU)
Bases: Module
TinyViT Block that applies self-attention and a local convolution to the input.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dim | int | The dimensionality of the input and output. | required |
input_resolution | Tuple[int, int] | Spatial resolution of the input feature map. | required |
num_heads | int | Number of attention heads. | required |
window_size | int | Window size for attention. Default is 7. | 7 |
mlp_ratio | float | Ratio of mlp hidden dim to embedding dim. Default is 4. | 4.0 |
drop | float | Dropout rate. Default is 0. | 0.0 |
drop_path | float | Stochastic depth rate. Default is 0. | 0.0 |
local_conv_size | int | The kernel size of the local convolution. Default is 3. | 3 |
activation | nn | Activation function for MLP. Default is nn.GELU. | GELU |
Raises:
Type | Description |
---|---|
AssertionError | If |
AssertionError | If |
Source code in ultralytics/models/sam/modules/tiny_encoder.py
extra_repr
Returns a formatted string representing the TinyViTBlock's parameters: dimension, input resolution, number of attention heads, window size, and MLP ratio.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
forward
Applies attention-based transformation or padding to input 'x' before passing it through a local convolution.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
ultralytics.models.sam.modules.tiny_encoder.BasicLayer
BasicLayer(dim, input_resolution, depth, num_heads, window_size, mlp_ratio=4.0, drop=0.0, drop_path=0.0, downsample=None, use_checkpoint=False, local_conv_size=3, activation=nn.GELU, out_dim=None)
Bases: Module
A basic TinyViT layer for one stage in a TinyViT architecture.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dim | int | The dimensionality of the input and output. | required |
input_resolution | Tuple[int, int] | Spatial resolution of the input feature map. | required |
depth | int | Number of TinyViT blocks. | required |
num_heads | int | Number of attention heads. | required |
window_size | int | Local window size. | required |
mlp_ratio | float | Ratio of mlp hidden dim to embedding dim. Default is 4. | 4.0 |
drop | float | Dropout rate. Default is 0. | 0.0 |
drop_path | float \| tuple[float] | Stochastic depth rate. Default is 0. | 0.0 |
downsample | Module \| None | Downsample layer at the end of the layer. Default is None. | None |
use_checkpoint | bool | Whether to use checkpointing to save memory. Default is False. | False |
local_conv_size | int | Kernel size of the local convolution. Default is 3. | 3 |
activation | nn | Activation function for MLP. Default is nn.GELU. | GELU |
out_dim | int \| None | The output dimension of the layer. Default is None. | None |
Raises:
Type | Description |
---|---|
ValueError | If |
Source code in ultralytics/models/sam/modules/tiny_encoder.py
extra_repr
Returns a string representation of the layer's parameters via extra_repr.
forward
Performs forward propagation on the input tensor and returns a normalized tensor.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
ultralytics.models.sam.modules.tiny_encoder.LayerNorm2d
Bases: Module
A PyTorch implementation of Layer Normalization in 2D.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
forward
Perform a forward pass, normalizing the input tensor.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
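Layer normalization in 2D normalizes each spatial position across the channel dimension of an NCHW tensor. A minimal functional sketch (omitting the learnable weight and bias for brevity):

```python
import torch

# Sketch of 2D layer norm: per spatial position, normalize across channels
# of an NCHW tensor (learnable affine parameters omitted for brevity).
def layer_norm_2d(x, eps=1e-6):
    mean = x.mean(dim=1, keepdim=True)
    var = (x - mean).pow(2).mean(dim=1, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(2, 8, 4, 4)
y = layer_norm_2d(x)
print(y.mean(dim=1).abs().max().item())  # close to zero
```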
ultralytics.models.sam.modules.tiny_encoder.TinyViT
TinyViT(img_size=224, in_chans=3, num_classes=1000, embed_dims=(96, 192, 384, 768), depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), window_sizes=(7, 7, 14, 7), mlp_ratio=4.0, drop_rate=0.0, drop_path_rate=0.1, use_checkpoint=False, mbconv_expand_ratio=4.0, local_conv_size=3, layer_lr_decay=1.0)
Bases: Module
The TinyViT architecture for vision tasks.
Attributes:
Name | Type | Description |
---|---|---|
img_size | int | Input image size. |
in_chans | int | Number of input channels. |
num_classes | int | Number of classification classes. |
embed_dims | List[int] | List of embedding dimensions for each layer. |
depths | List[int] | List of depths for each layer. |
num_heads | List[int] | List of number of attention heads for each layer. |
window_sizes | List[int] | List of window sizes for each layer. |
mlp_ratio | float | Ratio of MLP hidden dimension to embedding dimension. |
drop_rate | float | Dropout rate for drop layers. |
drop_path_rate | float | Drop path rate for stochastic depth. |
use_checkpoint | bool | Use checkpointing for efficient memory usage. |
mbconv_expand_ratio | float | Expansion ratio for MBConv layer. |
local_conv_size | int | Local convolution kernel size. |
layer_lr_decay | float | Layer-wise learning rate decay. |
Note
This implementation is generalized to accept a list of depths, attention heads, embedding dimensions and window sizes, which allows you to create a "stack" of TinyViT models of varying configurations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
img_size | int | The input image size. Defaults to 224. | 224 |
in_chans | int | Number of input channels. Defaults to 3. | 3 |
num_classes | int | Number of classification classes. Defaults to 1000. | 1000 |
embed_dims | List[int] | List of embedding dimensions per layer. Defaults to [96, 192, 384, 768]. | (96, 192, 384, 768) |
depths | List[int] | List of depths for each layer. Defaults to [2, 2, 6, 2]. | (2, 2, 6, 2) |
num_heads | List[int] | List of number of attention heads per layer. Defaults to [3, 6, 12, 24]. | (3, 6, 12, 24) |
window_sizes | List[int] | List of window sizes for each layer. Defaults to [7, 7, 14, 7]. | (7, 7, 14, 7) |
mlp_ratio | float | Ratio of MLP hidden dimension to embedding dimension. Defaults to 4. | 4.0 |
drop_rate | float | Dropout rate. Defaults to 0. | 0.0 |
drop_path_rate | float | Drop path rate for stochastic depth. Defaults to 0.1. | 0.1 |
use_checkpoint | bool | Whether to use checkpointing for efficient memory usage. Defaults to False. | False |
mbconv_expand_ratio | float | Expansion ratio for MBConv layer. Defaults to 4.0. | 4.0 |
local_conv_size | int | Local convolution kernel size. Defaults to 3. | 3 |
layer_lr_decay | float | Layer-wise learning rate decay. Defaults to 1.0. | 1.0 |
Source code in ultralytics/models/sam/modules/tiny_encoder.py
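The global drop_path_rate above is commonly spread linearly across all blocks in the network, so later blocks are dropped more often than earlier ones. A sketch of that schedule, assuming the default depths:

```python
import torch

# Stochastic-depth schedule sketch: distribute drop_path_rate linearly over
# all blocks (illustrative; mirrors the common pattern, not the exact code).
depths = (2, 2, 6, 2)          # default TinyViT depths
drop_path_rate = 0.1           # default global rate
dpr = torch.linspace(0, drop_path_rate, sum(depths)).tolist()
print(len(dpr))  # 12 per-block rates, from 0.0 up to 0.1
```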
forward
forward_features
Runs the input through the model layers and returns the transformed output.
Source code in ultralytics/models/sam/modules/tiny_encoder.py
no_weight_decay_keywords
Returns a dictionary of parameter names where weight decay should not be applied.
set_layer_lr_decay
Sets the learning rate decay for each layer in the TinyViT model.
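Layer-wise learning-rate decay gives earlier layers a smaller learning-rate scale than later ones; a decay of 1.0 disables the effect. A sketch of the per-layer scaling idea (illustrative, not the upstream implementation):

```python
# Layer-wise LR decay sketch: layer i of n gets scale decay**(n - i - 1),
# so the last layer trains at full rate and earlier layers at reduced rates.
def lr_scales(num_layers, decay):
    return [decay ** (num_layers - i - 1) for i in range(num_layers)]

print(lr_scales(4, 1.0))  # [1.0, 1.0, 1.0, 1.0]  (decay disabled)
print(lr_scales(4, 0.5))  # [0.125, 0.25, 0.5, 1.0]
```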