Reference for ultralytics/models/sam/modules/decoders.py
Note
This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/sam/modules/decoders.py. If you spot a problem please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!
ultralytics.models.sam.modules.decoders.MaskDecoder
MaskDecoder(*, transformer_dim: int, transformer: nn.Module, num_multimask_outputs: int = 3, activation: Type[nn.Module] = nn.GELU, iou_head_depth: int = 3, iou_head_hidden_dim: int = 256)
Bases: Module
Decoder module for generating masks and their associated quality scores, using a transformer architecture to predict masks given image and prompt embeddings.
Attributes:

Name | Type | Description
---|---|---
`transformer_dim` | `int` | Channel dimension for the transformer module.
`transformer` | `Module` | The transformer module used for mask prediction.
`num_multimask_outputs` | `int` | Number of masks to predict for disambiguating masks.
`iou_token` | `Embedding` | Embedding for the IoU token.
`num_mask_tokens` | `int` | Number of mask tokens.
`mask_tokens` | `Embedding` | Embedding for the mask tokens.
`output_upscaling` | `Sequential` | Neural network sequence for upscaling the output.
`output_hypernetworks_mlps` | `ModuleList` | Hypernetwork MLPs for generating masks.
`iou_prediction_head` | `Module` | MLP for predicting mask quality.
Parameters:

Name | Type | Description | Default
---|---|---|---
`transformer_dim` | `int` | The channel dimension of the transformer module. | *required*
`transformer` | `Module` | The transformer used to predict masks. | *required*
`num_multimask_outputs` | `int` | The number of masks to predict when disambiguating masks. | `3`
`activation` | `Type[Module]` | The type of activation to use when upscaling masks. | `GELU`
`iou_head_depth` | `int` | The depth of the MLP used to predict mask quality. | `3`
`iou_head_hidden_dim` | `int` | The hidden dimension of the MLP used to predict mask quality. | `256`
Source code in ultralytics/models/sam/modules/decoders.py
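A hedged sketch of the hypernetwork idea behind `output_hypernetworks_mlps`: in SAM-style decoders, each mask token is mapped by an MLP to a per-token weight vector, which is matrix-multiplied with the upscaled image embedding to produce that token's mask. All shapes and tensors below are illustrative stand-ins, not the module's actual values.

```python
import torch

# Assumed illustrative shapes: batch of 2, 32-channel upscaled embedding, 64x64 spatial grid.
b, c, h, w = 2, 32, 64, 64
num_mask_tokens = 4

upscaled_embedding = torch.randn(b, c, h, w)   # stand-in for the output of output_upscaling
hyper_in = torch.randn(b, num_mask_tokens, c)  # stand-in for stacked hypernetwork MLP outputs

# (B, T, C) @ (B, C, H*W) -> (B, T, H*W), reshaped into one mask per token.
masks = (hyper_in @ upscaled_embedding.view(b, c, h * w)).view(b, num_mask_tokens, h, w)
print(masks.shape)  # torch.Size([2, 4, 64, 64])
```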
forward
forward(image_embeddings: torch.Tensor, image_pe: torch.Tensor, sparse_prompt_embeddings: torch.Tensor, dense_prompt_embeddings: torch.Tensor, multimask_output: bool) -> Tuple[torch.Tensor, torch.Tensor]
Predict masks given image and prompt embeddings.
Parameters:

Name | Type | Description | Default
---|---|---|---
`image_embeddings` | `Tensor` | The embeddings from the image encoder. | *required*
`image_pe` | `Tensor` | Positional encoding with the shape of `image_embeddings`. | *required*
`sparse_prompt_embeddings` | `Tensor` | The embeddings of the points and boxes. | *required*
`dense_prompt_embeddings` | `Tensor` | The embeddings of the mask inputs. | *required*
`multimask_output` | `bool` | Whether to return multiple masks or a single mask. | *required*
Returns:

Type | Description
---|---
`Tensor` | Batched predicted masks.
`Tensor` | Batched predictions of mask quality.
Source code in ultralytics/models/sam/modules/decoders.py
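A hedged sketch of how `multimask_output` shapes the return value in SAM-style decoders: token 0 yields the single-mask output, while the remaining tokens are the disambiguation alternatives returned when `multimask_output=True`. The tensors and shapes below are illustrative stand-ins for the outputs of `predict_masks`.

```python
import torch

# Stand-ins: 2 prompts, 4 mask tokens (1 single-mask + 3 multimask), 256x256 masks.
masks = torch.randn(2, 4, 256, 256)
iou_pred = torch.randn(2, 4)

multimask_output = True
# Select either the disambiguation masks (tokens 1..3) or the single mask (token 0).
mask_slice = slice(1, None) if multimask_output else slice(0, 1)
masks = masks[:, mask_slice, :, :]
iou_pred = iou_pred[:, mask_slice]
print(masks.shape, iou_pred.shape)  # torch.Size([2, 3, 256, 256]) torch.Size([2, 3])
```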
predict_masks
predict_masks(image_embeddings: torch.Tensor, image_pe: torch.Tensor, sparse_prompt_embeddings: torch.Tensor, dense_prompt_embeddings: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]
Predicts masks.
See 'forward' for more details.
Source code in ultralytics/models/sam/modules/decoders.py
ultralytics.models.sam.modules.decoders.MLP
MLP(input_dim: int, hidden_dim: int, output_dim: int, num_layers: int, sigmoid_output: bool = False)
Bases: Module
MLP (Multi-Layer Perceptron) model lightly adapted from https://github.com/facebookresearch/MaskFormer/blob/main/mask_former/modeling/transformer/transformer_predictor.py
Parameters:

Name | Type | Description | Default
---|---|---|---
`input_dim` | `int` | The dimensionality of the input features. | *required*
`hidden_dim` | `int` | The dimensionality of the hidden layers. | *required*
`output_dim` | `int` | The dimensionality of the output layer. | *required*
`num_layers` | `int` | The number of hidden layers. | *required*
`sigmoid_output` | `bool` | Whether to apply a sigmoid activation to the output layer. | `False`
Source code in ultralytics/models/sam/modules/decoders.py
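A minimal sketch of a MaskFormer-style MLP matching the signature above: a chain of `Linear` layers with ReLU between them, no activation on the final layer, and an optional sigmoid on the output. This is an illustrative reimplementation, not the module as shipped.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """Sketch of a simple MLP with the parameters described above."""

    def __init__(self, input_dim, hidden_dim, output_dim, num_layers, sigmoid_output=False):
        super().__init__()
        self.num_layers = num_layers
        h = [hidden_dim] * (num_layers - 1)
        # Chain of Linear layers: input_dim -> hidden_dim -> ... -> output_dim.
        self.layers = nn.ModuleList(
            nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim])
        )
        self.sigmoid_output = sigmoid_output

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            # ReLU between layers; the final layer is left linear.
            x = F.relu(layer(x)) if i < self.num_layers - 1 else layer(x)
        return torch.sigmoid(x) if self.sigmoid_output else x

mlp = MLP(input_dim=256, hidden_dim=256, output_dim=32, num_layers=3)
out = mlp(torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 32])
```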
forward
Executes the feedforward pass through the network, applying the activation function between layers.