Skip to content

Reference for ultralytics/models/sam/modules/sam.py

Note

This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/sam/modules/sam.py. If you spot a problem please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!



ultralytics.models.sam.modules.sam.Sam

Bases: Module

Sam (Segment Anything Model) is designed for object segmentation tasks. It uses image encoders to generate image embeddings, and prompt encoders to encode various types of input prompts. These embeddings are then used by the mask decoder to predict object masks.

Attributes:

Name Type Description
mask_threshold float

Threshold value for mask prediction.

image_format str

Format of the input image, default is 'RGB'.

image_encoder ImageEncoderViT

The backbone used to encode the image into embeddings.

prompt_encoder PromptEncoder

Encodes various types of input prompts.

mask_decoder MaskDecoder

Predicts object masks from the image and prompt embeddings.

pixel_mean List[float]

Mean pixel values for image normalization.

pixel_std List[float]

Standard deviation values for image normalization.

Source code in ultralytics/models/sam/modules/sam.py
class Sam(nn.Module):
    """
    Sam (Segment Anything Model) is designed for object segmentation tasks. It uses image encoders to generate image
    embeddings, and prompt encoders to encode various types of input prompts. These embeddings are then used by the mask
    decoder to predict object masks.

    Attributes:
        mask_threshold (float): Threshold value for mask prediction.
        image_format (str): Format of the input image, default is 'RGB'.
        image_encoder (ImageEncoderViT): The backbone used to encode the image into embeddings.
        prompt_encoder (PromptEncoder): Encodes various types of input prompts.
        mask_decoder (MaskDecoder): Predicts object masks from the image and prompt embeddings.
        pixel_mean (List[float]): Mean pixel values for image normalization.
        pixel_std (List[float]): Standard deviation values for image normalization.
    """

    mask_threshold: float = 0.0
    image_format: str = "RGB"

    def __init__(
        self,
        image_encoder: ImageEncoderViT,
        prompt_encoder: PromptEncoder,
        mask_decoder: MaskDecoder,
        pixel_mean: List[float] = (123.675, 116.28, 103.53),
        pixel_std: List[float] = (58.395, 57.12, 57.375),
    ) -> None:
        """
        Initialize the Sam class to predict object masks from an image and input prompts.

        Note:
            All forward() operations moved to SAMPredictor.

        Args:
            image_encoder (ImageEncoderViT): The backbone used to encode the image into image embeddings.
            prompt_encoder (PromptEncoder): Encodes various types of input prompts.
            mask_decoder (MaskDecoder): Predicts masks from the image embeddings and encoded prompts.
            pixel_mean (List[float], optional): Mean values for normalizing pixels in the input image. Defaults to
                (123.675, 116.28, 103.53).
            pixel_std (List[float], optional): Std values for normalizing pixels in the input image. Defaults to
                (58.395, 57.12, 57.375).
        """
        super().__init__()
        self.image_encoder = image_encoder
        self.prompt_encoder = prompt_encoder
        self.mask_decoder = mask_decoder
        self.register_buffer("pixel_mean", torch.Tensor(pixel_mean).view(-1, 1, 1), False)
        self.register_buffer("pixel_std", torch.Tensor(pixel_std).view(-1, 1, 1), False)

__init__(image_encoder, prompt_encoder, mask_decoder, pixel_mean=(123.675, 116.28, 103.53), pixel_std=(58.395, 57.12, 57.375))

Initialize the Sam class to predict object masks from an image and input prompts.

Note

All forward() operations moved to SAMPredictor.

Parameters:

Name Type Description Default
image_encoder ImageEncoderViT

The backbone used to encode the image into image embeddings.

required
prompt_encoder PromptEncoder

Encodes various types of input prompts.

required
mask_decoder MaskDecoder

Predicts masks from the image embeddings and encoded prompts.

required
pixel_mean List[float]

Mean values for normalizing pixels in the input image. Defaults to (123.675, 116.28, 103.53).

(123.675, 116.28, 103.53)
pixel_std List[float]

Std values for normalizing pixels in the input image. Defaults to (58.395, 57.12, 57.375).

(58.395, 57.12, 57.375)
Source code in ultralytics/models/sam/modules/sam.py
def __init__(
    self,
    image_encoder: ImageEncoderViT,
    prompt_encoder: PromptEncoder,
    mask_decoder: MaskDecoder,
    pixel_mean: List[float] = (123.675, 116.28, 103.53),
    pixel_std: List[float] = (58.395, 57.12, 57.375),
) -> None:
    """
    Initialize the Sam class to predict object masks from an image and input prompts.

    Note:
        All forward() operations moved to SAMPredictor.

    Args:
        image_encoder (ImageEncoderViT): The backbone used to encode the image into image embeddings.
        prompt_encoder (PromptEncoder): Encodes various types of input prompts.
        mask_decoder (MaskDecoder): Predicts masks from the image embeddings and encoded prompts.
        pixel_mean (List[float], optional): Mean values for normalizing pixels in the input image. Defaults to
            (123.675, 116.28, 103.53).
        pixel_std (List[float], optional): Std values for normalizing pixels in the input image. Defaults to
            (58.395, 57.12, 57.375).
    """
    super().__init__()
    self.image_encoder = image_encoder
    self.prompt_encoder = prompt_encoder
    self.mask_decoder = mask_decoder
    self.register_buffer("pixel_mean", torch.Tensor(pixel_mean).view(-1, 1, 1), False)
    self.register_buffer("pixel_std", torch.Tensor(pixel_std).view(-1, 1, 1), False)





Created 2023-11-12, Updated 2023-11-25
Authors: glenn-jocher (3)