
Reference for ultralytics/models/sam/modules/sam.py

Improvements

This page is sourced from https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/sam/modules/sam.py. Have an improvement or example to add? Open a Pull Request — thank you! 🙏


class ultralytics.models.sam.modules.sam.SAMModel

def __init__(
    self,
    image_encoder: ImageEncoderViT,
    prompt_encoder: PromptEncoder,
    mask_decoder: MaskDecoder,
    pixel_mean: list[float] = (123.675, 116.28, 103.53),
    pixel_std: list[float] = (58.395, 57.12, 57.375),
) -> None

Bases: nn.Module

Segment Anything Model (SAM) for object segmentation tasks.

This class combines image encoders, prompt encoders, and mask decoders to predict object masks from images and input prompts.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| image_encoder | ImageEncoderViT | The backbone used to encode the image into image embeddings. | required |
| prompt_encoder | PromptEncoder | Encodes various types of input prompts. | required |
| mask_decoder | MaskDecoder | Predicts masks from the image embeddings and encoded prompts. | required |
| pixel_mean | list[float] | Mean values for normalizing pixels in the input image. | (123.675, 116.28, 103.53) |
| pixel_std | list[float] | Standard deviation values for normalizing pixels in the input image. | (58.395, 57.12, 57.375) |

Attributes

| Name | Type | Description |
| --- | --- | --- |
| mask_threshold | float | Threshold value for mask prediction. |
| image_encoder | ImageEncoderViT | Backbone for encoding images into embeddings. |
| prompt_encoder | PromptEncoder | Encoder for various types of input prompts. |
| mask_decoder | MaskDecoder | Predicts object masks from image and prompt embeddings. |
| pixel_mean | torch.Tensor | Mean values for normalizing pixels in the input image. |
| pixel_std | torch.Tensor | Standard deviation values for normalizing pixels in the input image. |

Methods

| Name | Description |
| --- | --- |
| set_imgsz | Set image size to make model compatible with different image sizes. |

Examples

>>> image_encoder = ImageEncoderViT(...)
>>> prompt_encoder = PromptEncoder(...)
>>> mask_decoder = MaskDecoder(...)
>>> sam_model = SAMModel(image_encoder, prompt_encoder, mask_decoder)
>>> # Further usage depends on SAMPredictor class

Notes

All forward() operations are implemented in the SAMPredictor class.

All forward() operations moved to SAMPredictor.
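
The pixel_mean and pixel_std values are registered as (3, 1, 1) buffers so they broadcast over CHW image tensors. A minimal sketch of the normalization they are intended for (the actual preprocessing lives in SAMPredictor, so this is illustrative only, with a dummy image):

>>> x = torch.rand(3, 1024, 1024) * 255  # dummy RGB image in the 0-255 range
>>> mean = torch.tensor((123.675, 116.28, 103.53)).view(-1, 1, 1)
>>> std = torch.tensor((58.395, 57.12, 57.375)).view(-1, 1, 1)
>>> x_norm = (x - mean) / std  # same broadcasting the registered buffers rely on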

Source code in ultralytics/models/sam/modules/sam.py
class SAMModel(nn.Module):
    """Segment Anything Model (SAM) for object segmentation tasks.

    This class combines image encoders, prompt encoders, and mask decoders to predict object masks from images and input
    prompts.

    Attributes:
        mask_threshold (float): Threshold value for mask prediction.
        image_encoder (ImageEncoderViT): Backbone for encoding images into embeddings.
        prompt_encoder (PromptEncoder): Encoder for various types of input prompts.
        mask_decoder (MaskDecoder): Predicts object masks from image and prompt embeddings.
        pixel_mean (torch.Tensor): Mean values for normalizing pixels in the input image.
        pixel_std (torch.Tensor): Standard deviation values for normalizing pixels in the input image.

    Methods:
        set_imgsz: Set image size to make model compatible with different image sizes.

    Examples:
        >>> image_encoder = ImageEncoderViT(...)
        >>> prompt_encoder = PromptEncoder(...)
        >>> mask_decoder = MaskDecoder(...)
        >>> sam_model = SAMModel(image_encoder, prompt_encoder, mask_decoder)
        >>> # Further usage depends on SAMPredictor class

    Notes:
        All forward() operations are implemented in the SAMPredictor class.
    """

    mask_threshold: float = 0.0

    def __init__(
        self,
        image_encoder: ImageEncoderViT,
        prompt_encoder: PromptEncoder,
        mask_decoder: MaskDecoder,
        pixel_mean: list[float] = (123.675, 116.28, 103.53),
        pixel_std: list[float] = (58.395, 57.12, 57.375),
    ) -> None:
        """Initialize the SAMModel class to predict object masks from an image and input prompts.

        Args:
            image_encoder (ImageEncoderViT): The backbone used to encode the image into image embeddings.
            prompt_encoder (PromptEncoder): Encodes various types of input prompts.
            mask_decoder (MaskDecoder): Predicts masks from the image embeddings and encoded prompts.
            pixel_mean (list[float]): Mean values for normalizing pixels in the input image.
            pixel_std (list[float]): Standard deviation values for normalizing pixels in the input image.

        Notes:
            All forward() operations moved to SAMPredictor.
        """
        super().__init__()
        self.image_encoder = image_encoder
        self.prompt_encoder = prompt_encoder
        self.mask_decoder = mask_decoder
        self.register_buffer("pixel_mean", torch.Tensor(pixel_mean).view(-1, 1, 1), False)
        self.register_buffer("pixel_std", torch.Tensor(pixel_std).view(-1, 1, 1), False)


method ultralytics.models.sam.modules.sam.SAMModel.set_imgsz

def set_imgsz(self, imgsz)

Set image size to make model compatible with different image sizes.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| imgsz | | Target image size as (height, width). | required |
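
A short usage sketch, assuming an already constructed sam_model and a 1024x1024 input; with the fixed ViT patch size of 16 this yields a 64x64 embedding grid:

>>> sam_model.set_imgsz((1024, 1024))
>>> sam_model.prompt_encoder.image_embedding_size
[64, 64]
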
Source code in ultralytics/models/sam/modules/sam.py
def set_imgsz(self, imgsz):
    """Set image size to make model compatible with different image sizes."""
    if hasattr(self.image_encoder, "set_imgsz"):
        self.image_encoder.set_imgsz(imgsz)
    self.prompt_encoder.input_image_size = imgsz
    self.prompt_encoder.image_embedding_size = [x // 16 for x in imgsz]  # 16 is fixed as patch size of ViT model
    self.image_encoder.img_size = imgsz[0]





class ultralytics.models.sam.modules.sam.SAM2Model

def __init__(
    self,
    image_encoder,
    memory_attention,
    memory_encoder,
    num_maskmem=7,
    image_size=512,
    backbone_stride=16,
    sigmoid_scale_for_mem_enc=1.0,
    sigmoid_bias_for_mem_enc=0.0,
    binarize_mask_from_pts_for_mem_enc=False,
    use_mask_input_as_output_without_sam=False,
    max_cond_frames_in_attn=-1,
    directly_add_no_mem_embed=False,
    use_high_res_features_in_sam=False,
    multimask_output_in_sam=False,
    multimask_min_pt_num=1,
    multimask_max_pt_num=1,
    multimask_output_for_tracking=False,
    use_multimask_token_for_obj_ptr: bool = False,
    iou_prediction_use_sigmoid=False,
    memory_temporal_stride_for_eval=1,
    non_overlap_masks_for_mem_enc=False,
    use_obj_ptrs_in_encoder=False,
    max_obj_ptrs_in_encoder=16,
    add_tpos_enc_to_obj_ptrs=True,
    proj_tpos_enc_in_obj_ptrs=False,
    use_signed_tpos_enc_to_obj_ptrs=False,
    only_obj_ptrs_in_the_past_for_eval=False,
    pred_obj_scores: bool = False,
    pred_obj_scores_mlp: bool = False,
    fixed_no_obj_ptr: bool = False,
    soft_no_obj_ptr: bool = False,
    use_mlp_for_obj_ptr_proj: bool = False,
    no_obj_embed_spatial: bool = False,
    sam_mask_decoder_extra_args=None,
    compile_image_encoder: bool = False,
)

Bases: torch.nn.Module

SAM2Model class for Segment Anything Model 2 with memory-based video object segmentation capabilities.

This class extends the functionality of SAM to handle video sequences, incorporating memory mechanisms for temporal consistency and efficient tracking of objects across frames.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| image_encoder | nn.Module | Visual encoder for extracting image features. | required |
| memory_attention | nn.Module | Module for attending to memory features. | required |
| memory_encoder | nn.Module | Encoder for generating memory representations. | required |
| num_maskmem | int | Number of accessible memory frames. | 7 |
| image_size | int | Size of input images. | 512 |
| backbone_stride | int | Stride of the image backbone output. | 16 |
| sigmoid_scale_for_mem_enc | float | Scale factor for mask sigmoid probability. | 1.0 |
| sigmoid_bias_for_mem_enc | float | Bias factor for mask sigmoid probability. | 0.0 |
| binarize_mask_from_pts_for_mem_enc | bool | Whether to binarize sigmoid mask logits on interacted frames with clicks during evaluation. | False |
| use_mask_input_as_output_without_sam | bool | Whether to directly output the input mask without using SAM prompt encoder and mask decoder on frames with mask input. | False |
| max_cond_frames_in_attn | int | Maximum number of conditioning frames to participate in memory attention. | -1 |
| directly_add_no_mem_embed | bool | Whether to directly add no-memory embedding to image feature on the first frame. | False |
| use_high_res_features_in_sam | bool | Whether to use high-resolution feature maps in the SAM mask decoder. | False |
| multimask_output_in_sam | bool | Whether to output multiple masks for the first click on initial conditioning frames. | False |
| multimask_min_pt_num | int | Minimum number of clicks to use multimask output in SAM. | 1 |
| multimask_max_pt_num | int | Maximum number of clicks to use multimask output in SAM. | 1 |
| multimask_output_for_tracking | bool | Whether to use multimask output for tracking. | False |
| use_multimask_token_for_obj_ptr | bool | Whether to use multimask tokens for object pointers. | False |
| iou_prediction_use_sigmoid | bool | Whether to use sigmoid to restrict IoU prediction to [0-1]. | False |
| memory_temporal_stride_for_eval | int | Memory bank's temporal stride during evaluation. | 1 |
| non_overlap_masks_for_mem_enc | bool | Whether to apply non-overlapping constraints on object masks in memory encoder during evaluation. | False |
| use_obj_ptrs_in_encoder | bool | Whether to cross-attend to object pointers from other frames in the encoder. | False |
| max_obj_ptrs_in_encoder | int | Maximum number of object pointers from other frames in encoder cross-attention. | 16 |
| add_tpos_enc_to_obj_ptrs | bool | Whether to add temporal positional encoding to object pointers in the encoder. | True |
| proj_tpos_enc_in_obj_ptrs | bool | Whether to add an extra linear projection layer for temporal positional encoding in object pointers. | False |
| use_signed_tpos_enc_to_obj_ptrs | bool | Whether to use signed distance in the temporal positional encoding in the object pointers. | False |
| only_obj_ptrs_in_the_past_for_eval | bool | Whether to only attend to object pointers in the past during evaluation. | False |
| pred_obj_scores | bool | Whether to predict if there is an object in the frame. | False |
| pred_obj_scores_mlp | bool | Whether to use an MLP to predict object scores. | False |
| fixed_no_obj_ptr | bool | Whether to have a fixed no-object pointer when there is no object present. | False |
| soft_no_obj_ptr | bool | Whether to mix in no-object pointer softly for easier recovery and error mitigation. | False |
| use_mlp_for_obj_ptr_proj | bool | Whether to use MLP for object pointer projection. | False |
| no_obj_embed_spatial | bool | Whether to add a no-object embedding to spatial frames. | False |
| sam_mask_decoder_extra_args | dict or None | Extra arguments for constructing the SAM mask decoder. | None |
| compile_image_encoder | bool | Whether to compile the image encoder for faster inference. | False |

Attributes

| Name | Type | Description |
| --- | --- | --- |
| mask_threshold | float | Threshold value for mask prediction. |
| image_encoder | ImageEncoderViT | Visual encoder for extracting image features. |
| memory_attention | nn.Module | Module for attending to memory features. |
| memory_encoder | nn.Module | Encoder for generating memory representations. |
| num_maskmem | int | Number of accessible memory frames. |
| image_size | int | Size of input images. |
| backbone_stride | int | Stride of the backbone network output. |
| sam_prompt_embed_dim | int | Dimension of SAM prompt embeddings. |
| sam_image_embedding_size | int | Size of SAM image embeddings. |
| sam_prompt_encoder | PromptEncoder | Encoder for processing input prompts. |
| sam_mask_decoder | SAM2MaskDecoder | Decoder for generating object masks. |
| obj_ptr_proj | nn.Module | Projection layer for object pointers. |
| obj_ptr_tpos_proj | nn.Module | Projection for temporal positional encoding in object pointers. |
| hidden_dim | int | Hidden dimension of the model. |
| mem_dim | int | Memory dimension for encoding features. |
| use_high_res_features_in_sam | bool | Whether to use high-resolution feature maps in the SAM mask decoder. |
| use_obj_ptrs_in_encoder | bool | Whether to cross-attend to object pointers from other frames in the encoder. |
| max_obj_ptrs_in_encoder | int | Maximum number of object pointers from other frames in encoder cross-attention. |
| add_tpos_enc_to_obj_ptrs | bool | Whether to add temporal positional encoding to object pointers. |
| proj_tpos_enc_in_obj_ptrs | bool | Whether to add an extra linear projection layer for temporal positional encoding in object pointers. |
| use_signed_tpos_enc_to_obj_ptrs | bool | Whether to use signed distance in temporal positional encoding. |
| only_obj_ptrs_in_the_past_for_eval | bool | Whether to only attend to object pointers in the past during evaluation. |
| pred_obj_scores | bool | Whether to predict if there is an object in the frame. |
| pred_obj_scores_mlp | bool | Whether to use an MLP to predict object scores. |
| fixed_no_obj_ptr | bool | Whether to have a fixed no-object pointer when there is no object present. |
| soft_no_obj_ptr | bool | Whether to mix in no-object pointer softly for easier recovery and error mitigation. |
| use_mlp_for_obj_ptr_proj | bool | Whether to use MLP for object pointer projection. |
| no_obj_embed_spatial | torch.Tensor or None | No-object embedding for spatial frames. |
| max_cond_frames_in_attn | int | Maximum number of conditioning frames to participate in memory attention. |
| directly_add_no_mem_embed | bool | Whether to directly add no-memory embedding to image feature on the first frame. |
| multimask_output_in_sam | bool | Whether to output multiple masks for the first click on initial conditioning frames. |
| multimask_min_pt_num | int | Minimum number of clicks to use multimask output in SAM. |
| multimask_max_pt_num | int | Maximum number of clicks to use multimask output in SAM. |
| multimask_output_for_tracking | bool | Whether to use multimask output for tracking. |
| use_multimask_token_for_obj_ptr | bool | Whether to use multimask tokens for object pointers. |
| iou_prediction_use_sigmoid | bool | Whether to use sigmoid to restrict IoU prediction to [0-1]. |
| memory_temporal_stride_for_eval | int | Memory bank's temporal stride during evaluation. |
| non_overlap_masks_for_mem_enc | bool | Whether to apply non-overlapping constraints on object masks in memory encoder during evaluation. |
| sigmoid_scale_for_mem_enc | float | Scale factor for mask sigmoid probability. |
| sigmoid_bias_for_mem_enc | float | Bias factor for mask sigmoid probability. |
| binarize_mask_from_pts_for_mem_enc | bool | Whether to binarize sigmoid mask logits on interacted frames with clicks during evaluation. |
| use_mask_input_as_output_without_sam | bool | Whether to directly output the input mask without using SAM prompt encoder and mask decoder on frames with mask input. |

Methods

| Name | Description |
| --- | --- |
| device | Return the device on which the model's parameters are stored. |
| _apply_non_overlapping_constraints | Apply non-overlapping constraints to masks, keeping the highest scoring object per location. |
| _build_sam_heads | Build SAM-style prompt encoder and mask decoder for image segmentation tasks. |
| _encode_memory_in_output | Run memory encoder on predicted mask to encode it into a new memory feature for future frames. |
| _encode_new_memory | Encode frame features and masks into a new memory representation for video segmentation. |
| _forward_sam_heads | Forward pass through SAM prompt encoders and mask heads. |
| _prepare_backbone_features | Prepare and flatten visual features from the image backbone output for further processing. |
| _prepare_memory_conditioned_features | Prepare memory-conditioned features by fusing current frame's visual features with previous memories. |
| _track_step | Perform a single tracking step, updating object masks and memory features based on current frame inputs. |
| _use_mask_as_output | Process mask inputs directly as output, bypassing SAM encoder/decoder. |
| _use_multimask | Determine whether to use multiple mask outputs in the SAM head based on configuration and inputs. |
| forward | Process image and prompt inputs to generate object masks and scores in video sequences. |
| forward_image | Process image batch through encoder to extract multi-level features for SAM model. |
| set_binarize | Set binarize for VideoPredictor. |
| set_imgsz | Set image size to make model compatible with different image sizes. |
| track_step | Perform a single tracking step, updating object masks and memory features based on current frame inputs. |

Examples

>>> model = SAM2Model(image_encoder, memory_attention, memory_encoder)
>>> image_batch = torch.rand(1, 3, 512, 512)
>>> features = model.forward_image(image_batch)
>>> track_results = model.track_step(0, True, features, None, None, None, {})
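
Several internal dimensions are derived in the constructor: hidden_dim is read from memory_attention.d_model, mem_dim falls back to hidden_dim unless the memory encoder's out_proj compresses the channel dimension, and every memory slot gets its own temporal position embedding. A rough sketch of those relationships, assuming model is an already constructed SAM2Model:

>>> model.hidden_dim == model.memory_attention.d_model
True
>>> model.maskmem_tpos_enc.shape[0] == model.num_maskmem  # one temporal embedding per memory slot
True
>>> model.no_mem_embed.shape[-1] == model.hidden_dim  # placeholder token when no memory is available
True
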
Source code in ultralytics/models/sam/modules/sam.py
class SAM2Model(torch.nn.Module):
    """SAM2Model class for Segment Anything Model 2 with memory-based video object segmentation capabilities.

    This class extends the functionality of SAM to handle video sequences, incorporating memory mechanisms for temporal
    consistency and efficient tracking of objects across frames.

    Attributes:
        mask_threshold (float): Threshold value for mask prediction.
        image_encoder (ImageEncoderViT): Visual encoder for extracting image features.
        memory_attention (nn.Module): Module for attending to memory features.
        memory_encoder (nn.Module): Encoder for generating memory representations.
        num_maskmem (int): Number of accessible memory frames.
        image_size (int): Size of input images.
        backbone_stride (int): Stride of the backbone network output.
        sam_prompt_embed_dim (int): Dimension of SAM prompt embeddings.
        sam_image_embedding_size (int): Size of SAM image embeddings.
        sam_prompt_encoder (PromptEncoder): Encoder for processing input prompts.
        sam_mask_decoder (SAM2MaskDecoder): Decoder for generating object masks.
        obj_ptr_proj (nn.Module): Projection layer for object pointers.
        obj_ptr_tpos_proj (nn.Module): Projection for temporal positional encoding in object pointers.
        hidden_dim (int): Hidden dimension of the model.
        mem_dim (int): Memory dimension for encoding features.
        use_high_res_features_in_sam (bool): Whether to use high-resolution feature maps in the SAM mask decoder.
        use_obj_ptrs_in_encoder (bool): Whether to cross-attend to object pointers from other frames in the encoder.
        max_obj_ptrs_in_encoder (int): Maximum number of object pointers from other frames in encoder cross-attention.
        add_tpos_enc_to_obj_ptrs (bool): Whether to add temporal positional encoding to object pointers.
        proj_tpos_enc_in_obj_ptrs (bool): Whether to add an extra linear projection layer for temporal positional
            encoding in object pointers.
        use_signed_tpos_enc_to_obj_ptrs (bool): Whether to use signed distance in temporal positional encoding.
        only_obj_ptrs_in_the_past_for_eval (bool): Whether to only attend to object pointers in the past during
            evaluation.
        pred_obj_scores (bool): Whether to predict if there is an object in the frame.
        pred_obj_scores_mlp (bool): Whether to use an MLP to predict object scores.
        fixed_no_obj_ptr (bool): Whether to have a fixed no-object pointer when there is no object present.
        soft_no_obj_ptr (bool): Whether to mix in no-object pointer softly for easier recovery and error mitigation.
        use_mlp_for_obj_ptr_proj (bool): Whether to use MLP for object pointer projection.
        no_obj_embed_spatial (torch.Tensor | None): No-object embedding for spatial frames.
        max_cond_frames_in_attn (int): Maximum number of conditioning frames to participate in memory attention.
        directly_add_no_mem_embed (bool): Whether to directly add no-memory embedding to image feature on the first
            frame.
        multimask_output_in_sam (bool): Whether to output multiple masks for the first click on initial conditioning
            frames.
        multimask_min_pt_num (int): Minimum number of clicks to use multimask output in SAM.
        multimask_max_pt_num (int): Maximum number of clicks to use multimask output in SAM.
        multimask_output_for_tracking (bool): Whether to use multimask output for tracking.
        use_multimask_token_for_obj_ptr (bool): Whether to use multimask tokens for object pointers.
        iou_prediction_use_sigmoid (bool): Whether to use sigmoid to restrict IoU prediction to [0-1].
        memory_temporal_stride_for_eval (int): Memory bank's temporal stride during evaluation.
        non_overlap_masks_for_mem_enc (bool): Whether to apply non-overlapping constraints on object masks in memory
            encoder during evaluation.
        sigmoid_scale_for_mem_enc (float): Scale factor for mask sigmoid probability.
        sigmoid_bias_for_mem_enc (float): Bias factor for mask sigmoid probability.
        binarize_mask_from_pts_for_mem_enc (bool): Whether to binarize sigmoid mask logits on interacted frames with
            clicks during evaluation.
        use_mask_input_as_output_without_sam (bool): Whether to directly output the input mask without using SAM prompt
            encoder and mask decoder on frames with mask input.

    Methods:
        forward_image: Process image batch through encoder to extract multi-level features.
        track_step: Perform a single tracking step, updating object masks and memory features.
        set_binarize: Set binarize for VideoPredictor.
        set_imgsz: Set image size to make model compatible with different image sizes.

    Examples:
        >>> model = SAM2Model(image_encoder, memory_attention, memory_encoder)
        >>> image_batch = torch.rand(1, 3, 512, 512)
        >>> features = model.forward_image(image_batch)
        >>> track_results = model.track_step(0, True, features, None, None, None, {})
    """

    mask_threshold: float = 0.0

    def __init__(
        self,
        image_encoder,
        memory_attention,
        memory_encoder,
        num_maskmem=7,
        image_size=512,
        backbone_stride=16,
        sigmoid_scale_for_mem_enc=1.0,
        sigmoid_bias_for_mem_enc=0.0,
        binarize_mask_from_pts_for_mem_enc=False,
        use_mask_input_as_output_without_sam=False,
        max_cond_frames_in_attn=-1,
        directly_add_no_mem_embed=False,
        use_high_res_features_in_sam=False,
        multimask_output_in_sam=False,
        multimask_min_pt_num=1,
        multimask_max_pt_num=1,
        multimask_output_for_tracking=False,
        use_multimask_token_for_obj_ptr: bool = False,
        iou_prediction_use_sigmoid=False,
        memory_temporal_stride_for_eval=1,
        non_overlap_masks_for_mem_enc=False,
        use_obj_ptrs_in_encoder=False,
        max_obj_ptrs_in_encoder=16,
        add_tpos_enc_to_obj_ptrs=True,
        proj_tpos_enc_in_obj_ptrs=False,
        use_signed_tpos_enc_to_obj_ptrs=False,
        only_obj_ptrs_in_the_past_for_eval=False,
        pred_obj_scores: bool = False,
        pred_obj_scores_mlp: bool = False,
        fixed_no_obj_ptr: bool = False,
        soft_no_obj_ptr: bool = False,
        use_mlp_for_obj_ptr_proj: bool = False,
        no_obj_embed_spatial: bool = False,
        sam_mask_decoder_extra_args=None,
        compile_image_encoder: bool = False,
    ):
        """Initialize the SAM2Model for video object segmentation with memory-based tracking.

        Args:
            image_encoder (nn.Module): Visual encoder for extracting image features.
            memory_attention (nn.Module): Module for attending to memory features.
            memory_encoder (nn.Module): Encoder for generating memory representations.
            num_maskmem (int): Number of accessible memory frames.
            image_size (int): Size of input images.
            backbone_stride (int): Stride of the image backbone output.
            sigmoid_scale_for_mem_enc (float): Scale factor for mask sigmoid probability.
            sigmoid_bias_for_mem_enc (float): Bias factor for mask sigmoid probability.
            binarize_mask_from_pts_for_mem_enc (bool): Whether to binarize sigmoid mask logits on interacted frames with
                clicks during evaluation.
            use_mask_input_as_output_without_sam (bool): Whether to directly output the input mask without using SAM
                prompt encoder and mask decoder on frames with mask input.
            max_cond_frames_in_attn (int): Maximum number of conditioning frames to participate in memory attention.
            directly_add_no_mem_embed (bool): Whether to directly add no-memory embedding to image feature on the first
                frame.
            use_high_res_features_in_sam (bool): Whether to use high-resolution feature maps in the SAM mask decoder.
            multimask_output_in_sam (bool): Whether to output multiple masks for the first click on initial conditioning
                frames.
            multimask_min_pt_num (int): Minimum number of clicks to use multimask output in SAM.
            multimask_max_pt_num (int): Maximum number of clicks to use multimask output in SAM.
            multimask_output_for_tracking (bool): Whether to use multimask output for tracking.
            use_multimask_token_for_obj_ptr (bool): Whether to use multimask tokens for object pointers.
            iou_prediction_use_sigmoid (bool): Whether to use sigmoid to restrict IoU prediction to [0-1].
            memory_temporal_stride_for_eval (int): Memory bank's temporal stride during evaluation.
            non_overlap_masks_for_mem_enc (bool): Whether to apply non-overlapping constraints on object masks in memory
                encoder during evaluation.
            use_obj_ptrs_in_encoder (bool): Whether to cross-attend to object pointers from other frames in the encoder.
            max_obj_ptrs_in_encoder (int): Maximum number of object pointers from other frames in encoder
                cross-attention.
            add_tpos_enc_to_obj_ptrs (bool): Whether to add temporal positional encoding to object pointers in the
                encoder.
            proj_tpos_enc_in_obj_ptrs (bool): Whether to add an extra linear projection layer for temporal positional
                encoding in object pointers.
            use_signed_tpos_enc_to_obj_ptrs (bool): Whether to use signed distance in the temporal positional encoding
                in the object pointers.
            only_obj_ptrs_in_the_past_for_eval (bool): Whether to only attend to object pointers in the past during
                evaluation.
            pred_obj_scores (bool): Whether to predict if there is an object in the frame.
            pred_obj_scores_mlp (bool): Whether to use an MLP to predict object scores.
            fixed_no_obj_ptr (bool): Whether to have a fixed no-object pointer when there is no object present.
            soft_no_obj_ptr (bool): Whether to mix in no-object pointer softly for easier recovery and error mitigation.
            use_mlp_for_obj_ptr_proj (bool): Whether to use MLP for object pointer projection.
            no_obj_embed_spatial (bool): Whether to add a no-object embedding to spatial frames.
            sam_mask_decoder_extra_args (dict | None): Extra arguments for constructing the SAM mask decoder.
            compile_image_encoder (bool): Whether to compile the image encoder for faster inference.
        """
        super().__init__()

        # Part 1: the image backbone
        self.image_encoder = image_encoder
        # Use level 0, 1, 2 for high-res setting, or just level 2 for the default setting
        self.use_high_res_features_in_sam = use_high_res_features_in_sam
        self.num_feature_levels = 3 if use_high_res_features_in_sam else 1
        self.use_obj_ptrs_in_encoder = use_obj_ptrs_in_encoder
        self.max_obj_ptrs_in_encoder = max_obj_ptrs_in_encoder
        if use_obj_ptrs_in_encoder:
            # A conv layer to downsample the mask prompt to stride 4 (the same stride as
            # low-res SAM mask logits) and to change its scales from 0~1 to SAM logit scale,
            # so that it can be fed into the SAM mask decoder to generate a pointer.
            self.mask_downsample = torch.nn.Conv2d(1, 1, kernel_size=4, stride=4)
        self.add_tpos_enc_to_obj_ptrs = add_tpos_enc_to_obj_ptrs
        if proj_tpos_enc_in_obj_ptrs:
            assert add_tpos_enc_to_obj_ptrs  # these options need to be used together
        self.proj_tpos_enc_in_obj_ptrs = proj_tpos_enc_in_obj_ptrs
        self.use_signed_tpos_enc_to_obj_ptrs = use_signed_tpos_enc_to_obj_ptrs
        self.only_obj_ptrs_in_the_past_for_eval = only_obj_ptrs_in_the_past_for_eval

        # Part 2: memory attention to condition current frame's visual features
        # with memories (and obj ptrs) from past frames
        self.memory_attention = memory_attention
        self.hidden_dim = memory_attention.d_model

        # Part 3: memory encoder for the previous frame's outputs
        self.memory_encoder = memory_encoder
        self.mem_dim = self.hidden_dim
        if hasattr(self.memory_encoder, "out_proj") and hasattr(self.memory_encoder.out_proj, "weight"):
            # if there is compression of memories along channel dim
            self.mem_dim = self.memory_encoder.out_proj.weight.shape[0]
        self.num_maskmem = num_maskmem  # Number of memories accessible
        # Temporal encoding of the memories
        self.maskmem_tpos_enc = torch.nn.Parameter(torch.zeros(num_maskmem, 1, 1, self.mem_dim))
        trunc_normal_(self.maskmem_tpos_enc, std=0.02)
        # a single token to indicate no memory embedding from previous frames
        self.no_mem_embed = torch.nn.Parameter(torch.zeros(1, 1, self.hidden_dim))
        self.no_mem_pos_enc = torch.nn.Parameter(torch.zeros(1, 1, self.hidden_dim))
        trunc_normal_(self.no_mem_embed, std=0.02)
        trunc_normal_(self.no_mem_pos_enc, std=0.02)
        self.directly_add_no_mem_embed = directly_add_no_mem_embed
        # Apply sigmoid to the output raw mask logits (to turn them from
        # range (-inf, +inf) to range (0, 1)) before feeding them into the memory encoder
        self.sigmoid_scale_for_mem_enc = sigmoid_scale_for_mem_enc
        self.sigmoid_bias_for_mem_enc = sigmoid_bias_for_mem_enc
        self.binarize_mask_from_pts_for_mem_enc = binarize_mask_from_pts_for_mem_enc
        self.non_overlap_masks_for_mem_enc = non_overlap_masks_for_mem_enc
        self.memory_temporal_stride_for_eval = memory_temporal_stride_for_eval
        # On frames with mask input, whether to directly output the input mask without
        # using a SAM prompt encoder + mask decoder
        self.use_mask_input_as_output_without_sam = use_mask_input_as_output_without_sam
        self.multimask_output_in_sam = multimask_output_in_sam
        self.multimask_min_pt_num = multimask_min_pt_num
        self.multimask_max_pt_num = multimask_max_pt_num
        self.multimask_output_for_tracking = multimask_output_for_tracking
        self.use_multimask_token_for_obj_ptr = use_multimask_token_for_obj_ptr
        self.iou_prediction_use_sigmoid = iou_prediction_use_sigmoid

        # Part 4: SAM-style prompt encoder (for both mask and point inputs)
        # and SAM-style mask decoder for the final mask output
        self.image_size = image_size
        self.backbone_stride = backbone_stride
        self.sam_mask_decoder_extra_args = sam_mask_decoder_extra_args
        self.pred_obj_scores = pred_obj_scores
        self.pred_obj_scores_mlp = pred_obj_scores_mlp
        self.fixed_no_obj_ptr = fixed_no_obj_ptr
        self.soft_no_obj_ptr = soft_no_obj_ptr
        if self.fixed_no_obj_ptr:
            assert self.pred_obj_scores
            assert self.use_obj_ptrs_in_encoder
        if self.pred_obj_scores and self.use_obj_ptrs_in_encoder:
            self.no_obj_ptr = torch.nn.Parameter(torch.zeros(1, self.hidden_dim))
            trunc_normal_(self.no_obj_ptr, std=0.02)
        self.use_mlp_for_obj_ptr_proj = use_mlp_for_obj_ptr_proj
        self.no_obj_embed_spatial = None
        if no_obj_embed_spatial:
            self.no_obj_embed_spatial = torch.nn.Parameter(torch.zeros(1, self.mem_dim))
            trunc_normal_(self.no_obj_embed_spatial, std=0.02)

        self._build_sam_heads()
        self.max_cond_frames_in_attn = max_cond_frames_in_attn
        self.add_all_frames_to_correct_as_cond = True

        # Model compilation
        if compile_image_encoder:
            # Compile the forward function (not the full module) to allow loading checkpoints.
            LOGGER.info("Image encoder compilation is enabled. First forward pass will be slow.")
            self.image_encoder.forward = torch.compile(
                self.image_encoder.forward,
                mode="max-autotune",
                fullgraph=True,
                dynamic=False,
            )


property ultralytics.models.sam.modules.sam.SAM2Model.device

def device(self)

Return the device on which the model's parameters are stored.

Source code in ultralytics/models/sam/modules/sam.py
@property
def device(self):
    """Return the device on which the model's parameters are stored."""
    return next(self.parameters()).device


method ultralytics.models.sam.modules.sam.SAM2Model._apply_non_overlapping_constraints

def _apply_non_overlapping_constraints(pred_masks)

Apply non-overlapping constraints to masks, keeping the highest scoring object per location.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pred_masks | | | required |
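
As a rough illustration (not from the source), stacking two overlapping mask logit maps along dim 0 keeps the higher-scoring object at each location and clamps the other to at most -10, so its sigmoid is effectively zero. This assumes SAM2Model is imported from ultralytics.models.sam.modules.sam:

>>> masks = torch.stack([torch.full((1, 4, 4), 2.0), torch.full((1, 4, 4), 5.0)])  # (2, 1, 4, 4)
>>> out = SAM2Model._apply_non_overlapping_constraints(masks)
>>> out[1].min().item()  # the winning object is untouched
5.0
>>> out[0].max().item()  # the losing object is suppressed
-10.0
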
Source code in ultralytics/models/sam/modules/sam.py
@staticmethod
def _apply_non_overlapping_constraints(pred_masks):
    """Apply non-overlapping constraints to masks, keeping the highest scoring object per location."""
    batch_size = pred_masks.shape[0]
    if batch_size == 1:
        return pred_masks

    device = pred_masks.device
    # "max_obj_inds": object index of the object with the highest score at each location
    max_obj_inds = torch.argmax(pred_masks, dim=0, keepdim=True)
    # "batch_obj_inds": object index of each object slice (along dim 0) in `pred_masks`
    batch_obj_inds = torch.arange(batch_size, device=device)[:, None, None, None]
    keep = max_obj_inds == batch_obj_inds
    # suppress overlapping regions' scores below -10.0 so that the foreground regions
    # don't overlap (here sigmoid(-10.0)=4.5398e-05)
    pred_masks = torch.where(keep, pred_masks, torch.clamp(pred_masks, max=-10.0))
    return pred_masks


method ultralytics.models.sam.modules.sam.SAM2Model._build_sam_heads

def _build_sam_heads(self)

Build SAM-style prompt encoder and mask decoder for image segmentation tasks.
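
The heads are sized from the trunk: the prompt embedding dimension reuses hidden_dim, and the image embedding grid is image_size // backbone_stride on each side (32x32 for the 512 / 16 defaults). A quick check, assuming model is a SAM2Model built with the default sizes:

>>> model.sam_prompt_embed_dim == model.hidden_dim
True
>>> model.sam_image_embedding_size  # 512 // 16 with the defaults
32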

Source code in ultralytics/models/sam/modules/sam.py
def _build_sam_heads(self):
    """Build SAM-style prompt encoder and mask decoder for image segmentation tasks."""
    self.sam_prompt_embed_dim = self.hidden_dim
    self.sam_image_embedding_size = self.image_size // self.backbone_stride

    # Build PromptEncoder and MaskDecoder from SAM (hyperparameters like `mask_in_chans=16` are from SAM code)
    self.sam_prompt_encoder = PromptEncoder(
        embed_dim=self.sam_prompt_embed_dim,
        image_embedding_size=(
            self.sam_image_embedding_size,
            self.sam_image_embedding_size,
        ),
        input_image_size=(self.image_size, self.image_size),
        mask_in_chans=16,
    )
    self.sam_mask_decoder = SAM2MaskDecoder(
        num_multimask_outputs=3,
        transformer=SAM2TwoWayTransformer(
            depth=2,
            embedding_dim=self.sam_prompt_embed_dim,
            mlp_dim=2048,
            num_heads=8,
        ),
        transformer_dim=self.sam_prompt_embed_dim,
        iou_head_depth=3,
        iou_head_hidden_dim=256,
        use_high_res_features=self.use_high_res_features_in_sam,
        iou_prediction_use_sigmoid=self.iou_prediction_use_sigmoid,
        pred_obj_scores=self.pred_obj_scores,
        pred_obj_scores_mlp=self.pred_obj_scores_mlp,
        use_multimask_token_for_obj_ptr=self.use_multimask_token_for_obj_ptr,
        **(self.sam_mask_decoder_extra_args or {}),
    )
    if self.use_obj_ptrs_in_encoder:
        # a linear projection on SAM output tokens to turn them into object pointers
        self.obj_ptr_proj = torch.nn.Linear(self.hidden_dim, self.hidden_dim)
        if self.use_mlp_for_obj_ptr_proj:
            self.obj_ptr_proj = MLP(self.hidden_dim, self.hidden_dim, self.hidden_dim, 3)
    else:
        self.obj_ptr_proj = torch.nn.Identity()
    if self.proj_tpos_enc_in_obj_ptrs:
        # a linear projection on temporal positional encoding in object pointers to
        # avoid potential interference with spatial positional encoding
        self.obj_ptr_tpos_proj = torch.nn.Linear(self.hidden_dim, self.mem_dim)
    else:
        self.obj_ptr_tpos_proj = torch.nn.Identity()


method ultralytics.models.sam.modules.sam.SAM2Model._encode_memory_in_output

def _encode_memory_in_output(
    self,
    current_vision_feats,
    feat_sizes,
    point_inputs,
    run_mem_encoder,
    high_res_masks,
    object_score_logits,
    current_out,
)

Run memory encoder on predicted mask to encode it into a new memory feature for future frames.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| current_vision_feats | | | required |
| feat_sizes | | | required |
| point_inputs | | | required |
| run_mem_encoder | | | required |
| high_res_masks | | | required |
| object_score_logits | | | required |
| current_out | | | required |
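
The method mutates current_out in place: it fills the two memory keys from _encode_new_memory, or sets them to None when memory encoding is skipped. A minimal sketch of the skip path, assuming an already constructed model and run_mem_encoder=False:

>>> current_out = {}
>>> model._encode_memory_in_output([], [], None, False, None, None, current_out)
>>> current_out
{'maskmem_features': None, 'maskmem_pos_enc': None}
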
Source code in ultralytics/models/sam/modules/sam.py
def _encode_memory_in_output(
    self,
    current_vision_feats,
    feat_sizes,
    point_inputs,
    run_mem_encoder,
    high_res_masks,
    object_score_logits,
    current_out,
):
    """Run memory encoder on predicted mask to encode it into a new memory feature for future frames."""
    if run_mem_encoder and self.num_maskmem > 0:
        maskmem_features, maskmem_pos_enc = self._encode_new_memory(
            current_vision_feats=current_vision_feats,
            feat_sizes=feat_sizes,
            pred_masks_high_res=high_res_masks,
            object_score_logits=object_score_logits,
            is_mask_from_pts=(point_inputs is not None),
        )
        current_out["maskmem_features"] = maskmem_features
        current_out["maskmem_pos_enc"] = maskmem_pos_enc
    else:
        current_out["maskmem_features"] = None
        current_out["maskmem_pos_enc"] = None


method ultralytics.models.sam.modules.sam.SAM2Model._encode_new_memory

def _encode_new_memory(
    self,
    current_vision_feats,
    feat_sizes,
    pred_masks_high_res,
    object_score_logits,
    is_mask_from_pts,
)

Encode frame features and masks into a new memory representation for video segmentation.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| current_vision_feats | | | required |
| feat_sizes | | | required |
| pred_masks_high_res | | | required |
| object_score_logits | | | required |
| is_mask_from_pts | | | required |
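
Before the mask reaches the memory encoder, the raw logits are passed through a sigmoid and then optionally rescaled and shifted. The sketch below mirrors that scale-and-bias step in isolation; the scale and bias values are illustrative, not defaults taken from a released config:

>>> logits = torch.randn(1, 1, 512, 512)  # stand-in for pred_masks_high_res
>>> scale, bias = 20.0, -10.0  # hypothetical sigmoid_scale_for_mem_enc / sigmoid_bias_for_mem_enc
>>> mask_for_mem = torch.sigmoid(logits) * scale + bias  # same order as in _encode_new_memory
>>> bool(mask_for_mem.min() >= bias)
True
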
Source code in ultralytics/models/sam/modules/sam.py
def _encode_new_memory(
    self,
    current_vision_feats,
    feat_sizes,
    pred_masks_high_res,
    object_score_logits,
    is_mask_from_pts,
):
    """Encode frame features and masks into a new memory representation for video segmentation."""
    B = current_vision_feats[-1].size(1)  # batch size on this frame
    C = self.hidden_dim
    H, W = feat_sizes[-1]  # top-level (lowest-resolution) feature size
    # top-level feature, (HW)BC => BCHW
    pix_feat = current_vision_feats[-1].permute(1, 2, 0).view(B, C, H, W)
    if self.non_overlap_masks_for_mem_enc and not self.training:
        # optionally, apply non-overlapping constraints to the masks (it's applied
        # in the batch dimension and should only be used during eval, where all
        # the objects come from the same video under batch size 1).
        pred_masks_high_res = self._apply_non_overlapping_constraints(pred_masks_high_res)
    # scale the raw mask logits with a temperature before applying sigmoid
    binarize = self.binarize_mask_from_pts_for_mem_enc and is_mask_from_pts
    if binarize and not self.training:
        mask_for_mem = (pred_masks_high_res > 0).to(pix_feat.dtype)
    else:
        # apply sigmoid on the raw mask logits to turn them into range (0, 1)
        mask_for_mem = torch.sigmoid(pred_masks_high_res)
    # apply scale and bias terms to the sigmoid probabilities
    if self.sigmoid_scale_for_mem_enc != 1.0:
        mask_for_mem = mask_for_mem * self.sigmoid_scale_for_mem_enc
    if self.sigmoid_bias_for_mem_enc != 0.0:
        mask_for_mem = mask_for_mem + self.sigmoid_bias_for_mem_enc
    maskmem_out = self.memory_encoder(pix_feat, mask_for_mem, skip_mask_sigmoid=True)  # sigmoid already applied
    maskmem_features = maskmem_out["vision_features"]
    # add a no-object embedding to the spatial memory to indicate that the frame
    # is predicted to be occluded (i.e. no object is appearing in the frame)
    if self.no_obj_embed_spatial is not None:
        is_obj_appearing = (object_score_logits > 0).float()
        maskmem_features += (1 - is_obj_appearing[..., None, None]) * self.no_obj_embed_spatial[
            ..., None, None
        ].expand(*maskmem_features.shape)

    return maskmem_features, maskmem_out["vision_pos_enc"]


method ultralytics.models.sam.modules.sam.SAM2Model._forward_sam_heads

def _forward_sam_heads(
    self,
    backbone_features,
    point_inputs=None,
    mask_inputs=None,
    high_res_features=None,
    multimask_output=False,
)

Forward pass through SAM prompt encoders and mask heads.

This method processes image features and optional point/mask inputs to generate object masks and scores.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| backbone_features | torch.Tensor | Image features with shape (B, C, H, W). | required |
| point_inputs | dict[str, torch.Tensor] or None | Dictionary containing point prompts. 'point_coords': Tensor of shape (B, P, 2) with float32 dtype, containing absolute pixel-unit coordinates in (x, y) format for P input points. 'point_labels': Tensor of shape (B, P) with int32 dtype, where 1 means positive clicks, 0 means negative clicks, and -1 means padding. | None |
| mask_inputs | torch.Tensor or None | Mask of shape (B, 1, H*16, W*16), float or bool, with the same spatial size as the image. | None |
| high_res_features | list[torch.Tensor] or None | List of two feature maps with shapes (B, C, 4*H, 4*W) and (B, C, 2*H, 2*W) respectively, used as high-resolution feature maps for SAM decoder. | None |
| multimask_output | bool | If True, output 3 candidate masks and their IoU estimates; if False, output only 1 mask and its IoU estimate. | False |

Returns

| Name (Type) | Description |
| --- | --- |
| low_res_multimasks (torch.Tensor) | Tensor of shape (B, M, H*4, W*4) with SAM output mask logits. |
| high_res_multimasks (torch.Tensor) | Tensor of shape (B, M, H*16, W*16) with upsampled mask logits. |
| ious (torch.Tensor) | Tensor of shape (B, M) with estimated IoU for each output mask. |
| low_res_masks (torch.Tensor) | Tensor of shape (B, 1, H*4, W*4) with the best low-resolution mask. |
| high_res_masks (torch.Tensor) | Tensor of shape (B, 1, H*16, W*16) with the best high-resolution mask. |
| obj_ptr (torch.Tensor) | Tensor of shape (B, C) with object pointer vector for the output mask. |
| object_score_logits (torch.Tensor) | Tensor of shape (B) with object score logits. |

Examples

>>> backbone_features = torch.rand(1, 256, 32, 32)
>>> point_inputs = {"point_coords": torch.rand(1, 2, 2), "point_labels": torch.tensor([[1, 0]])}
>>> mask_inputs = torch.rand(1, 1, 512, 512)
>>> results = model._forward_sam_heads(backbone_features, point_inputs, mask_inputs)
>>> (
...     low_res_multimasks,
...     high_res_multimasks,
...     ious,
...     low_res_masks,
...     high_res_masks,
...     obj_ptr,
...     object_score_logits,
... ) = results
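
When point_inputs is None, the method pads the prompt with a single dummy point labeled -1 so the prompt encoder still receives a well-formed (coords, labels) pair. The padding looks roughly like this for a batch of size B:

>>> B = 2
>>> sam_point_coords = torch.zeros(B, 1, 2)  # one placeholder point per image
>>> sam_point_labels = -torch.ones(B, 1, dtype=torch.int32)  # label -1 marks padding
>>> sam_point_labels
tensor([[-1],
        [-1]], dtype=torch.int32)
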
Source code in ultralytics/models/sam/modules/sam.py
def _forward_sam_heads(
    self,
    backbone_features,
    point_inputs=None,
    mask_inputs=None,
    high_res_features=None,
    multimask_output=False,
):
    """Forward pass through SAM prompt encoders and mask heads.

    This method processes image features and optional point/mask inputs to generate object masks and scores.

    Args:
        backbone_features (torch.Tensor): Image features with shape (B, C, H, W).
        point_inputs (dict[str, torch.Tensor] | None): Dictionary containing point prompts.
        'point_coords': Tensor of shape (B, P, 2) with float32 dtype, containing absolute pixel-unit coordinates in
            (x, y) format for P input points.
        'point_labels': Tensor of shape (B, P) with int32 dtype, where 1 means positive clicks, 0 means negative
            clicks, and -1 means padding.
        mask_inputs (torch.Tensor | None): Mask of shape (B, 1, H*16, W*16), float or bool, with the same spatial
            size as the image.
        high_res_features (list[torch.Tensor] | None): List of two feature maps with shapes (B, C, 4*H, 4*W) and (B,
            C, 2*H, 2*W) respectively, used as high-resolution feature maps for SAM decoder.
        multimask_output (bool): If True, output 3 candidate masks and their IoU estimates; if False, output only 1
            mask and its IoU estimate.

    Returns:
        low_res_multimasks (torch.Tensor): Tensor of shape (B, M, H*4, W*4) with SAM output mask logits.
        high_res_multimasks (torch.Tensor): Tensor of shape (B, M, H*16, W*16) with upsampled mask logits.
        ious (torch.Tensor): Tensor of shape (B, M) with estimated IoU for each output mask.
        low_res_masks (torch.Tensor): Tensor of shape (B, 1, H*4, W*4) with the best low-resolution mask.
        high_res_masks (torch.Tensor): Tensor of shape (B, 1, H*16, W*16) with the best high-resolution mask.
        obj_ptr (torch.Tensor): Tensor of shape (B, C) with object pointer vector for the output mask.
        object_score_logits (torch.Tensor): Tensor of shape (B) with object score logits.

    Examples:
        >>> backbone_features = torch.rand(1, 256, 32, 32)
        >>> point_inputs = {"point_coords": torch.rand(1, 2, 2), "point_labels": torch.tensor([[1, 0]])}
        >>> mask_inputs = torch.rand(1, 1, 512, 512)
        >>> results = model._forward_sam_heads(backbone_features, point_inputs, mask_inputs)
        >>> (
        ...     low_res_multimasks,
        ...     high_res_multimasks,
        ...     ious,
        ...     low_res_masks,
        ...     high_res_masks,
        ...     obj_ptr,
        ...     object_score_logits,
        ... ) = results
    """
    B = backbone_features.shape[0]
    device = backbone_features.device
    assert backbone_features.size(1) == self.sam_prompt_embed_dim
    assert backbone_features.size(2) == self.sam_image_embedding_size
    assert backbone_features.size(3) == self.sam_image_embedding_size

    # a) Handle point prompts
    if point_inputs is not None:
        sam_point_coords = point_inputs["point_coords"]
        sam_point_labels = point_inputs["point_labels"]
        assert sam_point_coords.shape[0] == B and sam_point_labels.shape[0] == B
    else:
        # If no points are provided, pad with an empty point (with label -1)
        sam_point_coords = torch.zeros(B, 1, 2, device=device, dtype=backbone_features.dtype)
        sam_point_labels = -torch.ones(B, 1, dtype=torch.int32, device=device)

    # b) Handle mask prompts
    if mask_inputs is not None:
        # If mask_inputs is provided, downsize it into low-res mask input if needed
        # and feed it as a dense mask prompt into the SAM mask encoder
        assert len(mask_inputs.shape) == 4 and mask_inputs.shape[:2] == (B, 1)
        if mask_inputs.shape[-2:] != self.sam_prompt_encoder.mask_input_size:
            sam_mask_prompt = F.interpolate(
                mask_inputs.to(backbone_features.dtype),
                size=self.sam_prompt_encoder.mask_input_size,
                align_corners=False,
                mode="bilinear",
                antialias=True,  # use antialias for downsampling
            )
        else:
            sam_mask_prompt = mask_inputs
    else:
        # Otherwise, simply feed None (and SAM's prompt encoder will add
        # a learned `no_mask_embed` to indicate no mask input in this case).
        sam_mask_prompt = None

    sparse_embeddings, dense_embeddings = self.sam_prompt_encoder(
        points=(sam_point_coords, sam_point_labels),
        boxes=None,
        masks=sam_mask_prompt,
    )
    low_res_multimasks, ious, sam_output_tokens, object_score_logits = self.sam_mask_decoder(
        image_embeddings=backbone_features,
        image_pe=self.sam_prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_embeddings,
        dense_prompt_embeddings=dense_embeddings,
        multimask_output=multimask_output,
        repeat_image=False,  # the image is already batched
        high_res_features=high_res_features,
    )
    if self.pred_obj_scores:
        is_obj_appearing = object_score_logits > 0

        # Spatial memory mask is a *hard* choice between obj and no obj, consistent with actual mask prediction
        low_res_multimasks = torch.where(is_obj_appearing[:, None, None], low_res_multimasks, NO_OBJ_SCORE)

    # convert masks from possibly bfloat16 (or float16) to float32
    # (older PyTorch versions before 2.1 don't support `interpolate` on bf16)
    high_res_multimasks = F.interpolate(
        low_res_multimasks,
        size=(self.image_size, self.image_size),
        mode="bilinear",
        align_corners=False,
    )

    sam_output_token = sam_output_tokens[:, 0]
    if multimask_output:
        # take the best mask prediction (with the highest IoU estimation)
        best_iou_inds = torch.argmax(ious, dim=-1)
        batch_inds = torch.arange(B, device=device)
        low_res_masks = low_res_multimasks[batch_inds, best_iou_inds].unsqueeze(1)
        high_res_masks = high_res_multimasks[batch_inds, best_iou_inds].unsqueeze(1)
        if sam_output_tokens.size(1) > 1:
            sam_output_token = sam_output_tokens[batch_inds, best_iou_inds]
    else:
        low_res_masks, high_res_masks = low_res_multimasks, high_res_multimasks

    # Extract object pointer from the SAM output token (with occlusion handling)
    obj_ptr = self.obj_ptr_proj(sam_output_token)
    if self.pred_obj_scores:
        # Allow *soft* no obj ptr, unlike for masks
        if self.soft_no_obj_ptr:
            lambda_is_obj_appearing = object_score_logits.sigmoid()
        else:
            lambda_is_obj_appearing = is_obj_appearing.to(obj_ptr.dtype)

        if self.fixed_no_obj_ptr:
            obj_ptr = lambda_is_obj_appearing * obj_ptr
        obj_ptr = obj_ptr + (1 - lambda_is_obj_appearing) * self.no_obj_ptr
    return (
        low_res_multimasks,
        high_res_multimasks,
        ious,
        low_res_masks,
        high_res_masks,
        obj_ptr,
        object_score_logits,
    )


method ultralytics.models.sam.modules.sam.SAM2Model._prepare_backbone_features

def _prepare_backbone_features(self, backbone_out, batch = 1)

Prepare and flatten visual features from the image backbone output for further processing.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| backbone_out | | | required |
| batch | | | 1 |
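
The flattening step reshapes each feature map from (N, C, H, W) to (H*W, N, C), the sequence-first layout the memory attention expects. A standalone illustration of that transform:

>>> feat = torch.rand(1, 256, 32, 32)  # (N, C, H, W)
>>> flat = feat.flatten(2).permute(2, 0, 1)  # -> (H*W, N, C)
>>> flat.shape
torch.Size([1024, 1, 256])
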
Source code in ultralytics/models/sam/modules/sam.py
def _prepare_backbone_features(self, backbone_out, batch=1):
    """Prepare and flatten visual features from the image backbone output for further processing."""
    if batch > 1:  # expand features if there's more than one prompt
        backbone_out = {
            **backbone_out,
            "backbone_fpn": [feat.expand(batch, -1, -1, -1) for feat in backbone_out["backbone_fpn"]],
            "vision_pos_enc": [pos.expand(batch, -1, -1, -1) for pos in backbone_out["vision_pos_enc"]],
        }
    assert len(backbone_out["backbone_fpn"]) == len(backbone_out["vision_pos_enc"])
    assert len(backbone_out["backbone_fpn"]) >= self.num_feature_levels

    feature_maps = backbone_out["backbone_fpn"][-self.num_feature_levels :]
    vision_pos_embeds = backbone_out["vision_pos_enc"][-self.num_feature_levels :]

    feat_sizes = [(x.shape[-2], x.shape[-1]) for x in vision_pos_embeds]
    # flatten NxCxHxW to HWxNxC
    vision_feats = [x.flatten(2).permute(2, 0, 1) for x in feature_maps]
    vision_pos_embeds = [x.flatten(2).permute(2, 0, 1) for x in vision_pos_embeds]
    return backbone_out, vision_feats, vision_pos_embeds, feat_sizes


method ultralytics.models.sam.modules.sam.SAM2Model._prepare_memory_conditioned_features

def _prepare_memory_conditioned_features(
    self,
    frame_idx,
    is_init_cond_frame,
    current_vision_feats,
    current_vision_pos_embeds,
    feat_sizes,
    output_dict,
    num_frames,
    track_in_reverse=False,  # tracking in reverse time order (for demo usage)
)

Prepare memory-conditioned features by fusing current frame's visual features with previous memories.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| frame_idx | | | required |
| is_init_cond_frame | | | required |
| current_vision_feats | | | required |
| current_vision_pos_embeds | | | required |
| feat_sizes | | | required |
| output_dict | | | required |
| num_frames | | | required |
| track_in_reverse | | | False |
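
For forward tracking, the non-conditioning memories are the immediately preceding frame plus a strided window before it. The helper below re-derives those indices for illustration only; it mirrors the index arithmetic in the source but is not part of the API:

def _non_cond_memory_frames(frame_idx, num_maskmem=7, r=1):
    """Illustrative re-derivation of the previous-frame indices used as memory (forward tracking)."""
    frames = []
    for t_pos in range(1, num_maskmem):
        t_rel = num_maskmem - t_pos
        if t_rel == 1:
            prev = frame_idx - 1  # the last frame is always included, regardless of the stride r
        else:
            prev = ((frame_idx - 2) // r) * r - (t_rel - 2) * r  # nearest r-th frame, then step back in r's
        frames.append(prev)
    return frames

_non_cond_memory_frames(10, r=1)  # -> [4, 5, 6, 7, 8, 9]
_non_cond_memory_frames(10, r=2)  # -> [0, 2, 4, 6, 8, 9]
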
Source code in ultralytics/models/sam/modules/sam.py
def _prepare_memory_conditioned_features(
    self,
    frame_idx,
    is_init_cond_frame,
    current_vision_feats,
    current_vision_pos_embeds,
    feat_sizes,
    output_dict,
    num_frames,
    track_in_reverse=False,  # tracking in reverse time order (for demo usage)
):
    """Prepare memory-conditioned features by fusing current frame's visual features with previous memories."""
    B = current_vision_feats[-1].size(1)  # batch size on this frame
    C = self.hidden_dim
    H, W = feat_sizes[-1]  # top-level (lowest-resolution) feature size
    device = current_vision_feats[-1].device
    # The case of `self.num_maskmem == 0` below is primarily used for reproducing SAM on images.
    # In this case, we skip the fusion with any memory.
    if self.num_maskmem == 0:  # Disable memory and skip fusion
        return current_vision_feats[-1].permute(1, 2, 0).view(B, C, H, W)
    num_obj_ptr_tokens = 0
    tpos_sign_mul = -1 if track_in_reverse else 1
    # Step 1: condition the visual features of the current frame on previous memories
    if not is_init_cond_frame:
        # Retrieve the memories encoded with the maskmem backbone
        to_cat_memory, to_cat_memory_pos_embed = [], []
        # Add conditioning frame's output first (all cond frames have t_pos=0 for
        # when getting temporal positional embedding below)
        assert len(output_dict["cond_frame_outputs"]) > 0
        # Select a maximum number of temporally closest cond frames for cross attention
        cond_outputs = output_dict["cond_frame_outputs"]
        selected_cond_outputs, unselected_cond_outputs = select_closest_cond_frames(
            frame_idx, cond_outputs, self.max_cond_frames_in_attn
        )
        t_pos_and_prevs = [(0, out) for out in selected_cond_outputs.values()]
        # Add last (self.num_maskmem - 1) frames before current frame for non-conditioning memory
        # the earliest one has t_pos=1 and the latest one has t_pos=self.num_maskmem-1
        # We also allow taking the memory frame non-consecutively (with r>1), in which case
        # we take (self.num_maskmem - 2) frames among every r-th frames plus the last frame.
        r = 1 if self.training else self.memory_temporal_stride_for_eval
        for t_pos in range(1, self.num_maskmem):
            t_rel = self.num_maskmem - t_pos  # how many frames before current frame
            if t_rel == 1:
                # for t_rel == 1, we take the last frame (regardless of r)
                prev_frame_idx = frame_idx + t_rel if track_in_reverse else frame_idx - t_rel
            elif not track_in_reverse:
                # first find the nearest frame among every r-th frames before this frame
                # for r=1, this would be (frame_idx - 2)
                prev_frame_idx = ((frame_idx - 2) // r) * r
                # then seek further among every r-th frames
                prev_frame_idx = prev_frame_idx - (t_rel - 2) * r
            else:
                # first find the nearest frame among every r-th frames after this frame
                # for r=1, this would be (frame_idx + 2)
                prev_frame_idx = -(-(frame_idx + 2) // r) * r
                # then seek further among every r-th frames
                prev_frame_idx = prev_frame_idx + (t_rel - 2) * r
            out = output_dict["non_cond_frame_outputs"].get(prev_frame_idx, None)
            if out is None:
                # If an unselected conditioning frame is among the last (self.num_maskmem - 1)
                # frames, we still attend to it as if it's a non-conditioning frame.
                out = unselected_cond_outputs.get(prev_frame_idx, None)
            t_pos_and_prevs.append((t_pos, out))

        for t_pos, prev in t_pos_and_prevs:
            if prev is None:
                continue  # skip padding frames
            # "maskmem_features" might have been offloaded to CPU in demo use cases,
            # so we load it back to inference device (it's a no-op if it's already on device).
            feats = prev["maskmem_features"].to(device=device, non_blocking=device.type == "cuda")
            to_cat_memory.append(feats.flatten(2).permute(2, 0, 1))
            # Spatial positional encoding (it might have been offloaded to CPU in eval)
            maskmem_enc = prev["maskmem_pos_enc"][-1].to(device=device)
            maskmem_enc = maskmem_enc.flatten(2).permute(2, 0, 1)
            # Temporal positional encoding
            maskmem_enc = maskmem_enc + self.maskmem_tpos_enc[self.num_maskmem - t_pos - 1]
            to_cat_memory_pos_embed.append(maskmem_enc)

        # Construct the list of past object pointers
        if self.use_obj_ptrs_in_encoder:
            max_obj_ptrs_in_encoder = min(num_frames, self.max_obj_ptrs_in_encoder)
            # First add those object pointers from selected conditioning frames
            # (optionally, only include object pointers in the past during evaluation)
            if not self.training and self.only_obj_ptrs_in_the_past_for_eval:
                ptr_cond_outputs = {
                    t: out
                    for t, out in selected_cond_outputs.items()
                    if (t >= frame_idx if track_in_reverse else t <= frame_idx)
                }
            else:
                ptr_cond_outputs = selected_cond_outputs
            pos_and_ptrs = [
                # Temporal pos encoding contains how far away each pointer is from current frame
                (
                    (
                        (frame_idx - t) * tpos_sign_mul
                        if self.use_signed_tpos_enc_to_obj_ptrs
                        else abs(frame_idx - t)
                    ),
                    out["obj_ptr"],
                )
                for t, out in ptr_cond_outputs.items()
            ]
            # Add up to (max_obj_ptrs_in_encoder - 1) non-conditioning frames before current frame
            for t_diff in range(1, max_obj_ptrs_in_encoder):
                t = frame_idx + t_diff if track_in_reverse else frame_idx - t_diff
                if t < 0 or (num_frames is not None and t >= num_frames):
                    break
                out = output_dict["non_cond_frame_outputs"].get(t, unselected_cond_outputs.get(t, None))
                if out is not None:
                    pos_and_ptrs.append((t_diff, out["obj_ptr"]))
            # If we have at least one object pointer, add them to the cross attention
            if pos_and_ptrs:
                pos_list, ptrs_list = zip(*pos_and_ptrs)
                # stack object pointers along dim=0 into [ptr_seq_len, B, C] shape
                obj_ptrs = torch.stack(ptrs_list, dim=0)
                # a temporal positional embedding based on how far each object pointer is from
                # the current frame (sine embedding normalized by the max pointer num).
                if self.add_tpos_enc_to_obj_ptrs:
                    t_diff_max = max_obj_ptrs_in_encoder - 1
                    tpos_dim = C if self.proj_tpos_enc_in_obj_ptrs else self.mem_dim
                    obj_pos = torch.tensor(pos_list, device=device, dtype=current_vision_feats[-1].dtype)
                    obj_pos = get_1d_sine_pe(obj_pos / t_diff_max, dim=tpos_dim)
                    obj_pos = self.obj_ptr_tpos_proj(obj_pos)
                    obj_pos = obj_pos.unsqueeze(1).expand(-1, B, self.mem_dim)
                else:
                    obj_pos = obj_ptrs.new_zeros(len(pos_list), B, self.mem_dim)
                if self.mem_dim < C:
                    # split a pointer into (C // self.mem_dim) tokens for self.mem_dim < C
                    obj_ptrs = obj_ptrs.reshape(-1, B, C // self.mem_dim, self.mem_dim)
                    obj_ptrs = obj_ptrs.permute(0, 2, 1, 3).flatten(0, 1)
                    obj_pos = obj_pos.repeat_interleave(C // self.mem_dim, dim=0)
                to_cat_memory.append(obj_ptrs)
                to_cat_memory_pos_embed.append(obj_pos)
                num_obj_ptr_tokens = obj_ptrs.shape[0]
            else:
                num_obj_ptr_tokens = 0
    else:
        # for initial conditioning frames, encode them without using any previous memory
        if self.directly_add_no_mem_embed:
            # directly add no-mem embedding (instead of using the transformer encoder)
            pix_feat_with_mem = current_vision_feats[-1] + self.no_mem_embed
            pix_feat_with_mem = pix_feat_with_mem.permute(1, 2, 0).view(B, C, H, W)
            return pix_feat_with_mem

        # Use a dummy token on the first frame (to avoid empty memory input to transformer encoder)
        to_cat_memory = [self.no_mem_embed.expand(1, B, self.mem_dim)]
        to_cat_memory_pos_embed = [self.no_mem_pos_enc.expand(1, B, self.mem_dim)]

    # Step 2: Concatenate the memories and forward through the transformer encoder
    memory = torch.cat(to_cat_memory, dim=0)
    memory_pos_embed = torch.cat(to_cat_memory_pos_embed, dim=0)

    pix_feat_with_mem = self.memory_attention(
        curr=current_vision_feats,
        curr_pos=current_vision_pos_embeds,
        memory=memory,
        memory_pos=memory_pos_embed,
        num_obj_ptr_tokens=num_obj_ptr_tokens,
    )
    # Reshape output (HW)BC => BCHW
    pix_feat_with_mem = pix_feat_with_mem.permute(1, 2, 0).view(B, C, H, W)
    return pix_feat_with_mem
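
The stride-based index arithmetic above (for r > 1 at evaluation time) is easiest to see in isolation. Below is a minimal, hypothetical standalone sketch of the same selection logic for forward tracking; it is not part of the module and only mirrors the computation shown in the source.

# Hypothetical sketch of the non-conditioning memory-frame selection above
# (forward tracking only), assuming num_maskmem=7 and a temporal stride r.
def selected_memory_frames(frame_idx: int, num_maskmem: int = 7, r: int = 2) -> list[int]:
    """Return the previous frame indices used as non-conditioning memories."""
    frames = []
    for t_pos in range(1, num_maskmem):
        t_rel = num_maskmem - t_pos
        if t_rel == 1:
            prev = frame_idx - 1  # always take the immediately preceding frame, regardless of r
        else:
            prev = ((frame_idx - 2) // r) * r  # nearest multiple of r at or before frame_idx - 2
            prev -= (t_rel - 2) * r  # then step further back in increments of r
        frames.append(prev)
    return frames

print(selected_memory_frames(frame_idx=20, r=2))  # [10, 12, 14, 16, 18, 19]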


method ultralytics.models.sam.modules.sam.SAM2Model._track_step

def _track_step(
    self,
    frame_idx,
    is_init_cond_frame,
    current_vision_feats,
    current_vision_pos_embeds,
    feat_sizes,
    point_inputs,
    mask_inputs,
    output_dict,
    num_frames,
    track_in_reverse,
    prev_sam_mask_logits,
)

Perform a single tracking step, updating object masks and memory features based on current frame inputs.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| frame_idx |  |  | required |
| is_init_cond_frame |  |  | required |
| current_vision_feats |  |  | required |
| current_vision_pos_embeds |  |  | required |
| feat_sizes |  |  | required |
| point_inputs |  |  | required |
| mask_inputs |  |  | required |
| output_dict |  |  | required |
| num_frames |  |  | required |
| track_in_reverse |  |  | required |
| prev_sam_mask_logits |  |  | required |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def _track_step(
    self,
    frame_idx,
    is_init_cond_frame,
    current_vision_feats,
    current_vision_pos_embeds,
    feat_sizes,
    point_inputs,
    mask_inputs,
    output_dict,
    num_frames,
    track_in_reverse,
    prev_sam_mask_logits,
):
    """Perform a single tracking step, updating object masks and memory features based on current frame inputs."""
    # High-resolution feature maps for the SAM head, reshape (HW)BC => BCHW
    if len(current_vision_feats) > 1:
        high_res_features = [
            x.permute(1, 2, 0).view(x.size(1), x.size(2), *s)
            for x, s in zip(current_vision_feats[:-1], feat_sizes[:-1])
        ]
    else:
        high_res_features = None
    if mask_inputs is not None and self.use_mask_input_as_output_without_sam:
        # When use_mask_input_as_output_without_sam=True, we directly output the mask input
        # (see it as a GT mask) without using a SAM prompt encoder + mask decoder.
        pix_feat = current_vision_feats[-1].permute(1, 2, 0)
        pix_feat = pix_feat.view(-1, self.hidden_dim, *feat_sizes[-1])
        sam_outputs = self._use_mask_as_output(mask_inputs, pix_feat, high_res_features)
    else:
        # Fuse visual features with previous memory features in the memory bank
        pix_feat = self._prepare_memory_conditioned_features(
            frame_idx=frame_idx,
            is_init_cond_frame=is_init_cond_frame,
            current_vision_feats=current_vision_feats[-1:],
            current_vision_pos_embeds=current_vision_pos_embeds[-1:],
            feat_sizes=feat_sizes[-1:],
            output_dict=output_dict,
            num_frames=num_frames,
            track_in_reverse=track_in_reverse,
        )
        # apply SAM-style segmentation head
        # here we might feed previously predicted low-res SAM mask logits into the SAM mask decoder,
        # e.g. in demo where such logits come from earlier interaction instead of correction sampling
        # (in this case, any `mask_inputs` shouldn't reach here as they are sent to _use_mask_as_output instead)
        if prev_sam_mask_logits is not None:
            assert point_inputs is not None and mask_inputs is None
            mask_inputs = prev_sam_mask_logits
        multimask_output = self._use_multimask(is_init_cond_frame, point_inputs)
        sam_outputs = self._forward_sam_heads(
            backbone_features=pix_feat,
            point_inputs=point_inputs,
            mask_inputs=mask_inputs,
            high_res_features=high_res_features,
            multimask_output=multimask_output,
        )
    return sam_outputs, high_res_features, pix_feat


method ultralytics.models.sam.modules.sam.SAM2Model._use_mask_as_output

def _use_mask_as_output(self, mask_inputs, backbone_features = None, high_res_features = None)

Process mask inputs directly as output, bypassing SAM encoder/decoder.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mask_inputs |  |  | required |
| backbone_features |  |  | None |
| high_res_features |  |  | None |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def _use_mask_as_output(self, mask_inputs, backbone_features=None, high_res_features=None):
    """Process mask inputs directly as output, bypassing SAM encoder/decoder."""
    # Use -10/+10 as logits for neg/pos pixels (very close to 0/1 in prob after sigmoid).
    out_scale, out_bias = 20.0, -10.0  # sigmoid(-10.0)=4.5398e-05
    mask_inputs_float = mask_inputs.float()
    high_res_masks = mask_inputs_float * out_scale + out_bias
    low_res_masks = F.interpolate(
        high_res_masks,
        size=(high_res_masks.size(-2) // 4, high_res_masks.size(-1) // 4),
        align_corners=False,
        mode="bilinear",
        antialias=True,  # use antialias for downsampling
    )
    # a dummy IoU prediction of all 1's under mask input
    ious = mask_inputs.new_ones(mask_inputs.shape[0], 1).float()
    if not self.use_obj_ptrs_in_encoder or backbone_features is None or high_res_features is None:
        # all zeros as a dummy object pointer (of shape [B, C])
        obj_ptr = torch.zeros(mask_inputs.shape[0], self.hidden_dim, device=mask_inputs.device)
    else:
        # produce an object pointer using the SAM decoder from the mask input
        _, _, _, _, _, obj_ptr, _ = self._forward_sam_heads(
            backbone_features=backbone_features,
            mask_inputs=self.mask_downsample(mask_inputs_float.to(backbone_features.dtype)),
            high_res_features=high_res_features,
        )
    # In this method, we are treating mask_input as output, e.g. using it directly to create spatial mem;
    # Below, we follow the same design axiom to use mask_input to decide if obj appears or not instead of relying
    # on the object_scores from the SAM decoder.
    is_obj_appearing = torch.any(mask_inputs.flatten(1).float() > 0.0, dim=1)
    is_obj_appearing = is_obj_appearing[..., None]
    lambda_is_obj_appearing = is_obj_appearing.float()
    object_score_logits = out_scale * lambda_is_obj_appearing + out_bias
    if self.pred_obj_scores:
        if self.fixed_no_obj_ptr:
            obj_ptr = lambda_is_obj_appearing * obj_ptr
        obj_ptr = obj_ptr + (1 - lambda_is_obj_appearing) * self.no_obj_ptr

    return (
        low_res_masks,
        high_res_masks,
        ious,
        low_res_masks,
        high_res_masks,
        obj_ptr,
        object_score_logits,
    )
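
The out_scale/out_bias trick above turns a binary mask into logits of -10/+10, which sigmoid maps to probabilities extremely close to 0 and 1. A minimal sketch of just that mapping:

import torch

# Binary mask input (1 = object, 0 = background), as treated in _use_mask_as_output.
mask = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
out_scale, out_bias = 20.0, -10.0
logits = mask * out_scale + out_bias  # background -> -10, foreground -> +10
probs = torch.sigmoid(logits)
print(probs)  # roughly 4.54e-05 for background and 0.99995 for foreground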


method ultralytics.models.sam.modules.sam.SAM2Model._use_multimask

def _use_multimask(self, is_init_cond_frame, point_inputs)

Determine whether to use multiple mask outputs in the SAM head based on configuration and inputs.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| is_init_cond_frame |  |  | required |
| point_inputs |  |  | required |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def _use_multimask(self, is_init_cond_frame, point_inputs):
    """Determine whether to use multiple mask outputs in the SAM head based on configuration and inputs."""
    num_pts = 0 if point_inputs is None else point_inputs["point_labels"].size(1)
    return (
        self.multimask_output_in_sam
        and (is_init_cond_frame or self.multimask_output_for_tracking)
        and (self.multimask_min_pt_num <= num_pts <= self.multimask_max_pt_num)
    )
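
The predicate combines three conditions: multimask output must be enabled, the frame must be an initial conditioning frame (or multimask tracking must be enabled), and the number of clicked points must fall within the configured range. A hedged standalone sketch with hypothetical configuration values:

# Standalone sketch of the _use_multimask predicate, with hypothetical settings.
multimask_output_in_sam = True
multimask_output_for_tracking = False
multimask_min_pt_num, multimask_max_pt_num = 1, 1

def use_multimask(is_init_cond_frame: bool, num_pts: int) -> bool:
    return (
        multimask_output_in_sam
        and (is_init_cond_frame or multimask_output_for_tracking)
        and (multimask_min_pt_num <= num_pts <= multimask_max_pt_num)
    )

print(use_multimask(is_init_cond_frame=True, num_pts=1))   # True: single click on a new frame
print(use_multimask(is_init_cond_frame=False, num_pts=1))  # False: tracked frame with multimask tracking off
print(use_multimask(is_init_cond_frame=True, num_pts=3))   # False: too many points for ambiguity resolution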


method ultralytics.models.sam.modules.sam.SAM2Model.forward

def forward(self, *args, **kwargs)

Process image and prompt inputs to generate object masks and scores in video sequences.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| *args |  |  | required |
| **kwargs |  |  | required |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def forward(self, *args, **kwargs):
    """Process image and prompt inputs to generate object masks and scores in video sequences."""
    raise NotImplementedError(
        "Please use the corresponding methods in SAM2VideoPredictor for inference."
        "See notebooks/video_predictor_example.ipynb for an example."
    )
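
Because forward() is intentionally disabled, inference runs through the predictor interface instead. A hedged usage sketch with the high-level Ultralytics SAM wrapper (the weight filename and image path below are examples, not fixed values):

from ultralytics import SAM

# Hedged sketch: load a SAM 2 checkpoint through the high-level wrapper rather than
# calling SAM2Model.forward() directly; prompts are passed as points/labels or boxes.
model = SAM("sam2.1_b.pt")
results = model("path/to/image.jpg", points=[900, 370], labels=[1])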


method ultralytics.models.sam.modules.sam.SAM2Model.forward_image

def forward_image(self, img_batch: torch.Tensor)

Process image batch through encoder to extract multi-level features for SAM model.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| img_batch | torch.Tensor |  | required |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def forward_image(self, img_batch: torch.Tensor):
    """Process image batch through encoder to extract multi-level features for SAM model."""
    backbone_out = self.image_encoder(img_batch)
    if self.use_high_res_features_in_sam:
        # precompute projected level 0 and level 1 features in SAM decoder
        # to avoid running it again on every SAM click
        backbone_out["backbone_fpn"][0] = self.sam_mask_decoder.conv_s0(backbone_out["backbone_fpn"][0])
        backbone_out["backbone_fpn"][1] = self.sam_mask_decoder.conv_s1(backbone_out["backbone_fpn"][1])
    return backbone_out


method ultralytics.models.sam.modules.sam.SAM2Model.set_binarize

def set_binarize(self, binarize = False)

Set binarize for VideoPredictor.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| binarize |  |  | False |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def set_binarize(self, binarize=False):
    """Set binarize for VideoPredictor."""
    self.binarize_mask_from_pts_for_mem_enc = binarize


method ultralytics.models.sam.modules.sam.SAM2Model.set_imgsz

def set_imgsz(self, imgsz)

Set image size to make model compatible with different image sizes.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| imgsz |  |  | required |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def set_imgsz(self, imgsz):
    """Set image size to make model compatible with different image sizes."""
    if hasattr(self.image_encoder, "set_imgsz"):
        self.image_encoder.set_imgsz(imgsz)
    self.image_size = imgsz[0]
    self.sam_prompt_encoder.input_image_size = imgsz
    self.sam_prompt_encoder.image_embedding_size = [
        x // self.backbone_stride for x in imgsz
    ]  # fixed ViT patch size of 16
    self.sam_prompt_encoder.mask_input_size = [
        x // self.backbone_stride * 4 for x in imgsz
    ]  # fixed ViT patch size of 16
    self.sam_image_embedding_size = self.image_size // self.backbone_stride  # update image embedding size
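
The embedding and mask-input sizes follow directly from the backbone stride, so the bookkeeping can be checked by hand. A worked example assuming imgsz=(1024, 1024) and the default SAM2 stride of 16:

# Worked example of the size bookkeeping in set_imgsz (assumed values).
imgsz = (1024, 1024)
backbone_stride = 16  # ViT patch size used by the SAM2 image encoders
image_embedding_size = [x // backbone_stride for x in imgsz]  # [64, 64]
mask_input_size = [x // backbone_stride * 4 for x in imgsz]   # [256, 256]
print(image_embedding_size, mask_input_size)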


method ultralytics.models.sam.modules.sam.SAM2Model.track_step

def track_step(
    self,
    frame_idx,
    is_init_cond_frame,
    current_vision_feats,
    current_vision_pos_embeds,
    feat_sizes,
    point_inputs,
    mask_inputs,
    output_dict,
    num_frames,
    track_in_reverse=False,  # tracking in reverse time order (for demo usage)
    # Whether to run the memory encoder on the predicted masks. Sometimes we might want
    # to skip the memory encoder with `run_mem_encoder=False`. For example,
    # in demo we might call `track_step` multiple times for each user click,
    # and only encode the memory when the user finalizes their clicks. And in ablation
    # settings like SAM training on static images, we don't need the memory encoder.
    run_mem_encoder=True,
    # The previously predicted SAM mask logits (which can be fed together with new clicks in demo).
    prev_sam_mask_logits=None,
)

Perform a single tracking step, updating object masks and memory features based on current frame inputs.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| frame_idx |  |  | required |
| is_init_cond_frame |  |  | required |
| current_vision_feats |  |  | required |
| current_vision_pos_embeds |  |  | required |
| feat_sizes |  |  | required |
| point_inputs |  |  | required |
| mask_inputs |  |  | required |
| output_dict |  |  | required |
| num_frames |  |  | required |
| track_in_reverse |  |  | False |
| run_mem_encoder |  |  | True |
| prev_sam_mask_logits |  |  | None |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def track_step(
    self,
    frame_idx,
    is_init_cond_frame,
    current_vision_feats,
    current_vision_pos_embeds,
    feat_sizes,
    point_inputs,
    mask_inputs,
    output_dict,
    num_frames,
    track_in_reverse=False,  # tracking in reverse time order (for demo usage)
    # Whether to run the memory encoder on the predicted masks. Sometimes we might want
    # to skip the memory encoder with `run_mem_encoder=False`. For example,
    # in demo we might call `track_step` multiple times for each user click,
    # and only encode the memory when the user finalizes their clicks. And in ablation
    # settings like SAM training on static images, we don't need the memory encoder.
    run_mem_encoder=True,
    # The previously predicted SAM mask logits (which can be fed together with new clicks in demo).
    prev_sam_mask_logits=None,
):
    """Perform a single tracking step, updating object masks and memory features based on current frame inputs."""
    sam_outputs, _, _ = self._track_step(
        frame_idx,
        is_init_cond_frame,
        current_vision_feats,
        current_vision_pos_embeds,
        feat_sizes,
        point_inputs,
        mask_inputs,
        output_dict,
        num_frames,
        track_in_reverse,
        prev_sam_mask_logits,
    )
    _, _, _, low_res_masks, high_res_masks, obj_ptr, object_score_logits = sam_outputs

    current_out = {
        "pred_masks": low_res_masks,
        "pred_masks_high_res": high_res_masks,
        "obj_ptr": obj_ptr,
    }
    if not self.training:
        # Only add this in inference (to avoid unused param in activation checkpointing;
        # it's mainly used in the demo to encode spatial memories w/ consolidated masks)
        current_out["object_score_logits"] = object_score_logits

    # Run memory encoder on the predicted mask to encode it into a new memory feature (for use in future frames)
    self._encode_memory_in_output(
        current_vision_feats,
        feat_sizes,
        point_inputs,
        run_mem_encoder,
        high_res_masks,
        object_score_logits,
        current_out,
    )

    return current_out
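
The returned dictionary carries the low-resolution and high-resolution mask logits, the object pointer, and (at inference time) the object score logits; _encode_memory_in_output may additionally attach memory features for use on future frames. A hedged sketch of the output keys using dummy tensors, where the shapes are illustrative assumptions rather than guarantees:

import torch

# Hedged illustration of the keys produced by track_step (dummy values only).
current_out = {
    "pred_masks": torch.zeros(1, 1, 256, 256),             # low-resolution mask logits
    "pred_masks_high_res": torch.zeros(1, 1, 1024, 1024),  # upsampled mask logits
    "obj_ptr": torch.zeros(1, 256),                        # object pointer token
    "object_score_logits": torch.zeros(1, 1),              # added only when not training
}
binary_mask = current_out["pred_masks_high_res"] > 0.0  # threshold logits at 0 for a hard mask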





class ultralytics.models.sam.modules.sam.SAM3Model

def __init__(
    self,
    image_encoder,
    memory_attention,
    memory_encoder,
    num_maskmem=7,
    image_size=1008,
    backbone_stride=14,
    sigmoid_scale_for_mem_enc=1,
    sigmoid_bias_for_mem_enc=0,
    binarize_mask_from_pts_for_mem_enc=False,
    use_mask_input_as_output_without_sam=False,
    max_cond_frames_in_attn=-1,
    directly_add_no_mem_embed=False,
    use_high_res_features_in_sam=False,
    multimask_output_in_sam=False,
    multimask_min_pt_num=1,
    multimask_max_pt_num=1,
    multimask_output_for_tracking=False,
    use_multimask_token_for_obj_ptr: bool = False,
    iou_prediction_use_sigmoid=False,
    memory_temporal_stride_for_eval=1,
    non_overlap_masks_for_mem_enc=False,
    use_obj_ptrs_in_encoder=False,
    max_obj_ptrs_in_encoder=16,
    add_tpos_enc_to_obj_ptrs=True,
    proj_tpos_enc_in_obj_ptrs=False,
    use_signed_tpos_enc_to_obj_ptrs=False,
    only_obj_ptrs_in_the_past_for_eval=False,
    pred_obj_scores: bool = False,
    pred_obj_scores_mlp: bool = False,
    fixed_no_obj_ptr: bool = False,
    soft_no_obj_ptr: bool = False,
    use_mlp_for_obj_ptr_proj: bool = False,
    no_obj_embed_spatial: bool = False,
    sam_mask_decoder_extra_args=None,
    compile_image_encoder: bool = False,
)

Bases: SAM2Model

SAM3Model class for Segment Anything Model 3 with memory-based video object segmentation capabilities.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| image_encoder |  |  | required |
| memory_attention |  |  | required |
| memory_encoder |  |  | required |
| num_maskmem |  |  | 7 |
| image_size |  |  | 1008 |
| backbone_stride |  |  | 14 |
| sigmoid_scale_for_mem_enc |  |  | 1 |
| sigmoid_bias_for_mem_enc |  |  | 0 |
| binarize_mask_from_pts_for_mem_enc |  |  | False |
| use_mask_input_as_output_without_sam |  |  | False |
| max_cond_frames_in_attn |  |  | -1 |
| directly_add_no_mem_embed |  |  | False |
| use_high_res_features_in_sam |  |  | False |
| multimask_output_in_sam |  |  | False |
| multimask_min_pt_num |  |  | 1 |
| multimask_max_pt_num |  |  | 1 |
| multimask_output_for_tracking |  |  | False |
| use_multimask_token_for_obj_ptr | bool |  | False |
| iou_prediction_use_sigmoid |  |  | False |
| memory_temporal_stride_for_eval |  |  | 1 |
| non_overlap_masks_for_mem_enc |  |  | False |
| use_obj_ptrs_in_encoder |  |  | False |
| max_obj_ptrs_in_encoder |  |  | 16 |
| add_tpos_enc_to_obj_ptrs |  |  | True |
| proj_tpos_enc_in_obj_ptrs |  |  | False |
| use_signed_tpos_enc_to_obj_ptrs |  |  | False |
| only_obj_ptrs_in_the_past_for_eval |  |  | False |
| pred_obj_scores | bool |  | False |
| pred_obj_scores_mlp | bool |  | False |
| fixed_no_obj_ptr | bool |  | False |
| soft_no_obj_ptr | bool |  | False |
| use_mlp_for_obj_ptr_proj | bool |  | False |
| no_obj_embed_spatial | bool |  | False |
| sam_mask_decoder_extra_args |  |  | None |
| compile_image_encoder | bool |  | False |

Methods

| Name | Description |
| --- | --- |
| _suppress_object_pw_area_shrinkage | Suppresses masks that shrink in area after applying pixelwise non-overlapping constraints; note that the final output can still be overlapping. |
| _suppress_shrinked_masks | Suppress masks that shrink in area after applying pixelwise non-overlapping constraints. |
| forward_image | Process image batch through encoder to extract multi-level features for SAM model. |
| set_imgsz | Set the image size for the model and mask downsampler. |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
class SAM3Model(SAM2Model):
    """SAM3Model class for Segment Anything Model 3 with memory-based video object segmentation capabilities."""

    def __init__(
        self,
        image_encoder,
        memory_attention,
        memory_encoder,
        num_maskmem=7,
        image_size=1008,
        backbone_stride=14,
        sigmoid_scale_for_mem_enc=1,
        sigmoid_bias_for_mem_enc=0,
        binarize_mask_from_pts_for_mem_enc=False,
        use_mask_input_as_output_without_sam=False,
        max_cond_frames_in_attn=-1,
        directly_add_no_mem_embed=False,
        use_high_res_features_in_sam=False,
        multimask_output_in_sam=False,
        multimask_min_pt_num=1,
        multimask_max_pt_num=1,
        multimask_output_for_tracking=False,
        use_multimask_token_for_obj_ptr: bool = False,
        iou_prediction_use_sigmoid=False,
        memory_temporal_stride_for_eval=1,
        non_overlap_masks_for_mem_enc=False,
        use_obj_ptrs_in_encoder=False,
        max_obj_ptrs_in_encoder=16,
        add_tpos_enc_to_obj_ptrs=True,
        proj_tpos_enc_in_obj_ptrs=False,
        use_signed_tpos_enc_to_obj_ptrs=False,
        only_obj_ptrs_in_the_past_for_eval=False,
        pred_obj_scores: bool = False,
        pred_obj_scores_mlp: bool = False,
        fixed_no_obj_ptr: bool = False,
        soft_no_obj_ptr: bool = False,
        use_mlp_for_obj_ptr_proj: bool = False,
        no_obj_embed_spatial: bool = False,
        sam_mask_decoder_extra_args=None,
        compile_image_encoder: bool = False,
    ):
        """SAM3Model class for Segment Anything Model 3 with memory-based video object segmentation capabilities."""
        super().__init__(
            image_encoder,
            memory_attention,
            memory_encoder,
            num_maskmem,
            image_size,
            backbone_stride,
            sigmoid_scale_for_mem_enc,
            sigmoid_bias_for_mem_enc,
            binarize_mask_from_pts_for_mem_enc,
            use_mask_input_as_output_without_sam,
            max_cond_frames_in_attn,
            directly_add_no_mem_embed,
            use_high_res_features_in_sam,
            multimask_output_in_sam,
            multimask_min_pt_num,
            multimask_max_pt_num,
            multimask_output_for_tracking,
            use_multimask_token_for_obj_ptr,
            iou_prediction_use_sigmoid,
            memory_temporal_stride_for_eval,
            non_overlap_masks_for_mem_enc,
            use_obj_ptrs_in_encoder,
            max_obj_ptrs_in_encoder,
            add_tpos_enc_to_obj_ptrs,
            proj_tpos_enc_in_obj_ptrs,
            use_signed_tpos_enc_to_obj_ptrs,
            only_obj_ptrs_in_the_past_for_eval,
            pred_obj_scores,
            pred_obj_scores_mlp,
            fixed_no_obj_ptr,
            soft_no_obj_ptr,
            use_mlp_for_obj_ptr_proj,
            no_obj_embed_spatial,
            sam_mask_decoder_extra_args,
            compile_image_encoder,
        )
        self.sam_mask_decoder = SAM2MaskDecoder(
            num_multimask_outputs=3,
            transformer=TwoWayTransformer(
                depth=2,
                embedding_dim=self.sam_prompt_embed_dim,
                mlp_dim=2048,
                num_heads=8,
            ),
            transformer_dim=self.sam_prompt_embed_dim,
            iou_head_depth=3,
            iou_head_hidden_dim=256,
            use_high_res_features=self.use_high_res_features_in_sam,
            iou_prediction_use_sigmoid=self.iou_prediction_use_sigmoid,
            pred_obj_scores=self.pred_obj_scores,
            pred_obj_scores_mlp=self.pred_obj_scores_mlp,
            use_multimask_token_for_obj_ptr=self.use_multimask_token_for_obj_ptr,
            **(self.sam_mask_decoder_extra_args or {}),
        )
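
Construction mirrors SAM2Model, with SAM3 defaults of a 1008-pixel input and a 14-pixel backbone stride, plus a SAM2MaskDecoder built in __init__. A hedged doctest-style sketch, assuming the encoder and memory modules have already been constructed elsewhere:

>>> sam3_model = SAM3Model(image_encoder, memory_attention, memory_encoder)
>>> sam3_model.set_imgsz((1008, 1008))
>>> # Inference is driven by the SAM 3 predictor classes rather than by forward()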


method ultralytics.models.sam.modules.sam.SAM3Model._suppress_object_pw_area_shrinkage

def _suppress_object_pw_area_shrinkage(self, pred_masks)

This function suppresses masks that shrink in area after applying pixelwise non-overlapping constraints. Note that the final output can still be overlapping.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pred_masks |  |  | required |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def _suppress_object_pw_area_shrinkage(self, pred_masks):
    """This function suppresses masks that shrink in area after applying pixelwise non-overlapping constraints. Note
    that the final output can still be overlapping.
    """
    # Apply pixel-wise non-overlapping constraint based on mask scores
    pixel_level_non_overlapping_masks = self._apply_non_overlapping_constraints(pred_masks)
    # Fully suppress masks with high shrinkage (probably noisy) based on the pixel wise non-overlapping constraints
    # NOTE: The output of this function can be a no op if none of the masks shrink by a large factor.
    pred_masks = self._suppress_shrinked_masks(pred_masks, pixel_level_non_overlapping_masks)
    return pred_masks


method ultralytics.models.sam.modules.sam.SAM3Model._suppress_shrinked_masks

def _suppress_shrinked_masks(pred_masks, new_pred_masks, shrink_threshold = 0.3)

Suppress masks that shrink in area after applying pixelwise non-overlapping constraints.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pred_masks |  |  | required |
| new_pred_masks |  |  | required |
| shrink_threshold |  |  | 0.3 |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
@staticmethod
def _suppress_shrinked_masks(pred_masks, new_pred_masks, shrink_threshold=0.3):
    """Suppress masks that shrink in area after applying pixelwise non-overlapping constraints."""
    area_before = (pred_masks > 0).sum(dim=(-1, -2))
    area_after = (new_pred_masks > 0).sum(dim=(-1, -2))
    area_before = torch.clamp(area_before, min=1.0)
    area_ratio = area_after / area_before
    keep = area_ratio >= shrink_threshold
    keep_mask = keep[..., None, None].expand_as(pred_masks)
    pred_masks_after = torch.where(keep_mask, pred_masks, torch.clamp(pred_masks, max=-10.0))
    return pred_masks_after
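
The shrink test compares each mask's foreground area before and after the non-overlapping constraint and clamps the logits of masks whose area drops below shrink_threshold of the original. A minimal toy example reproducing the same arithmetic:

import torch

# Toy example: two masks of 4x4 logits. The second mask loses most of its area after a
# (simulated) non-overlap constraint, so it falls below the 0.3 ratio and is suppressed.
pred_masks = torch.full((2, 1, 4, 4), 5.0)   # both masks fully "on" (16 positive pixels each)
new_pred_masks = pred_masks.clone()
new_pred_masks[1, :, :, 1:] = -5.0           # second mask shrinks to 4 positive pixels (ratio 0.25)

area_before = (pred_masks > 0).sum(dim=(-1, -2)).float().clamp(min=1.0)
area_after = (new_pred_masks > 0).sum(dim=(-1, -2))
keep = (area_after / area_before) >= 0.3
suppressed = torch.where(keep[..., None, None], pred_masks, pred_masks.clamp(max=-10.0))
print(keep.squeeze(-1))  # tensor([ True, False])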


method ultralytics.models.sam.modules.sam.SAM3Model.forward_image

def forward_image(self, img_batch: torch.Tensor)

Process image batch through encoder to extract multi-level features for SAM model.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| img_batch | torch.Tensor |  | required |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def forward_image(self, img_batch: torch.Tensor):
    """Process image batch through encoder to extract multi-level features for SAM model."""
    backbone_out = self.image_encoder.forward_image_sam2(img_batch)
    if self.use_high_res_features_in_sam:
        # precompute projected level 0 and level 1 features in SAM decoder
        # to avoid running it again on every SAM click
        backbone_out["backbone_fpn"][0] = self.sam_mask_decoder.conv_s0(backbone_out["backbone_fpn"][0])
        backbone_out["backbone_fpn"][1] = self.sam_mask_decoder.conv_s1(backbone_out["backbone_fpn"][1])
    return backbone_out


method ultralytics.models.sam.modules.sam.SAM3Model.set_imgsz

def set_imgsz(self, imgsz: tuple[int, int])

Set the image size for the model and mask downsampler.

Args

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| imgsz | tuple[int, int] |  | required |

Source code in ultralytics/models/sam/modules/sam.py (View on GitHub)
def set_imgsz(self, imgsz: tuple[int, int]):
    """Set the image size for the model and mask downsampler."""
    super().set_imgsz(imgsz)
    self.memory_encoder.mask_downsampler.interpol_size = [size // 14 * 16 for size in imgsz]
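
With SAM3's 14-pixel backbone stride, the mask downsampler's interpolation size is derived as size // 14 * 16. A one-line worked example for the default 1008-pixel input:

imgsz = (1008, 1008)
interpol_size = [size // 14 * 16 for size in imgsz]  # [1152, 1152]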




