Reference for ultralytics/models/sam/modules/sam.py

Note

Full source code for this file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/sam/modules/sam.py. Help us fix any issues you see by submitting a Pull Request 🛠️. Thank you 🙏!


ultralytics.models.sam.modules.sam.Sam

Bases: Module

Source code in ultralytics/models/sam/modules/sam.py
class Sam(nn.Module):
    mask_threshold: float = 0.0  # threshold applied to predicted mask logits to produce binary masks
    image_format: str = 'RGB'  # expected channel order of input images

    def __init__(
        self,
        image_encoder: ImageEncoderViT,
        prompt_encoder: PromptEncoder,
        mask_decoder: MaskDecoder,
        pixel_mean: List[float] = (123.675, 116.28, 103.53),
        pixel_std: List[float] = (58.395, 57.12, 57.375)
    ) -> None:
        """
        SAM predicts object masks from an image and input prompts.

        Note:
            All forward() operations moved to SAMPredictor.

        Args:
          image_encoder (ImageEncoderViT): The backbone used to encode the image into image embeddings that allow for
            efficient mask prediction.
          prompt_encoder (PromptEncoder): Encodes various types of input prompts.
          mask_decoder (MaskDecoder): Predicts masks from the image embeddings and encoded prompts.
          pixel_mean (list(float)): Mean values for normalizing pixels in the input image.
          pixel_std (list(float)): Std values for normalizing pixels in the input image.
        """
        super().__init__()
        self.image_encoder = image_encoder
        self.prompt_encoder = prompt_encoder
        self.mask_decoder = mask_decoder
        # Normalization constants registered as non-persistent buffers (persistent=False): they move with
        # the module across devices/dtypes but are excluded from the state_dict
        self.register_buffer('pixel_mean', torch.Tensor(pixel_mean).view(-1, 1, 1), False)
        self.register_buffer('pixel_std', torch.Tensor(pixel_std).view(-1, 1, 1), False)
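
The pixel_mean and pixel_std values are registered as buffers and applied as a per-channel standardization of input images before they reach the image encoder. Below is a minimal sketch of that normalization using the defaults above; the image shape and contents are illustrative only, not part of this module.

import torch

# Default normalization constants from Sam, reshaped to broadcast over a (3, H, W) image
pixel_mean = torch.tensor([123.675, 116.28, 103.53]).view(-1, 1, 1)
pixel_std = torch.tensor([58.395, 57.12, 57.375]).view(-1, 1, 1)

# Dummy RGB image with values in [0, 255]; the 1024x1024 size is illustrative
image = torch.randint(0, 256, (3, 1024, 1024), dtype=torch.uint8)

normalized = (image.float() - pixel_mean) / pixel_std  # per-channel standardization
print(normalized.shape)  # torch.Size([3, 1024, 1024])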

__init__(image_encoder, prompt_encoder, mask_decoder, pixel_mean=(123.675, 116.28, 103.53), pixel_std=(58.395, 57.12, 57.375))

SAM predicts object masks from an image and input prompts.

Note

All forward() operations moved to SAMPredictor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| image_encoder | ImageEncoderViT | The backbone used to encode the image into image embeddings that allow for efficient mask prediction. | required |
| prompt_encoder | PromptEncoder | Encodes various types of input prompts. | required |
| mask_decoder | MaskDecoder | Predicts masks from the image embeddings and encoded prompts. | required |
| pixel_mean | List[float] | Mean values for normalizing pixels in the input image. | (123.675, 116.28, 103.53) |
| pixel_std | List[float] | Std values for normalizing pixels in the input image. | (58.395, 57.12, 57.375) |
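
For context, the sketch below assembles a Sam instance from these three components. The hyperparameters follow the commonly published SAM ViT-B configuration and the import paths are assumptions based on this module layout, so treat it as illustrative; in practice the build helpers in ultralytics/models/sam/build.py are the supported way to construct SAM models. Since all forward() operations live in the predictor, this only composes the module.

from functools import partial

import torch

from ultralytics.models.sam.modules.decoders import MaskDecoder
from ultralytics.models.sam.modules.encoders import ImageEncoderViT, PromptEncoder
from ultralytics.models.sam.modules.sam import Sam
from ultralytics.models.sam.modules.transformer import TwoWayTransformer

# Assumed SAM ViT-B style configuration (illustrative values)
prompt_embed_dim = 256
image_size = 1024
vit_patch_size = 16
image_embedding_size = image_size // vit_patch_size  # 64

sam = Sam(
    image_encoder=ImageEncoderViT(
        depth=12,
        embed_dim=768,
        img_size=image_size,
        mlp_ratio=4,
        norm_layer=partial(torch.nn.LayerNorm, eps=1e-6),
        num_heads=12,
        patch_size=vit_patch_size,
        qkv_bias=True,
        use_rel_pos=True,
        global_attn_indexes=(2, 5, 8, 11),
        window_size=14,
        out_chans=prompt_embed_dim,
    ),
    prompt_encoder=PromptEncoder(
        embed_dim=prompt_embed_dim,
        image_embedding_size=(image_embedding_size, image_embedding_size),
        input_image_size=(image_size, image_size),
        mask_in_chans=16,
    ),
    mask_decoder=MaskDecoder(
        num_multimask_outputs=3,
        transformer=TwoWayTransformer(depth=2, embedding_dim=prompt_embed_dim, mlp_dim=2048, num_heads=8),
        transformer_dim=prompt_embed_dim,
    ),
)
print(sam.mask_threshold, sam.image_format)  # 0.0 RGB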

Created 2023-07-16, Updated 2023-08-07
Authors: glenn-jocher (5), Laughing-q (1)