
Reference for ultralytics/models/sam/modules/sam.py

Note

This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/sam/modules/sam.py. If you spot a problem, please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!



ultralytics.models.sam.modules.sam.Sam

Bases: Module

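Sam (Segment Anything Model) is designed for object segmentation tasks. It uses an image encoder to generate image embeddings and a prompt encoder to encode various types of input prompts. The mask decoder then uses these embeddings to predict object masks.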

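Attributes:

| Name | Type | Description |
| --- | --- | --- |
| mask_threshold | float | Threshold value for mask prediction. |
| image_format | str | Format of the input image, default is "RGB". |
| image_encoder | ImageEncoderViT | The backbone used to encode the image into embeddings. |
| prompt_encoder | PromptEncoder | Encodes various types of input prompts. |
| mask_decoder | MaskDecoder | Predicts object masks from the image and prompt embeddings. |
| pixel_mean | List[float] | Mean pixel values for image normalization. |
| pixel_std | List[float] | Standard deviation values for image normalization. |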
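The pixel_mean and pixel_std attributes are stored with shape (3, 1, 1), which lets them broadcast over the height and width of an image tensor. The following is a minimal sketch of per-channel normalization under that assumption; the helper name `normalize` is hypothetical and is not part of the Sam class.

```python
import torch

# Default statistics from the Sam signature, reshaped to (3, 1, 1) so they
# broadcast across the height and width of a (3, H, W) image.
pixel_mean = torch.tensor([123.675, 116.28, 103.53]).view(-1, 1, 1)
pixel_std = torch.tensor([58.395, 57.12, 57.375]).view(-1, 1, 1)


def normalize(image: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: normalize a (3, H, W) or (B, 3, H, W) image per channel."""
    return (image - pixel_mean) / pixel_std


# An image whose every pixel equals the channel mean normalizes to all zeros.
image = pixel_mean.expand(3, 4, 4).clone()
normalized = normalize(image)
```

The same call works on a batch, because a (3, 1, 1) tensor also broadcasts against (B, 3, H, W).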

Source code in ultralytics/models/sam/modules/sam.py
class Sam(nn.Module):
    """
    Sam (Segment Anything Model) is designed for object segmentation tasks. It uses image encoders to generate image
    embeddings, and prompt encoders to encode various types of input prompts. These embeddings are then used by the mask
    decoder to predict object masks.

    Attributes:
        mask_threshold (float): Threshold value for mask prediction.
        image_format (str): Format of the input image, default is 'RGB'.
        image_encoder (ImageEncoderViT): The backbone used to encode the image into embeddings.
        prompt_encoder (PromptEncoder): Encodes various types of input prompts.
        mask_decoder (MaskDecoder): Predicts object masks from the image and prompt embeddings.
        pixel_mean (List[float]): Mean pixel values for image normalization.
        pixel_std (List[float]): Standard deviation values for image normalization.
    """

    mask_threshold: float = 0.0
    image_format: str = "RGB"

    def __init__(
        self,
        image_encoder: ImageEncoderViT,
        prompt_encoder: PromptEncoder,
        mask_decoder: MaskDecoder,
        pixel_mean: List[float] = (123.675, 116.28, 103.53),
        pixel_std: List[float] = (58.395, 57.12, 57.375),
    ) -> None:
        """
        Initialize the Sam class to predict object masks from an image and input prompts.

        Note:
            All forward() operations moved to SAMPredictor.

        Args:
            image_encoder (ImageEncoderViT): The backbone used to encode the image into image embeddings.
            prompt_encoder (PromptEncoder): Encodes various types of input prompts.
            mask_decoder (MaskDecoder): Predicts masks from the image embeddings and encoded prompts.
            pixel_mean (List[float], optional): Mean values for normalizing pixels in the input image. Defaults to
                (123.675, 116.28, 103.53).
            pixel_std (List[float], optional): Std values for normalizing pixels in the input image. Defaults to
                (58.395, 57.12, 57.375).
        """
        super().__init__()
        self.image_encoder = image_encoder
        self.prompt_encoder = prompt_encoder
        self.mask_decoder = mask_decoder
        self.register_buffer("pixel_mean", torch.Tensor(pixel_mean).view(-1, 1, 1), False)
        self.register_buffer("pixel_std", torch.Tensor(pixel_std).view(-1, 1, 1), False)

__init__(image_encoder, prompt_encoder, mask_decoder, pixel_mean=(123.675, 116.28, 103.53), pixel_std=(58.395, 57.12, 57.375))

Initialize the Sam class to predict object masks from an image and input prompts.

Note

All forward() operations moved to SAMPredictor.
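Because the forward logic lives outside this class, the caller is the one that applies mask_threshold to the decoder output. As a rough sketch, assuming the caller compares mask logits against the threshold with a strict greater-than, plain Python lists stand in for the tensor here and the logit values are made up for illustration:

```python
mask_threshold = 0.0  # class-level default on Sam

# Hypothetical 2x3 grid of mask logits, as a decoder might produce for one object.
logits = [
    [-2.1, 0.3, 1.7],
    [0.0, -0.4, 4.2],
]

# A pixel is kept in the mask when its logit is strictly greater than the threshold.
binary_mask = [[value > mask_threshold for value in row] for row in logits]
```

With the default of 0.0, a logit of exactly 0.0 falls outside the mask under this assumption.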

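Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| image_encoder | ImageEncoderViT | The backbone used to encode the image into image embeddings. | required |
| prompt_encoder | PromptEncoder | Encodes various types of input prompts. | required |
| mask_decoder | MaskDecoder | Predicts masks from the image embeddings and encoded prompts. | required |
| pixel_mean | List[float] | Mean values for normalizing pixels in the input image. | (123.675, 116.28, 103.53) |
| pixel_std | List[float] | Std values for normalizing pixels in the input image. | (58.395, 57.12, 57.375) |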
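The constructor stores pixel_mean and pixel_std through register_buffer with the third positional argument (persistent) set to False. In PyTorch this means the tensors follow the module across devices but are left out of state_dict(). The sketch below shows that behavior on a hypothetical stand-in module, `TinyNormalizer`, which only copies the two register_buffer calls and is not part of Ultralytics.

```python
import torch
from torch import nn


class TinyNormalizer(nn.Module):
    """Hypothetical stand-in that registers buffers the same way Sam.__init__ does."""

    def __init__(self, pixel_mean=(123.675, 116.28, 103.53), pixel_std=(58.395, 57.12, 57.375)):
        super().__init__()
        # The third positional argument is `persistent`; False keeps the buffer
        # out of state_dict() while still registering it on the module.
        self.register_buffer("pixel_mean", torch.Tensor(pixel_mean).view(-1, 1, 1), False)
        self.register_buffer("pixel_std", torch.Tensor(pixel_std).view(-1, 1, 1), False)


module = TinyNormalizer()
buffer_names = [name for name, _ in module.named_buffers()]  # buffers are registered
state_keys = list(module.state_dict().keys())  # but not saved in checkpoints
```

One consequence is that a saved checkpoint does not carry the normalization statistics; they are recreated from the constructor arguments each time the module is built.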
Source code in ultralytics/models/sam/modules/sam.py
def __init__(
    self,
    image_encoder: ImageEncoderViT,
    prompt_encoder: PromptEncoder,
    mask_decoder: MaskDecoder,
    pixel_mean: List[float] = (123.675, 116.28, 103.53),
    pixel_std: List[float] = (58.395, 57.12, 57.375),
) -> None:
    """
    Initialize the Sam class to predict object masks from an image and input prompts.

    Note:
        All forward() operations moved to SAMPredictor.

    Args:
        image_encoder (ImageEncoderViT): The backbone used to encode the image into image embeddings.
        prompt_encoder (PromptEncoder): Encodes various types of input prompts.
        mask_decoder (MaskDecoder): Predicts masks from the image embeddings and encoded prompts.
        pixel_mean (List[float], optional): Mean values for normalizing pixels in the input image. Defaults to
            (123.675, 116.28, 103.53).
        pixel_std (List[float], optional): Std values for normalizing pixels in the input image. Defaults to
            (58.395, 57.12, 57.375).
    """
    super().__init__()
    self.image_encoder = image_encoder
    self.prompt_encoder = prompt_encoder
    self.mask_decoder = mask_decoder
    self.register_buffer("pixel_mean", torch.Tensor(pixel_mean).view(-1, 1, 1), False)
    self.register_buffer("pixel_std", torch.Tensor(pixel_std).view(-1, 1, 1), False)





Created 2023-11-12, Updated 2023-11-25
Authors: glenn-jocher (3)