Reference for `ultralytics/models/sam/predict.py`

This page is sourced from https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/sam/predict.py.

Summary

Classes

Predictor
SAM2Predictor
SAM2VideoPredictor
SAM2DynamicInteractivePredictor
SAM3Predictor
SAM3SemanticPredictor
SAM3VideoPredictor
SAM3VideoSemanticPredictor

Methods

Predictor.preprocess
Predictor.pre_transform
Predictor.inference
Predictor.prompt_inference
Predictor._inference_features
Predictor._prepare_prompts
Predictor.generate
Predictor.setup_model
Predictor.get_model
Predictor.postprocess
Predictor.set_image
Predictor.setup_source
Predictor.get_im_features
Predictor.set_prompts
Predictor.reset_image
Predictor.remove_small_regions
Predictor.inference_features
SAM2Predictor.get_model
SAM2Predictor._prepare_prompts
SAM2Predictor.setup_source
SAM2Predictor.get_im_features
SAM2Predictor._inference_features
SAM2VideoPredictor.get_model
SAM2VideoPredictor.inference
SAM2VideoPredictor.postprocess
SAM2VideoPredictor.add_new_prompts
SAM2VideoPredictor.propagate_in_video_preflight
SAM2VideoPredictor.init_state
SAM2VideoPredictor._init_state
SAM2VideoPredictor.get_im_features
SAM2VideoPredictor._obj_id_to_idx
SAM2VideoPredictor._run_single_frame_inference
SAM2VideoPredictor._get_maskmem_pos_enc
SAM2VideoPredictor._consolidate_temp_output_across_obj
SAM2VideoPredictor._get_empty_mask_ptr
SAM2VideoPredictor._run_memory_encoder
SAM2VideoPredictor._add_output_per_object
SAM2VideoPredictor._clear_non_cond_mem_around_input
SAM2VideoPredictor.remove_object
SAM2VideoPredictor.clear_all_points_in_frame
SAM2VideoPredictor.clear_all_points_in_video
SAM2VideoPredictor._reset_tracking_results
SAM2VideoPredictor._prune_non_cond_memory
SAM2DynamicInteractivePredictor.inference
SAM2DynamicInteractivePredictor.get_im_features
SAM2DynamicInteractivePredictor.update_memory
SAM2DynamicInteractivePredictor._prepare_memory_conditioned_features
SAM2DynamicInteractivePredictor.get_maskmem_enc
SAM2DynamicInteractivePredictor._obj_id_to_idx
SAM2DynamicInteractivePredictor.track_step
SAM3Predictor.setup_model
SAM3Predictor.get_model
SAM3SemanticPredictor.get_model
SAM3SemanticPredictor.get_im_features
SAM3SemanticPredictor.pre_transform
SAM3SemanticPredictor._prepare_geometric_prompts
SAM3SemanticPredictor._inference_features
SAM3SemanticPredictor.postprocess
SAM3SemanticPredictor.inference
SAM3SemanticPredictor.inference_features
SAM3SemanticPredictor.reset_prompts
SAM3SemanticPredictor._get_dummy_prompt
SAM3VideoPredictor.propagate_in_video
SAM3VideoSemanticPredictor.setup_model
SAM3VideoSemanticPredictor.setup_source
SAM3VideoSemanticPredictor.init_state
SAM3VideoSemanticPredictor.inference
SAM3VideoSemanticPredictor.postprocess
SAM3VideoSemanticPredictor._run_single_frame_inference
SAM3VideoSemanticPredictor.add_prompt
SAM3VideoSemanticPredictor._apply_object_wise_non_overlapping_constraints
SAM3VideoSemanticPredictor._det_track_one_frame
SAM3VideoSemanticPredictor._suppress_detections_close_to_boundary
SAM3VideoSemanticPredictor.run_backbone_and_detection
SAM3VideoSemanticPredictor._extract_detection_outputs
SAM3VideoSemanticPredictor._cache_backbone_features
SAM3VideoSemanticPredictor.run_tracker_propagation
SAM3VideoSemanticPredictor._recondition_masklets
SAM3VideoSemanticPredictor.run_tracker_update_planning_phase
SAM3VideoSemanticPredictor._suppress_overlapping_based_on_recent_occlusion
SAM3VideoSemanticPredictor.run_tracker_update_execution_phase
SAM3VideoSemanticPredictor.build_outputs
SAM3VideoSemanticPredictor._propogate_tracker_one_frame_local_gpu
SAM3VideoSemanticPredictor._associate_det_trk
SAM3VideoSemanticPredictor._process_hotstart
SAM3VideoSemanticPredictor._tracker_update_memories
SAM3VideoSemanticPredictor._tracker_add_new_objects
SAM3VideoSemanticPredictor._tracker_remove_objects
SAM3VideoSemanticPredictor._initialize_metadata
SAM3VideoSemanticPredictor.update_masklet_confirmation_status
SAM3VideoSemanticPredictor._drop_new_det_with_obj_limit

class `ultralytics.models.sam.predict.Predictor`

Predictor(cfg = DEFAULT_CFG, overrides = None, _callbacks: dict | None = None)

Bases: BasePredictor

Predictor class for SAM, enabling real-time image segmentation with promptable capabilities.

This class extends BasePredictor and implements the Segment Anything Model (SAM) for advanced image segmentation tasks. It supports various input prompts like points, bounding boxes, and masks for fine-grained control over segmentation results.

Sets up the Predictor object for SAM (Segment Anything Model) and applies any configuration overrides or callbacks provided. Initializes task-specific settings for SAM, such as setting retina_masks to True for optimal results.

Args

Name	Type	Description	Default
`cfg`	`dict`	Configuration dictionary containing default settings.	`DEFAULT_CFG`
`overrides`	`dict \| None`	Dictionary of values to override default configuration.	`None`
`_callbacks`	`dict \| None`	Dictionary of callback functions to customize behavior.	`None`

Attributes

Name	Type	Description
`args`	`SimpleNamespace`	Configuration arguments for the predictor.
`model`	`torch.nn.Module`	The loaded SAM model.
`device`	`torch.device`	The device (CPU or GPU) on which the model is loaded.
`im`	`torch.Tensor`	The preprocessed input image.
`features`	`torch.Tensor`	Extracted image features.
`prompts`	`dict[str, Any]`	Dictionary to store various types of prompts (e.g., bboxes, points, masks).
`segment_all`	`bool`	Flag to indicate if full image segmentation should be performed.
`mean`	`torch.Tensor`	Mean values for image normalization.
`std`	`torch.Tensor`	Standard deviation values for image normalization.
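The `mean` and `std` attributes normalize each channel of the input image. The sketch below shows that step in NumPy under two assumptions not stated on this page: a (C, H, W) RGB layout on a 0-255 scale, and the pixel statistics commonly published for SAM (mean 123.675, 116.28, 103.53; std 58.395, 57.12, 57.375). `normalize_chw` and `denormalize_chw` are hypothetical helpers; check `setup_model` and `preprocess` in the source for the values actually used.

```python
import numpy as np

# Assumption: SAM pixel statistics for 0-255 RGB input (verify in the source).
SAM_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32).reshape(3, 1, 1)
SAM_STD = np.array([58.395, 57.12, 57.375], dtype=np.float32).reshape(3, 1, 1)


def normalize_chw(im: np.ndarray, mean: np.ndarray = SAM_MEAN, std: np.ndarray = SAM_STD) -> np.ndarray:
    """Normalize a (C, H, W) image channel-wise: (im - mean) / std."""
    if im.ndim != 3 or im.shape[0] != mean.shape[0]:
        raise ValueError(f"expected shape ({mean.shape[0]}, H, W), got {im.shape}")
    return (im.astype(np.float32) - mean) / std


def denormalize_chw(x: np.ndarray, mean: np.ndarray = SAM_MEAN, std: np.ndarray = SAM_STD) -> np.ndarray:
    """Invert normalize_chw, e.g. to visualize a preprocessed tensor."""
    return x * std + mean


# Usage: pixels equal to the mean map to zero after normalization.
zeros = normalize_chw(np.broadcast_to(SAM_MEAN, (3, 4, 4)))
```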

Methods

Name	Description
`_inference_features`	Perform inference on image features using the SAM model.
`_prepare_prompts`	Prepare and transform the input prompts for processing based on the destination shape.
`generate`	Perform image segmentation using the Segment Anything Model (SAM).
`get_im_features`	Extract image features using the SAM model's image encoder for subsequent mask prediction.
`get_model`	Retrieve or build the Segment Anything Model (SAM) for image segmentation tasks.
`inference`	Perform image segmentation inference based on the given input cues, using the currently loaded image.
`inference_features`	Perform prompts preprocessing and inference on provided image features using the SAM model.
`postprocess`	Post-process SAM's inference outputs to generate object detection masks and bounding boxes.
`pre_transform`	Perform initial transformations on the input image for preprocessing.
`preprocess`	Preprocess the input image for model inference.
`prompt_inference`	Perform image segmentation inference based on input cues using SAM's specialized architecture.
`remove_small_regions`	Remove small disconnected regions and holes from segmentation masks.
`reset_image`	Reset the current image and its features, clearing them for subsequent inference.
`set_image`	Preprocess and set a single image for inference.
`set_prompts`	Set prompts for subsequent inference operations.
`setup_model`	Initialize the Segment Anything Model (SAM) for inference.
`setup_source`	Set up the data source for SAM inference.

Examples

>>> predictor = Predictor(overrides=dict(model="sam_b.pt"))
>>> predictor.setup_model()  # builds the SAM model named in overrides["model"]
>>> predictor.set_image("image.jpg")
>>> bboxes = [[100, 100, 200, 200]]
>>> results = predictor(bboxes=bboxes)


method `ultralytics.models.sam.predict.Predictor._inference_features`

Args

Name	Type	Description	Default
`features`	`torch.Tensor`	Extracted image features with shape (B, C, H, W) from the SAM model image encoder.	required
`bboxes`	`np.ndarray \| list[list[float]] \| None`	Bounding boxes in XYXY format with shape (N, 4).	`None`
`points`	`np.ndarray \| list[list[float]] \| None`	Object location points with shape (N, 2), in pixels.	`None`
`labels`	`np.ndarray \| list[int] \| None`	Point prompt labels with shape (N,). 1 = foreground, 0 = background.	`None`
`masks`	`list[np.ndarray] \| np.ndarray \| None`	Masks for the objects, where each mask is a 2D array.	`None`
`multimask_output`	`bool`	Flag to return multiple masks for ambiguous prompts.	`False`

Returns

Type	Description
`pred_masks (torch.Tensor)`	Output masks with shape (C, H, W), where C is the number of generated masks.
`pred_scores (torch.Tensor)`	Quality scores for each mask, with length C.

method `ultralytics.models.sam.predict.Predictor._prepare_prompts`

Args

Name	Type	Description	Default
`dst_shape`	`tuple[int, int]`	The target shape (height, width) for the prompts.	required
`src_shape`	`tuple[int, int]`	The source shape (height, width) of the input image.	required
`bboxes`	`np.ndarray \| list \| None`	Bounding boxes in XYXY format with shape (N, 4).	`None`
`points`	`np.ndarray \| list \| None`	Points indicating object locations with shape (N, 2) or (N, num_points, 2), in pixels.	`None`
`labels`	`np.ndarray \| list \| None`	Point prompt labels with shape (N) or (N, num_points). 1 for foreground, 0 for background.	`None`
`masks`	`list[np.ndarray] \| np.ndarray \| None`	Masks for the objects, where each mask is a 2D array with shape (H, W).	`None`

Returns

Type	Description
`bboxes (torch.Tensor \| None)`	Transformed bounding boxes.
`points (torch.Tensor \| None)`	Transformed points.
`labels (torch.Tensor \| None)`	Transformed labels.
`masks (torch.Tensor \| None)`	Transformed masks.
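Prompt preparation mainly maps pixel coordinates from the original image into the model's input frame. A minimal sketch, assuming a uniform resize by `r = min(dst_h / src_h, dst_w / src_w)` and that (N, 2) points gain a singleton axis to become (N, 1, 2). `prepare_prompts_sketch` is a hypothetical helper, and defaulting missing labels to foreground is also an assumption.

```python
import numpy as np


def prepare_prompts_sketch(dst_shape, src_shape, bboxes=None, points=None, labels=None):
    """Rescale XYXY boxes and (x, y) points from src_shape to dst_shape, both (h, w)."""
    # Assumption: the image was resized uniformly by this factor.
    r = min(dst_shape[0] / src_shape[0], dst_shape[1] / src_shape[1])
    if points is not None:
        points = np.asarray(points, dtype=np.float32)
        if labels is None:
            # Assumption: missing labels default to foreground (1).
            labels = np.ones(points.shape[:-1], dtype=np.int64)
        labels = np.asarray(labels, dtype=np.int64)
        if points.shape[:-1] != labels.shape:
            raise ValueError("points and labels must agree in all but the last axis")
        if points.ndim == 2:
            # (N, 2) -> (N, 1, 2): one point per prompt.
            points, labels = points[:, None], labels[:, None]
        points = points * r
    if bboxes is not None:
        bboxes = np.asarray(bboxes, dtype=np.float32)
        if bboxes.ndim == 1:
            bboxes = bboxes[None]  # a single box becomes shape (1, 4)
        bboxes = bboxes * r
    return bboxes, points, labels


# Usage: a 512x256 (h, w) image resized into a 1024x1024 frame scales by 2.
b, p, l = prepare_prompts_sketch((1024, 1024), (512, 256), bboxes=[10, 20, 30, 40], points=[[5, 6]], labels=[1])
```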

method `ultralytics.models.sam.predict.Predictor.generate`

Args

Name	Type	Description	Default
`im`	`torch.Tensor`	Input tensor representing the preprocessed image with shape (N, C, H, W).	required
`crop_n_layers`	`int`	Number of layers for additional mask predictions on image crops.	`0`
`crop_overlap_ratio`	`float`	Overlap between crops, scaled down in subsequent layers.	`512 / 1500`
`crop_downscale_factor`	`int`	Scaling factor for sampled points-per-side in each layer.	`1`
`point_grids`	`list[np.ndarray] \| None`	Custom grids for point sampling normalized to [0,1].	`None`
`points_stride`	`int`	Number of points to sample along each side of the image.	`32`
`points_batch_size`	`int`	Batch size for the number of points processed simultaneously.	`64`
`conf_thres`	`float`	Confidence threshold [0,1] for filtering based on mask quality prediction.	`0.88`
`stability_score_thresh`	`float`	Stability threshold [0,1] for mask filtering based on stability.	`0.95`
`stability_score_offset`	`float`	Offset value for calculating stability score.	`0.95`
`crop_nms_thresh`	`float`	IoU cutoff for NMS to remove duplicate masks between crops.	`0.7`

Returns

Type	Description
`pred_masks (torch.Tensor)`	Segmented masks with shape (N, H, W).
`pred_scores (torch.Tensor)`	Confidence scores for each mask with shape (N,).
`pred_bboxes (torch.Tensor)`	Bounding boxes for each mask with shape (N, 4).
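Three ingredients of `generate` can be illustrated in isolation: the normalized point grid that `points_stride` and `point_grids` describe, the overlapping crop layout controlled by `crop_n_layers` and `crop_overlap_ratio`, and the stability score compared against `stability_score_thresh`. The functions below follow the approach of the original Segment Anything automatic mask generator as we understand it; treat them as a sketch, not as this module's code. Mapping `crop_downscale_factor` to `scale_per_layer` is an assumption.

```python
import math
from itertools import product

import numpy as np


def build_point_grid(n_per_side: int) -> np.ndarray:
    """Evenly spaced (n_per_side**2, 2) grid of (x, y) points in [0, 1], offset from the borders."""
    offset = 1 / (2 * n_per_side)
    one_side = np.linspace(offset, 1 - offset, n_per_side)
    xs = np.tile(one_side[None, :], (n_per_side, 1))
    ys = np.tile(one_side[:, None], (1, n_per_side))
    return np.stack([xs, ys], axis=-1).reshape(-1, 2)


def layer_point_grids(n_per_side: int, n_layers: int, scale_per_layer: int) -> list[np.ndarray]:
    """One grid per crop layer; deeper layers sample fewer points per side."""
    return [build_point_grid(int(n_per_side / scale_per_layer**i)) for i in range(n_layers + 1)]


def generate_crop_boxes(im_h: int, im_w: int, n_layers: int, overlap_ratio: float):
    """Layer 0 is the full image; layer k >= 1 splits each side into 2**k overlapping crops."""
    boxes, layer_idxs = [[0, 0, im_w, im_h]], [0]
    short_side = min(im_h, im_w)
    for i_layer in range(n_layers):
        n_crops = 2 ** (i_layer + 1)
        overlap = int(overlap_ratio * short_side * (2 / n_crops))
        crop_w = math.ceil((overlap * (n_crops - 1) + im_w) / n_crops)
        crop_h = math.ceil((overlap * (n_crops - 1) + im_h) / n_crops)
        x0s = [int((crop_w - overlap) * i) for i in range(n_crops)]
        y0s = [int((crop_h - overlap) * i) for i in range(n_crops)]
        for x0, y0 in product(x0s, y0s):
            boxes.append([x0, y0, min(x0 + crop_w, im_w), min(y0 + crop_h, im_h)])
            layer_idxs.append(i_layer + 1)
    return boxes, layer_idxs


def stability_score(mask_logits: np.ndarray, threshold: float, offset: float) -> np.ndarray:
    """IoU between masks binarized at threshold + offset and at threshold - offset."""
    inter = (mask_logits > threshold + offset).sum(axis=(-2, -1))
    union = (mask_logits > threshold - offset).sum(axis=(-2, -1))
    return inter / np.maximum(union, 1)  # guard against empty masks
```

A mask whose logits sit far from the threshold changes little when the cutoff moves by `offset`, so its score stays near 1 and it survives the `stability_score_thresh` filter.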

method `ultralytics.models.sam.predict.Predictor.inference`

Args

Name	Type	Description	Default
`im`	`torch.Tensor`	The preprocessed input image in tensor format, with shape (N, C, H, W).	required
`bboxes`	`np.ndarray \| list \| None`	Bounding boxes with shape (N, 4), in XYXY format.	`None`
`points`	`np.ndarray \| list \| None`	Points indicating object locations with shape (N, 2), in pixels.	`None`
`labels`	`np.ndarray \| list \| None`	Labels for point prompts, shape (N,). 1 = foreground, 0 = background.	`None`
`masks`	`np.ndarray \| None`	Low-resolution masks from previous predictions, shape (N, H, W). For SAM H=W=256.	`None`
`multimask_output`	`bool`	Flag to return multiple masks. Helpful for ambiguous prompts.	`False`
`*args`	`Any`	Additional positional arguments.	optional
`**kwargs`	`Any`	Additional keyword arguments.	optional

Returns

Type	Description
`pred_masks (torch.Tensor)`	The output masks in shape (C, H, W), where C is the number of generated masks.
`pred_scores (torch.Tensor)`	An array of length C containing quality scores predicted by the model for each mask.

method `ultralytics.models.sam.predict.Predictor.inference_features`

Args

Name	Type	Description	Default
`features`	`torch.Tensor \| dict[str, Any]`	Extracted image features from the SAM/SAM2 model image encoder.	required
`src_shape`	`tuple[int, int]`	The source shape (height, width) of the input image.	required
`dst_shape`	`tuple[int, int] \| None`	The target shape (height, width) for the prompts. If None, defaults to (imgsz, imgsz).	`None`
`bboxes`	`np.ndarray \| list[list[float]] \| None`	Bounding boxes in XYXY format with shape (N, 4).	`None`
`points`	`np.ndarray \| list[list[float]] \| None`	Points indicating object locations with shape (N, 2), in pixels.	`None`
`labels`	`np.ndarray \| list[int] \| None`	Point prompt labels with shape (N,). 1 = foreground, 0 = background.	`None`
`masks`	`list[np.ndarray] \| np.ndarray \| None`	Masks for the objects, where each mask is a 2D array.	`None`
`multimask_output`	`bool`	Flag to return multiple masks for ambiguous prompts.	`False`

method `ultralytics.models.sam.predict.Predictor.postprocess`

Args

Name	Type	Description	Default
`preds`	`tuple`	The output from SAM model inference, containing pred_masks (torch.Tensor), the predicted masks with shape (N, 1, H, W); pred_scores (torch.Tensor), the confidence scores for each mask with shape (N, 1); and pred_bboxes (torch.Tensor, optional), the predicted bounding boxes if segment_all is True.	required
`img`	`torch.Tensor`	The processed input image tensor with shape (C, H, W).	required
`orig_imgs`	`list[np.ndarray] \| torch.Tensor`	The original, unprocessed images.	required
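When a prediction carries masks but no `pred_bboxes` (for example when `segment_all` is False), tight boxes can be recovered from the masks themselves. `masks_to_xyxy` is a hypothetical helper sketching that conversion, assuming binary (N, H, W) masks and inclusive pixel coordinates; mapping an empty mask to an all-zero box is also an assumption.

```python
import numpy as np


def masks_to_xyxy(masks: np.ndarray) -> np.ndarray:
    """Convert binary masks (N, H, W) to tight XYXY boxes (N, 4); empty masks give zeros."""
    masks = np.asarray(masks, dtype=bool)
    boxes = np.zeros((masks.shape[0], 4), dtype=np.float32)
    for i, mask in enumerate(masks):
        ys, xs = np.nonzero(mask)
        if xs.size == 0:
            continue  # assumption: empty mask keeps an all-zero box
        # Inclusive convention: right/bottom are the last foreground column/row.
        boxes[i] = [xs.min(), ys.min(), xs.max(), ys.max()]
    return boxes
```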

method `ultralytics.models.sam.predict.Predictor.prompt_inference`

Args

Name	Type	Description	Default
`im`	`torch.Tensor`	Preprocessed input image tensor with shape (N, C, H, W).	required
`bboxes`	`np.ndarray \| list \| None`	Bounding boxes in XYXY format with shape (N, 4).	`None`
`points`	`np.ndarray \| list \| None`	Points indicating object locations with shape (N, 2) or (N, num_points, 2), in pixels.	`None`
`labels`	`np.ndarray \| list \| None`	Point prompt labels with shape (N) or (N, num_points). 1 for foreground, 0 for background.	`None`
`masks`	`np.ndarray \| None`	Low-res masks from previous predictions with shape (N, H, W). For SAM, H=W=256.	`None`
`multimask_output`	`bool`	Flag to return multiple masks for ambiguous prompts.	`False`

method `ultralytics.models.sam.predict.Predictor.remove_small_regions`

Args

Name	Type	Description	Default
`masks`	`torch.Tensor`	Segmentation masks to be processed, with shape (N, H, W) where N is the number of masks, H is height, and W is width.	required
`min_area`	`int`	Minimum area threshold for removing disconnected regions and holes. Regions smaller than this will be removed.	`0`
`nms_thresh`	`float`	IoU threshold for the NMS algorithm to remove duplicate boxes.	`0.7`

Returns

Type	Description
`new_masks (torch.Tensor)`	Processed masks with small regions removed, shape (N, H, W).
`keep (list[int])`	Indices of remaining masks after NMS, for filtering corresponding boxes.
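The region cleanup can be sketched without OpenCV: label the connected components of the mask (or of its complement, for holes) and erase every component smaller than `min_area`. This pure NumPy plus `collections.deque` version is illustrative only. The 8-connectivity is an assumption, it handles a single 2D mask, and it omits the NMS step that produces `keep`.

```python
from collections import deque

import numpy as np


def _components(mask: np.ndarray) -> list[list[tuple[int, int]]]:
    """Return the pixel lists of the 8-connected True components of a 2D bool mask."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    comps = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        seen[sy, sx] = True
        queue, comp = deque([(sy, sx)]), []
        while queue:  # breadth-first flood fill
            y, x = queue.popleft()
            comp.append((y, x))
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        comps.append(comp)
    return comps


def remove_small_regions_sketch(mask: np.ndarray, min_area: int, mode: str) -> np.ndarray:
    """mode="islands" drops small foreground blobs; mode="holes" fills small background blobs."""
    if mode not in ("islands", "holes"):
        raise ValueError(f"unknown mode: {mode}")
    fill_holes = mode == "holes"
    work = np.asarray(mask, dtype=bool)
    # Work on a new array so the caller's mask is never modified.
    work = ~work if fill_holes else work.copy()
    for comp in _components(work):
        if len(comp) < min_area:
            ys, xs = zip(*comp)
            work[list(ys), list(xs)] = False
    return ~work if fill_holes else work
```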

method `ultralytics.models.sam.predict.Predictor.setup_model`

Args

Name	Type	Description	Default
`model`	`torch.nn.Module \| None`	A pretrained SAM model. If None, a new model is built based on config.	`None`
`verbose`	`bool`	If True, prints selected device information.	`True`

class `ultralytics.models.sam.predict.SAM2Predictor`

Attributes

Name	Type	Description
`_bb_feat_sizes`	`list[tuple]`	Feature sizes for different backbone levels.
`model`	`torch.nn.Module`	The loaded SAM2 model.
`device`	`torch.device`	The device (CPU or GPU) on which the model is loaded.
`features`	`dict`	Cached image features for efficient inference.
`segment_all`	`bool`	Flag to indicate if all segments should be predicted.
`prompts`	`dict[str, Any]`	Dictionary to store various types of prompts for inference.

class `ultralytics.models.sam.predict.SAM2VideoPredictor`

Attributes

Name	Type	Description
`inference_state`	`dict`	A dictionary to store the current state of inference operations.
`non_overlap_masks`	`bool`	A flag indicating whether masks should be non-overlapping.
`clear_non_cond_mem_around_input`	`bool`	A flag to control clearing non-conditional memory around inputs.
`clear_non_cond_mem_for_multi_obj`	`bool`	A flag to control clearing non-conditional memory for multi-object scenarios.
`callbacks`	`dict`	A dictionary of callbacks for various prediction lifecycle events.
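With `non_overlap_masks` enabled, each pixel should end up in at most one object's mask. One simple rule, sketched here with NumPy, lets the object with the highest logit at a pixel keep its value and clamps every other object's logit to at most -10 so it thresholds to background. The (num_objects, 1, H, W) layout, the -10 clamp, and the helper name are assumptions for illustration.

```python
import numpy as np


def apply_non_overlapping_constraints(pred_masks: np.ndarray, clamp: float = -10.0) -> np.ndarray:
    """Keep only the top-scoring object's logit per pixel; clamp the others to <= clamp.

    pred_masks: mask logits with assumed shape (num_objects, 1, H, W).
    """
    if pred_masks.shape[0] == 1:
        return pred_masks  # a single object cannot overlap with anything
    winner = np.argmax(pred_masks, axis=0)[None]  # (1, 1, H, W) index of the best object
    obj_idx = np.arange(pred_masks.shape[0]).reshape(-1, 1, 1, 1)
    keep = winner == obj_idx  # (num_objects, 1, H, W), True where the object wins
    return np.where(keep, pred_masks, np.minimum(pred_masks, clamp))
```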

Methods

Name	Description
`_add_output_per_object`	Split a multi-object output into per-object output slices and add them into Output_Dict_Per_Obj.
`_clear_non_cond_mem_around_input`	Remove the non-conditioning memory around the input frame.
`_consolidate_temp_output_across_obj`	Consolidate per-object temporary outputs into a single output for all objects.
`_get_empty_mask_ptr`	Get a dummy object pointer based on an empty mask on the current frame.
`_get_maskmem_pos_enc`	Cache and manage the positional encoding for mask memory across frames and objects.
`_init_state`	Initialize an inference state.
`_obj_id_to_idx`	Map client-side object id to model-side object index.
`_prune_non_cond_memory`	Prune old non-conditioning frames to bound memory usage.
`_reset_tracking_results`	Reset all tracking inputs and results across the videos.
`_run_memory_encoder`	Run the memory encoder on masks.
`_run_single_frame_inference`	Run tracking on a single frame based on current inputs and previous memory.
`add_new_prompts`	Add new points or masks to a specific frame for a given object ID.
`clear_all_points_in_frame`	Remove all input points or masks in a specific frame for a given object.
`clear_all_points_in_video`	Remove all input points or masks in all frames throughout the video.
`get_im_features`	Extract and process image features using SAM2's image encoder for subsequent segmentation tasks.
`get_model`	Retrieve and configure the model with binarization enabled.
`inference`	Perform image segmentation inference based on the given input cues, using the currently loaded image.
`init_state`	Initialize an inference state for the predictor.
`postprocess`	Post-process the predictions to apply non-overlapping constraints if required.
`propagate_in_video_preflight`	Prepare inference_state and consolidate temporary outputs before tracking.
`remove_object`	Remove an object id from the tracking state. If strict is True, check whether the object id actually exists.

method `ultralytics.models.sam.predict.SAM2VideoPredictor._add_output_per_object`

Args

Name	Type	Description	Default
`frame_idx`	`int`	The index of the current frame.	required
`current_out`	`dict`	The current output dictionary containing multi-object outputs.	required
`storage_key`	`str`	The key used to store the output in the per-object output dictionary.	required
`inference_state`	`dict[str, Any], optional`	The current inference state. If None, uses the instance's inference state.	`None`
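Splitting a multi-object output amounts to slicing the object axis and filing each slice under its frame. A sketch, assuming every value in `current_out` is an array whose first axis indexes objects, and a hypothetical store shaped `{obj_idx: {storage_key: {frame_idx: output}}}`. The real inference state may be organized differently and may hold `None` entries that this sketch does not handle.

```python
import numpy as np


def add_output_per_object(per_obj_store: dict, frame_idx: int, current_out: dict, storage_key: str) -> None:
    """Slice each (num_objects, ...) array in current_out and file it under its object index."""
    num_objects = next(iter(current_out.values())).shape[0]
    if any(v.shape[0] != num_objects for v in current_out.values()):
        raise ValueError("all outputs must share the same object axis length")
    for obj_idx in range(num_objects):
        # Keep a leading axis of size 1 so each slice still looks like a batch.
        obj_out = {k: v[obj_idx : obj_idx + 1] for k, v in current_out.items()}
        per_obj_store.setdefault(obj_idx, {}).setdefault(storage_key, {})[frame_idx] = obj_out
```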

method `ultralytics.models.sam.predict.SAM2VideoPredictor._clear_non_cond_mem_around_input`

Args

Name	Type	Description	Default
`frame_idx`	`int`	The index of the current frame where user interaction occurred.	required
`inference_state`	`dict[str, Any], optional`	The current inference state. If None, uses the instance's inference state.	`None`
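Clearing non-conditioning memory around an interacted frame can be pictured as deleting every non-conditioning output in a window centered on that frame, so stale memories do not conflict with the new prompt. The half-width `stride * num_maskmem`, the flat `{frame_idx: output}` dictionary, and the function name are assumptions, not the predictor's documented behavior.

```python
def clear_non_cond_mem_around_input(non_cond_outputs: dict, frame_idx: int, num_maskmem: int, stride: int = 1) -> list:
    """Drop non-conditioning outputs within stride * num_maskmem frames of frame_idx.

    Returns the sorted list of removed frame indices.
    """
    # Assumption: the window extends the same distance before and after the frame.
    half_width = stride * num_maskmem
    begin, end = frame_idx - half_width, frame_idx + half_width
    dropped = [t for t in list(non_cond_outputs) if begin <= t <= end]
    for t in dropped:
        non_cond_outputs.pop(t)
    return sorted(dropped)
```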

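method `ultralytics.models.sam.predict.SAM2VideoPredictor._consolidate_temp_output_across_obj`

Args

Name	Type	Description	Default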
`frame_idx`	`int`	The index of the frame for which to consolidate outputs.	required
`is_cond`	`bool, optional`	Indicates if the frame is considered a conditioning frame.	`False`
`run_mem_encoder`	`bool, optional`	Specifies whether to run the memory encoder after consolidating the outputs.	`False`
`inference_state`	`dict[str, Any], optional`	The current inference state. If None, uses the instance's inference state.	`None`
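Consolidation stacks each object's mask logits for one frame into a single (num_objects, 1, H, W) array. Objects with no output on that frame still need a slot; this sketch fills them with a large negative logit (the `NO_OBJ_LOGIT = -1024.0` constant is an assumed value) so they binarize to empty masks. The per-object dictionary input and `consolidate_masks` are hypothetical.

```python
import numpy as np

NO_OBJ_LOGIT = -1024.0  # assumed placeholder logit for "no output on this frame"


def consolidate_masks(per_obj_masks: dict, num_objects: int, mask_hw: tuple) -> np.ndarray:
    """Stack per-object (1, H, W) or (H, W) mask logits into (num_objects, 1, H, W), filling gaps."""
    h, w = mask_hw
    out = np.full((num_objects, 1, h, w), NO_OBJ_LOGIT, dtype=np.float32)
    for obj_idx, mask in per_obj_masks.items():
        if not 0 <= obj_idx < num_objects:
            raise IndexError(f"object index {obj_idx} out of range")
        out[obj_idx] = mask  # broadcasts (H, W) or (1, H, W) into the (1, H, W) slot
    return out
```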

method `ultralytics.models.sam.predict.SAM2VideoPredictor._get_empty_mask_ptr`

Args

Name	Type	Description	Default
`frame_idx`	`int`	The index of the current frame for which to generate the dummy object pointer.	required
`inference_state`	`dict[str, Any], optional`	The current inference state. If None, uses the instance's inference state.	`None`

method `ultralytics.models.sam.predict.SAM2VideoPredictor._get_maskmem_pos_enc`

Args

Name	Type	Description	Default
`out_maskmem_pos_enc`	`list[torch.Tensor] \| None`	The positional encoding for mask memory. Should be a list of tensors or None.	required
`inference_state`	`dict[str, Any], optional`	The current inference state. If None, uses the instance's inference state.	`None`

method `ultralytics.models.sam.predict.SAM2VideoPredictor._obj_id_to_idx`

Args

Name	Type	Description	Default
`obj_id`	`int`	The unique identifier of the object provided by the client side.	required
`inference_state`	`dict[str, Any], optional`	The current inference state. If None, uses the instance's inference state.	`None`
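The id-to-index mapping is a small bidirectional lookup: a known client id returns its model-side index, and an unseen id receives the next free index. `ObjectIndexMap` is a hypothetical class sketching that idea, assuming dense indices allocated in first-seen order; the real method also manages per-object entries in the inference state, which are omitted here.

```python
class ObjectIndexMap:
    """Hypothetical bidirectional map between client object ids and dense model indices."""

    def __init__(self) -> None:
        self.id_to_idx: dict[int, int] = {}
        self.idx_to_id: dict[int, int] = {}

    def obj_id_to_idx(self, obj_id: int) -> int:
        """Return the index for obj_id, allocating the next free index for unseen ids."""
        idx = self.id_to_idx.get(obj_id)
        if idx is None:
            # Assumption: indices are dense and assigned in first-seen order.
            idx = len(self.id_to_idx)
            self.id_to_idx[obj_id] = idx
            self.idx_to_id[idx] = obj_id
        return idx

    @property
    def obj_ids(self) -> list[int]:
        """Client ids ordered by model-side index."""
        return [self.idx_to_id[i] for i in range(len(self.idx_to_id))]
```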

method `ultralytics.models.sam.predict.SAM2VideoPredictor._run_memory_encoder`

Args

Name	Type	Description	Default
`batch_size`	`int`	The batch size for processing the frame.	required
`high_res_masks`	`torch.Tensor`	High-resolution masks for which to compute the memory.	required
`object_score_logits`	`torch.Tensor`	Logits representing the object scores.	required
`is_mask_from_pts`	`bool`	Indicates if the mask is derived from point interactions.	required
`inference_state`	`dict[str, Any], optional`	The current inference state. If None, uses the instance's inference state.	`None`

Returns

Type	Description
`maskmem_features (torch.Tensor)`	The encoded mask features.
`maskmem_pos_enc (torch.Tensor)`	The positional encoding.

method `ultralytics.models.sam.predict.SAM2VideoPredictor._run_single_frame_inference`

Args

Name	Type	Description	Default
`output_dict`	`dict`	The dictionary containing the output states of the tracking process.	required
`frame_idx`	`int`	The index of the current frame.	required
`batch_size`	`int`	The batch size for processing the frame.	required
`is_init_cond_frame`	`bool`	Indicates if the current frame is an initial conditioning frame.	required
`point_inputs`	`dict \| None`	Input points and their labels.	required
`mask_inputs`	`torch.Tensor \| None`	Input binary masks.	required
`reverse`	`bool`	Indicates if the tracking should be performed in reverse order.	required
`run_mem_encoder`	`bool`	Indicates if the memory encoder should be executed.	required
`prev_sam_mask_logits`	`torch.Tensor \| None`	Previous mask logits for the current object.	`None`
`inference_state`	`dict[str, Any], optional`	The current inference state. If None, uses the instance's inference state.	`None`

method `ultralytics.models.sam.predict.SAM2VideoPredictor.add_new_prompts`

Args

Name	Type	Description	Default
`obj_id`	`int`	The ID of the object to which the prompts are associated.	required
`points`	`torch.Tensor, optional`	The coordinates of the points of interest.	`None`
`labels`	`torch.Tensor, optional`	The labels corresponding to the points.	`None`
`masks`	`torch.Tensor, optional`	Binary masks for the object.	`None`
`frame_idx`	`int, optional`	The index of the frame to which the prompts are applied.	`0`
`inference_state`	`dict[str, Any], optional`	The current inference state. If None, uses the instance's inference state.	`None`

Returns

Type	Description
`pred_masks (torch.Tensor)`	The flattened predicted masks.
`pred_scores (torch.Tensor)`	A tensor of ones indicating the number of objects.

method `ultralytics.models.sam.predict.SAM2VideoPredictor.get_im_features`

Args

Name	Type	Description	Default
`im`	`torch.Tensor`	The input image tensor.	required
`batch`	`int, optional`	The batch size for expanding features if there are multiple prompts.	`1`

Returns

Type	Description
`vis_feats (torch.Tensor)`	The visual features extracted from the image.
`vis_pos_embed (torch.Tensor)`	The positional embeddings for the visual features.
`feat_sizes (list[tuple])`	A list containing the sizes of the extracted features.
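The three return values fit a layout in which each multi-scale feature map is repeated `batch` times and flattened into a sequence. A NumPy sketch, assuming (1, C, H, W) input maps and a sequence-first (H*W, batch, C) output; `flatten_features` and `_to_sequence` are hypothetical helpers, and the predictor's actual tensor layout should be confirmed in the source.

```python
import numpy as np


def _to_sequence(x: np.ndarray, batch: int) -> np.ndarray:
    """(1, C, H, W) -> (H*W, batch, C), repeating the single image across the batch."""
    x = np.broadcast_to(x, (batch, *x.shape[1:]))  # (batch, C, H, W) read-only view
    b, c = x.shape[:2]
    return x.reshape(b, c, -1).transpose(2, 0, 1)


def flatten_features(feature_maps, pos_embeds, batch=1):
    """Return (vis_feats, vis_pos_embed, feat_sizes) for lists of multi-scale maps."""
    feat_sizes = [tuple(x.shape[-2:]) for x in feature_maps]  # (H, W) per level
    vis_feats = [_to_sequence(x, batch) for x in feature_maps]
    vis_pos_embed = [_to_sequence(p, batch) for p in pos_embeds]
    return vis_feats, vis_pos_embed, feat_sizes
```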