Reference for ultralytics/nn/backends/triton.py
This page is sourced from https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/triton.py.
ultralytics.nn.backends.triton.TritonBackend

`TritonBackend()`

Bases: BaseBackend

NVIDIA Triton Inference Server backend for remote model serving.
Connects to and runs inference with models hosted on an NVIDIA Triton Inference Server instance via HTTP or gRPC protocols. The model is specified using a triton:// URL scheme.
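
As an illustration of what a `triton://` URL carries, the sketch below splits one into its server address and model name using only the standard library. The helper `split_triton_url` and the `TritonTarget` tuple are hypothetical names invented for this example; they are not part of Ultralytics, and the library's own URL handling may differ.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class TritonTarget(NamedTuple):
    """Hypothetical container for the parts of a triton:// URL."""

    host: str
    port: Optional[int]
    model_name: str


def split_triton_url(url: str) -> TritonTarget:
    """Split 'triton://host:port/model_name' into host, port and model name."""
    parts = urlsplit(url)
    if parts.scheme != "triton":
        raise ValueError(f"expected a triton:// URL, got {url!r}")
    # The first path segment is taken as the model name (an assumption of this sketch).
    model_name = parts.path.strip("/").split("/", 1)[0]
    if not parts.hostname or not model_name:
        raise ValueError(f"URL must include a host and a model name: {url!r}")
    return TritonTarget(parts.hostname, parts.port, model_name)


target = split_triton_url("triton://host:8000/model_name")
print(target.host, target.port, target.model_name)
```

Rejecting any other scheme up front keeps a mistyped URL from being silently treated as a local weights path.
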
Methods
| Name | Description |
|---|---|
| `forward` | Run inference via the NVIDIA Triton Inference Server. |
| `load_model` | Connect to a remote model on an NVIDIA Triton Inference Server. |

ultralytics.nn.backends.triton.TritonBackend.forward

`def forward(self, im: torch.Tensor) -> list`

Run inference via the NVIDIA Triton Inference Server.

Args
| Name | Type | Description | Default |
|---|---|---|---|
| `im` | `torch.Tensor` | Input image tensor in BCHW format, normalized to [0, 1]. | required |

Returns
| Type | Description |
|---|---|
| `list` | Model predictions as a list of numpy arrays from the Triton server. |
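
The input contract above (BCHW layout, values in [0, 1]) can be sketched with NumPy. `to_bchw` is a hypothetical helper, not an Ultralytics function, and it assumes the source image is a uint8 array in HWC layout. A real pipeline would typically also resize the image to the model's input size and wrap the result in a `torch.Tensor` before calling `forward`.

```python
import numpy as np


def to_bchw(image_hwc: np.ndarray) -> np.ndarray:
    """Convert one uint8 HWC image to a float32 BCHW batch scaled to [0, 1]."""
    if image_hwc.ndim != 3 or image_hwc.dtype != np.uint8:
        raise ValueError("expected a uint8 array of shape (H, W, C)")
    chw = image_hwc.transpose(2, 0, 1)  # HWC -> CHW
    bchw = chw[np.newaxis, ...]  # add a leading batch dimension of size 1
    # Copy into contiguous float32 memory, then scale 0..255 down to 0..1.
    return np.ascontiguousarray(bchw, dtype=np.float32) / 255.0


# A synthetic 480x640 image whose first channel is fully saturated.
image = np.zeros((480, 640, 3), dtype=np.uint8)
image[..., 0] = 255

batch = to_bchw(image)
print(batch.shape, batch.dtype)
```

Since `forward` passes `im.cpu().numpy()` to the remote model, an array with this layout and value range corresponds to what the server-side model would receive under these assumptions.
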
Source code in ultralytics/nn/backends/triton.py

    def forward(self, im: torch.Tensor) -> list:
        """Run inference via the NVIDIA Triton Inference Server.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (list): Model predictions as a list of numpy arrays from the Triton server.
        """
        return self.model(im.cpu().numpy())

ultralytics.nn.backends.triton.TritonBackend.load_model

`def load_model(self, weight: str | Path) -> None`

Connect to a remote model on an NVIDIA Triton Inference Server.

Args
| Name | Type | Description | Default |
|---|---|---|---|
| `weight` | `str \| Path` | Triton model URL (e.g., `'triton://host:8000/model_name'`). | required |

Source code in ultralytics/nn/backends/triton.py

    def load_model(self, weight: str | Path) -> None:
        """Connect to a remote model on an NVIDIA Triton Inference Server.

        Args:
            weight (str | Path): Triton model URL (e.g., 'triton://host:8000/model_name').
        """
        check_requirements("tritonclient[all]")
        from ultralytics.utils.triton import TritonRemoteModel

        self.model = TritonRemoteModel(weight)

        # Copy metadata from Triton model
        if hasattr(self.model, "metadata"):
            self.apply_metadata(self.model.metadata)
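
The optional metadata copy at the end of `load_model` can be illustrated in isolation. Everything below is a stand-in: `FakeRemoteModel`, `SketchBackend`, and the behavior given to `apply_metadata` are assumptions made for this example and do not reproduce `TritonRemoteModel` or `BaseBackend`.

```python
from typing import Any, Dict, Optional


class FakeRemoteModel:
    """Stand-in for a remote model client that may or may not expose metadata."""

    def __init__(self, metadata: Optional[Dict[str, Any]] = None) -> None:
        # Define the attribute only when metadata was supplied, so hasattr() can differ.
        if metadata is not None:
            self.metadata = metadata


class SketchBackend:
    """Hypothetical backend showing the hasattr() guard around metadata copying."""

    def __init__(self) -> None:
        self.model: Optional[FakeRemoteModel] = None
        self.names: Dict[int, str] = {}
        self.stride = 32  # default kept when the model reports nothing

    def apply_metadata(self, metadata: Dict[str, Any]) -> None:
        # Assumed behavior for this sketch: copy known keys, keep defaults otherwise.
        self.names = dict(metadata.get("names", self.names))
        self.stride = int(metadata.get("stride", self.stride))

    def load_model(self, model: FakeRemoteModel) -> None:
        self.model = model
        # Copy metadata only if the client object actually provides it.
        if hasattr(self.model, "metadata"):
            self.apply_metadata(self.model.metadata)


with_meta = SketchBackend()
with_meta.load_model(FakeRemoteModel({"names": {0: "person"}, "stride": 16}))

without_meta = SketchBackend()
without_meta.load_model(FakeRemoteModel())

print(with_meta.names, with_meta.stride)
print(without_meta.names, without_meta.stride)
```

The guard means a client object without a `metadata` attribute leaves the backend's defaults in place instead of raising `AttributeError`.
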