Reference for ultralytics/nn/backends/qnn.py
Improvements
This page is sourced from https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/qnn.py. Have an improvement or example to add? Open a Pull Request — thank you! 🙏
Summary
ultralytics.nn.backends.qnn.QNNBackend
QNNBackend()
Bases: BaseBackend
Qualcomm QNN inference backend for Snapdragon hardware.
Loads and runs the QNN context binary produced by the Ultralytics QNN export (an *_qnn.onnx file inside a _qnn_model directory) using ONNX Runtime with the QNN Execution Provider plugin (onnxruntime-qnn). Inference runs on Qualcomm Snapdragon devices (Android, Windows on Snapdragon, or Qualcomm Linux boards) via the HTP (NPU) backend.
Methods
| Name | Description |
|---|---|
| forward | Run inference on the Qualcomm QNN runtime. |
| load_model | Load a QNN context-binary model with ONNX Runtime's QNN Execution Provider plugin. |
ultralytics.nn.backends.qnn.QNNBackend.forward
def forward(self, im: torch.Tensor) -> list
Run inference on the Qualcomm QNN runtime.
Args
| Name | Type | Description | Default |
|---|---|---|---|
| im | torch.Tensor | Input image tensor in BCHW format, normalized to [0, 1]. | required |
Returns
| Type | Description |
|---|---|
| list | Model predictions as a list of output arrays. |
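The feed pattern that `forward` relies on (look up the first input's name, then pass a single-entry dictionary to `session.run`) can be sketched without Snapdragon hardware. In the sketch below, `FakeSession` and `FakeNodeArg` are hypothetical stand-ins that only mimic the three ONNX Runtime session methods used on this page, and `run_first_input` is an illustrative helper, not part of Ultralytics.

```python
from dataclasses import dataclass


@dataclass
class FakeNodeArg:
    """Hypothetical stand-in for an ONNX Runtime input/output descriptor (only `name` is modeled)."""

    name: str


class FakeSession:
    """Hypothetical stand-in for an inference session; it echoes its input once per requested output."""

    def get_inputs(self):
        return [FakeNodeArg("images")]

    def get_outputs(self):
        return [FakeNodeArg("output0")]

    def run(self, output_names, input_feed):
        # A real session would execute the model; this stub returns the fed array unchanged
        return [input_feed["images"] for _ in output_names]


def run_first_input(session, array):
    """Feed `array` to the session's first input and return one result per output name."""
    input_name = session.get_inputs()[0].name
    output_names = [o.name for o in session.get_outputs()]
    return session.run(output_names, {input_name: array})


# A nested list stands in for a 1x1x1x2 BCHW array with values in [0, 1]
outputs = run_first_input(FakeSession(), [[[[0.0, 1.0]]]])
```

The result is a list with one entry per output name, which matches the `list` return type documented above.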
Source code in ultralytics/nn/backends/qnn.py
def forward(self, im: torch.Tensor) -> list:
    """Run inference on the Qualcomm QNN runtime.

    Args:
        im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

    Returns:
        (list): Model predictions as a list of output arrays.
    """
    return self.session.run(self.output_names, {self.session.get_inputs()[0].name: im.cpu().numpy()})

ultralytics.nn.backends.qnn.QNNBackend.load_model
def load_model(self, weight: str | Path) -> None
Load a QNN context-binary model with ONNX Runtime's QNN Execution Provider plugin.
Args
| Name | Type | Description | Default |
|---|---|---|---|
| weight | `str \| Path` | Path to the `*_qnn.onnx` file or the `_qnn_model` directory containing it. | required |
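Because `weight` may be either the file or its parent directory, the lookup can be sketched as a small standalone helper. This is a minimal sketch using only `pathlib`; the name `resolve_qnn_onnx` is hypothetical, and raising `FileNotFoundError` when nothing matches is an assumption of this sketch rather than the behavior of the source, which calls `next()` on the search result directly.

```python
from pathlib import Path


def resolve_qnn_onnx(weight):
    """Return the `*_qnn.onnx` file for a path that is either the file itself or a directory containing it."""
    w = Path(weight)
    if w.is_file():
        return w
    # Search the directory tree; sorting makes the choice deterministic if several files match
    matches = sorted(w.rglob("*_qnn.onnx"))
    if not matches:
        # Assumption of this sketch: fail with an explicit, descriptive error
        raise FileNotFoundError(f"No '*_qnn.onnx' file found under {w}")
    return matches[0]
```

Passing either the export directory or the file inside it returns the same path, so callers do not need to know which form they hold.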
Raises
| Type | Description |
|---|---|
| OSError | If the QNN Execution Provider cannot be registered (e.g. not on Snapdragon hardware). |
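Since loading raises `OSError` when no QNN device is available, a caller may want to try another backend instead of failing. The following is a minimal sketch of that pattern, not something Ultralytics provides: `load_first_available`, `load_qnn`, and `load_cpu` are hypothetical names, and the two loaders are stubs that simulate a machine without Snapdragon hardware.

```python
def load_first_available(loaders):
    """Try each (name, loader) pair in order and return the first result that loads without OSError."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader()
        except OSError as exc:
            # Remember why this backend failed so the final error is informative
            errors.append(f"{name}: {exc}")
    raise OSError("No backend could be loaded: " + "; ".join(errors))


def load_qnn():
    """Hypothetical stub simulating a QNN load on a machine without a QNN device."""
    raise OSError("QNN Execution Provider registered but no QNN devices were found.")


def load_cpu():
    """Hypothetical stub simulating a successful CPU fallback."""
    return "cpu-session"


name, session = load_first_available([("qnn", load_qnn), ("cpu", load_cpu)])
```

Only `OSError` is caught here, so unrelated failures such as a missing model file still surface to the caller.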
Source code in ultralytics/nn/backends/qnn.py
def load_model(self, weight: str | Path) -> None:
    """Load a QNN context-binary model with ONNX Runtime's QNN Execution Provider plugin.

    Args:
        weight (str | Path): Path to the `*_qnn.onnx` file or the `_qnn_model` directory containing it.

    Raises:
        OSError: If the QNN Execution Provider cannot be registered (e.g. not on Snapdragon hardware).
    """
    check_requirements("onnxruntime-qnn")
    import onnxruntime

    from ultralytics.utils.export.qnn import qnn_library_paths

    w = Path(weight)
    onnx_file = w if w.is_file() else next(w.rglob("*_qnn.onnx"))
    LOGGER.info(f"Loading {onnx_file} for Qualcomm QNN inference...")

    # Register the QNN EP (libraries resolved from the plugin helper or the onnxruntime/capi bundle) and select it
    ep_name = "QNNExecutionProvider"
    ep_library, htp_backend = qnn_library_paths()
    onnxruntime.register_execution_provider_library(ep_name, ep_library)
    devices = [d for d in onnxruntime.get_ep_devices() if d.ep_name == ep_name]
    if not devices:
        raise OSError(
            "QNN Execution Provider registered but no QNN devices were found. Run on a Qualcomm Snapdragon device "
            "with 'onnxruntime-qnn' installed."
        )
    options = onnxruntime.SessionOptions()
    options.add_provider_for_devices(devices, {"backend_path": htp_backend})
    self.session = onnxruntime.InferenceSession(str(onnx_file), sess_options=options)
    self.output_names = [x.name for x in self.session.get_outputs()]

    # Load metadata saved alongside the model during export
    metadata_file = onnx_file.parent / "metadata.yaml"
    if metadata_file.exists():
        self.apply_metadata(YAML.load(metadata_file))