Link to this sectionUltralytics Inference for Rust#

Q: Is video supported?

Yes, with the video feature enabled and FFmpeg 7+ installed on the system. This covers video files, webcams, and RTSP/RTMP/HTTP streams.

Q: What do the annotate and visualize features do?

Both are enabled by default. annotate draws boxes, masks, keypoints, and class labels onto the image and is required for --save to write annotated results. visualize opens a live window for --show. For a smaller, headless build that only returns results programmatically, disable them with cargo build --no-default-features (add back individual features as needed).

Ultralytics Inference is a high-performance YOLO inference library and command-line tool written in Rust. It runs exported ONNX models through ONNX Runtime to deliver fast, memory-safe predictions on images, videos, webcams, and streams, with no Python runtime required at inference time.

The project ships as a single crate, ultralytics-inference, that you can use two ways: as a CLI for quick predictions and batch jobs, or as a library embedded directly in your Rust application. It supports every Ultralytics task and a broad set of hardware backends through a uniform device interface.

Link to this sectionWhy Rust inference?#

Native speed and a small footprint. Compiles to a native binary with no interpreter, ideal for servers, containers, and edge devices.
Memory safety. Rust's ownership model removes whole classes of runtime errors without a garbage collector.
All YOLO tasks. Detect, segment, pose, OBB, classify, semantic segmentation, and depth estimation from one API.
Broad hardware support. CPU plus CUDA, TensorRT, CoreML, OpenVINO, DirectML, ROCm, and XNNPACK execution providers selected at build time.
GPU-side preprocessing. An optional fused CUDA kernel keeps letterbox, normalize, and layout conversion on the device for a zero-copy input path.
Auto-download. Known YOLO model names and sample assets download automatically on first use.

Looking for the Python package?

This page covers the standalone Rust crate. For the Python workflow (training, validation, export, and prediction) see the main Quickstart and Predict mode. Export any Ultralytics model to ONNX with the ONNX integration, then run it here.

Link to this sectionInstallation#

Rust 1.89 or newer is required. The video feature additionally needs FFmpeg 7+ installed on the system.

# Install the command-line tool from crates.io
cargo install ultralytics-inference

# Or with GPU support compiled in
cargo install ultralytics-inference --features cuda,tensorrt

The binary is placed at ~/.cargo/bin/ultralytics-inference (Linux and macOS) or %USERPROFILE%\.cargo\bin\ on Windows.

Link to this sectionCLI quickstart#

The CLI exposes a predict subcommand. With no arguments it downloads a nano detection model and sample images, runs inference, and saves the annotated results to runs/detect/predict.

# Detect on the built-in samples (downloads model and images)
ultralytics-inference predict

# Detect on your own image
ultralytics-inference predict --model yolo26n.onnx --source image.jpg

# Segmentation (auto-downloads yolo26n-seg.onnx)
ultralytics-inference predict --task segment --source image.jpg

# Pose on a video, shown live in a window
ultralytics-inference predict --task pose --source video.mp4 --show

# Depth estimation (auto-downloads yolo26n-depth.onnx)
ultralytics-inference predict --task depth --source image.jpg

# Depth with the DepthAnything-style palette (disparity is already the default)
ultralytics-inference predict --task depth --source image.jpg --colormap spectral

# Depth normalized by depth value instead (near = low color, far = high color)
ultralytics-inference predict --task depth --source image.jpg --depth-viz metric

# Tune thresholds and filter to specific classes
ultralytics-inference predict --source image.jpg --conf 0.5 --iou 0.45 --classes "0,1,2"

# Run a whole folder on the GPU in half precision
ultralytics-inference predict --source images/ --device cuda:0 --half

Common flags:

Flag	Default	Description
`--model`, `-m`	`yolo26n.onnx`	Path to an ONNX model; a known YOLO name is downloaded automatically.
`--task`	`detect`	One of `detect`, `segment`, `pose`, `obb`, `classify`, `semantic`, `depth`.
`--source`, `-s`	sample	Image, directory, glob, video, webcam index, or URL.
`--conf`	`0.25`	Confidence threshold.
`--iou`	`0.7`	IoU threshold for non-maximum suppression.
`--imgsz`	model metadata	Inference image size.
`--device`	`cpu`	Execution device, for example `cuda:0`, `coreml`, `tensorrt:0`.
`--half`	`false`	FP16 half-precision inference.
`--save`	`true`	Save annotated results to `runs/<task>/predict`.
`--show`	`false`	Display results in a window.
`--classes`	all	Filter detections by class IDs, for example `"0,1,2"`.
`--colormap`	`jet`	Depth colormap: `jet`, `inferno`, `spectral`, or `gray` (depth only).
`--depth-viz`	`disparity`	Depth normalization: `disparity` (inverse depth, near = high color) or `metric` (depth only).

Link to this sectionLibrary quickstart#

Load a model and run a prediction. Model metadata such as class names, task type, and image size is read automatically from the ONNX file.

use ultralytics_inference::YOLOModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Metadata (classes, task, imgsz) is parsed from the model.
    let mut model = YOLOModel::load("yolo26n.onnx")?;

    let results = model.predict("image.jpg")?;

    for result in &results {
        if let Some(boxes) = &result.boxes {
            for i in 0..boxes.len() {
                let class_id = boxes.cls()[i] as usize;
                let conf = boxes.conf()[i];
                let name = result.names.get(&class_id).map_or("unknown", |s| s.as_str());
                println!("{name} {conf:.2}");
            }
        }
    }

    Ok(())
}

Use InferenceConfig to control thresholds, image size, precision, and device with a builder API:

use ultralytics_inference::{Device, InferenceConfig, YOLOModel};

let config = InferenceConfig::new()
    .with_confidence(0.5)
    .with_iou(0.45)
    .with_imgsz(640, 640)
    .with_device(Device::Cuda(0))
    .with_half(true);

let mut model = YOLOModel::load_with_config("yolo26n.onnx", config)?;
let results = model.predict("image.jpg")?;

Each task populates a different field on Results. Each tab below is a complete, runnable program; the model and sample inputs download automatically on first run. Swap predict_default() for predict("image.jpg") to run on your own files.

use ultralytics_inference::YOLOModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut model = YOLOModel::load("yolo26n.onnx")?;
    let results = model.predict_default()?;

    for result in &results {
        if let Some(boxes) = &result.boxes {
            println!("{} detections", boxes.len());
            let xyxy = boxes.xyxy(); // rows of [x1, y1, x2, y2]
            for i in 0..boxes.len() {
                let class_id = boxes.cls()[i] as usize;
                let name = result.names.get(&class_id).map_or("unknown", |s| s.as_str());
                println!("  {name} {:.2} {:?}", boxes.conf()[i], xyxy.row(i).to_vec());
            }
        }
    }

    Ok(())
}

Link to this sectionSupported tasks#

All Ultralytics tasks are supported. When --model is omitted, the matching nano model for the selected task is downloaded automatically.

Task	`--task`	Output	Default model
Detection	`detect`	Bounding boxes and classes	`yolo26n.onnx`
Instance segmentation	`segment`	Boxes plus per-instance masks	`yolo26n-seg.onnx`
Pose	`pose`	Boxes plus keypoints	`yolo26n-pose.onnx`
Oriented boxes	`obb`	Rotated bounding boxes	`yolo26n-obb.onnx`
Classification	`classify`	Class probabilities	`yolo26n-cls.onnx`
Semantic segmentation	`semantic`	Per-pixel class map	`yolo26n-sem.onnx`
Depth estimation	`depth`	Per-pixel depth map in meters	`yolo26n-depth.onnx`

Link to this sectionModel compatibility#

Any Ultralytics model exported to ONNX can be loaded from a local file. Auto-download is available for standard YOLO26, YOLO11, and YOLOv8 model names in sizes n, s, m, l, and x:

Model family	Auto-downloadable variants
YOLO26	`yolo26{n,s,m,l,x}.onnx`, `-seg`, `-pose`, `-obb`, `-cls`, `-sem`, and `-depth`
YOLO11	`yolo11{n,s,m,l,x}.onnx`, `-seg`, `-pose`, `-obb`, and `-cls`
YOLOv8	`yolov8{n,s,m,l,x}.onnx`, `-seg`, `-pose`, `-obb`, and `-cls`

Semantic segmentation (-sem) and depth estimation (-depth) are YOLO26-only.

Link to this sectionInput sources#

The --source argument (and the Source type in the library) accepts many input kinds, auto-detected from the string:

Source	Example	Notes
Image	`image.jpg`	Single file.
Directory	`images/`	All images in the folder.
Glob	`images/*.jpg`	Shell-style pattern.
Video	`video.mp4`	Requires the `video` feature.
Webcam	`0`	Requires the `video` feature.
Stream	`rtsp://...`	Requires the `video` feature.
URL	`https://example.com/image.jpg`	Remote image download.

Link to this sectionDevices and execution providers#

Inference runs on CPU by default. GPU and accelerator backends are compiled in as Cargo features and selected at runtime with --device (CLI) or Device (library).

Device string	`Device` variant	Build feature	Hardware
`cpu`	`Device::Cpu`	built in	Any CPU
`cuda:0`	`Device::Cuda(0)`	`cuda`	NVIDIA GPU
`tensorrt:0`	`Device::TensorRt(0)`	`tensorrt`	NVIDIA GPU, optimized
`coreml`	`Device::CoreMl`	`coreml`	Apple Silicon / macOS
`openvino`	`Device::OpenVino`	`openvino`	Intel CPU / iGPU
`directml:0`	`Device::DirectMl(0)`	`directml`	Windows GPU
`rocm:0`	`Device::Rocm(0)`	`rocm`	AMD GPU
`xnnpack`	`Device::Xnnpack`	`xnnpack`	Optimized CPU

# Build the CLI with the providers you need
cargo install ultralytics-inference --features cuda,tensorrt

Link to this sectionGPU acceleration and CUDA preprocessing#

On NVIDIA hardware, the cuda feature enables the CUDA execution provider, and tensorrt adds the TensorRT provider for further optimization. For the lowest possible latency, the cuda-preprocess feature moves preprocessing onto the GPU.

cuda-preprocess runs letterbox resizing, normalization, and the HWC-to-CHW layout conversion as a single fused CUDA kernel, then feeds the result to the model as a zero-copy device tensor. This removes the per-image CPU preprocessing cost and the host-to-device copy, which matters most for high-throughput batches and real-time streams.

# Build with fused GPU preprocessing (implies cuda + tensorrt)
cargo build --release --features cuda-preprocess

The fast path is used automatically, with no API change, when all of the following hold: the feature is compiled in, the device is CUDA or TensorRT, the task is detect, segment, pose, OBB, semantic segmentation, or depth estimation, and the model uses FP32 input. It is enabled by default and can be turned off per model:

use ultralytics_inference::{Device, InferenceConfig};

let config = InferenceConfig::new()
    .with_device(Device::TensorRt(0))
    .with_cuda_preprocess(false); // force CPU preprocessing

Match your CUDA toolkit

cuda-preprocess requires a matching CUDA toolkit at build time and uses NVRTC at runtime for the fused preprocessing kernel. See the CUDA and TensorRT acceleration guide for version requirements and troubleshooting.

Link to this sectionCargo features#

Features are enabled at build time. The defaults cover annotation and live display.

Feature	Default	Purpose
`annotate`	yes	Draw boxes, masks, keypoints, and labels; required for `--save`.
`visualize`	yes	Real-time window display for `--show`.
`video`	no	Read and write video files (requires FFmpeg 7+).
`cuda`	no	NVIDIA CUDA execution provider.
`tensorrt`	no	NVIDIA TensorRT execution provider.
`cuda-preprocess`	no	Fused GPU preprocessing with zero-copy input (implies cuda, tensorrt).
`coreml`	no	Apple CoreML execution provider.
`openvino`	no	Intel OpenVINO execution provider.
`rocm`	no	AMD ROCm execution provider.
`directml`	no	Windows DirectML execution provider.

Convenience groups bundle related providers: nvidia (cuda, tensorrt), amd (rocm, migraphx), intel (openvino, onednn), mobile (nnapi, coreml, qnn), and all (annotate, visualize, video). Additional providers such as nnapi, qnn, xnnpack, webgpu, and others are also available.

Enable features when installing the CLI or adding the library:

cargo install ultralytics-inference --features video
cargo install ultralytics-inference --features cuda,tensorrt

[dependencies]
ultralytics-inference = { version = "0.0.29", features = ["video"] }

Link to this sectionOutput and saving#

By default, predictions are annotated and saved to an auto-incrementing run directory:

runs/
└── detect/
    └── predict/          # then predict2, predict3, ...
        └── image.jpg     # annotated result

The subfolder matches the task (runs/segment/, runs/pose/, and so on). For video sources the annotated output is written as a video file; pass --save-frames to write individual frames instead. For the semantic task, --save-json writes per-pixel class-map PNGs under a results/ subfolder. For the depth task the annotated image is written side by side, with the original next to the colorized depth map. Annotated image and video saving require the annotate feature; semantic class-map PNG export does not. Video input and output require the video feature.

Link to this sectionFAQ#

Link to this sectionDo I need Python installed?#

No. The crate runs exported ONNX models directly through ONNX Runtime. Python is only needed if you train or export models with the Ultralytics package beforehand.

Link to this sectionWhich models can I run?#

Any Ultralytics YOLO model exported to ONNX, including YOLO26, YOLO11, and YOLOv8. Known model names download automatically; you can also point --model at any local .onnx file.

Link to this sectionHow do I get a model file?#

Export from the Python package, for example with the ONNX integration, or let the CLI download a standard nano model for the chosen task on first run.

Link to this sectionIs video supported?#

Yes, with the video feature enabled and FFmpeg 7+ installed on the system. This covers video files, webcams, and RTSP/RTMP/HTTP streams.

Link to this sectionWhat do the `annotate` and `visualize` features do?#

Both are enabled by default. annotate draws boxes, masks, keypoints, and class labels onto the image and is required for --save to write annotated results. visualize opens a live window for --show. For a smaller, headless build that only returns results programmatically, disable them with cargo build --no-default-features (add back individual features as needed).

Link to this sectionWhere is the full API reference?#

This page is a high-level overview. The complete, type-by-type API reference for every public struct, method, and configuration option is published on docs.rs, generated directly from the source.

Contributors

ONonuralpszr⁸ GLglenn-jocher¹

Created 1 month agoUpdated 4 days ago