Ultralytics Inference for Rust

Ultralytics Inference is a high-performance YOLO inference library and command-line tool written in Rust. It runs exported ONNX models through ONNX Runtime to deliver fast, memory-safe predictions on images, videos, webcams, and streams, with no Python runtime required at inference time.

The project ships as a single crate, ultralytics-inference, that you can use two ways: as a CLI for quick predictions and batch jobs, or as a library embedded directly in your Rust application. It supports every Ultralytics task and a broad set of hardware backends through a uniform device interface.

Why Rust inference?

  • Native speed and a small footprint. Compiles to a native binary with no interpreter, ideal for servers, containers, and edge devices.
  • Memory safety. Rust's ownership model removes whole classes of runtime errors without a garbage collector.
  • All YOLO tasks. Detect, segment, pose, OBB, classify, and semantic segmentation from one API.
  • Broad hardware support. CPU plus CUDA, TensorRT, CoreML, OpenVINO, DirectML, ROCm, and XNNPACK execution providers, compiled in as Cargo features at build time and selected at runtime.
  • GPU-side preprocessing. An optional fused CUDA kernel keeps letterbox, normalize, and layout conversion on the device for a zero-copy input path.
  • Auto-download. Known YOLO model names and sample assets download automatically on first use.

Looking for the Python package?

This page covers the standalone Rust crate. For the Python workflow (training, validation, export, and prediction) see the main Quickstart and Predict mode. Export any Ultralytics model to ONNX with the ONNX integration, then run it here.

Installation

Rust 1.89 or newer is required. The video feature additionally needs FFmpeg 7+ installed on the system.

# Install the command-line tool from crates.io
cargo install ultralytics-inference

# Or with GPU support compiled in
cargo install ultralytics-inference --features cuda,tensorrt

The binary is placed at ~/.cargo/bin/ultralytics-inference (Linux and macOS) or %USERPROFILE%\.cargo\bin\ on Windows.

CLI quickstart

The CLI exposes a predict subcommand. With no arguments it downloads a nano detection model and sample images, runs inference, and saves the annotated results to runs/detect/predict.

# Detect on the built-in samples (downloads model and images)
ultralytics-inference predict

# Detect on your own image
ultralytics-inference predict --model yolo26n.onnx --source image.jpg

# Segmentation (auto-downloads yolo26n-seg.onnx)
ultralytics-inference predict --task segment --source image.jpg

# Pose on a video, shown live in a window
ultralytics-inference predict --task pose --source video.mp4 --show

# Tune thresholds and filter to specific classes
ultralytics-inference predict --source image.jpg --conf 0.5 --iou 0.45 --classes "0,1,2"

# Run a whole folder on the GPU in half precision
ultralytics-inference predict --source images/ --device cuda:0 --half

Common flags:

| Flag | Default | Description |
|------|---------|-------------|
| --model, -m | yolo26n.onnx | Path to an ONNX model; a known YOLO name is downloaded automatically. |
| --task | detect | One of detect, segment, pose, obb, classify, semantic. |
| --source, -s | sample | Image, directory, glob, video, webcam index, or URL. |
| --conf | 0.25 | Confidence threshold. |
| --iou | 0.7 | IoU threshold for non-maximum suppression. |
| --imgsz | model metadata | Inference image size. |
| --device | cpu | Execution device, for example cuda:0, coreml, tensorrt:0. |
| --half | false | FP16 half-precision inference. |
| --save | true | Save annotated results to runs/<task>/predict. |
| --show | false | Display results in a window. |
| --classes | all | Filter detections by class IDs, for example "0,1,2". |
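To make the --conf and --iou thresholds concrete, here is a minimal, standard-library-only sketch of confidence filtering followed by greedy non-maximum suppression. The `Detection` struct and the `iou` and `nms` functions are hypothetical names written for this illustration; they are not part of the crate's API, and whether the crate suppresses per class exactly this way is an assumption.

```rust
/// Hypothetical detection record used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Detection {
    pub xyxy: [f32; 4], // [x1, y1, x2, y2]
    pub conf: f32,
    pub class_id: usize,
}

/// Intersection-over-union of two axis-aligned boxes in [x1, y1, x2, y2] form.
pub fn iou(a: &[f32; 4], b: &[f32; 4]) -> f32 {
    let iw = (a[2].min(b[2]) - a[0].max(b[0])).max(0.0);
    let ih = (a[3].min(b[3]) - a[1].max(b[1])).max(0.0);
    let inter = iw * ih;
    let area_a = (a[2] - a[0]) * (a[3] - a[1]);
    let area_b = (b[2] - b[0]) * (b[3] - b[1]);
    let union = area_a + area_b - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Drop detections below `conf_thres`, then greedily keep the highest-scoring
/// box and suppress same-class boxes that overlap a kept box by more than
/// `iou_thres` (assumption: class-aware suppression).
pub fn nms(mut dets: Vec<Detection>, conf_thres: f32, iou_thres: f32) -> Vec<Detection> {
    dets.retain(|d| d.conf >= conf_thres);
    dets.sort_by(|a, b| b.conf.total_cmp(&a.conf));
    let mut kept: Vec<Detection> = Vec::new();
    for d in dets {
        let overlaps = kept
            .iter()
            .any(|k| k.class_id == d.class_id && iou(&k.xyxy, &d.xyxy) > iou_thres);
        if !overlaps {
            kept.push(d);
        }
    }
    kept
}
```

With the defaults from the table, `nms(dets, 0.25, 0.7)` discards anything scored below 0.25 and then removes near-duplicate boxes of the same class. Raising --conf trims weak detections; lowering --iou makes suppression more aggressive.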

Library quickstart

Load a model and run a prediction. Model metadata such as class names, task type, and image size is read automatically from the ONNX file.

use ultralytics_inference::YOLOModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Metadata (classes, task, imgsz) is parsed from the model.
    let mut model = YOLOModel::load("yolo26n.onnx")?;

    let results = model.predict("image.jpg")?;

    for result in &results {
        if let Some(boxes) = &result.boxes {
            for i in 0..boxes.len() {
                let class_id = boxes.cls()[i] as usize;
                let conf = boxes.conf()[i];
                let name = result.names.get(&class_id).map_or("unknown", |s| s.as_str());
                println!("{name} {conf:.2}");
            }
        }
    }

    Ok(())
}

Use InferenceConfig to control thresholds, image size, precision, and device with a builder API:

use ultralytics_inference::{Device, InferenceConfig, YOLOModel};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Thresholds, image size, device, and precision are set through the builder.
    let config = InferenceConfig::new()
        .with_confidence(0.5)
        .with_iou(0.45)
        .with_imgsz(640, 640)
        .with_device(Device::Cuda(0))
        .with_half(true);

    let mut model = YOLOModel::load_with_config("yolo26n.onnx", config)?;
    let results = model.predict("image.jpg")?;

    Ok(())
}

Each task populates a different field on Results. The detection example below is a complete, runnable program; the model and sample inputs download automatically on first run. Swap predict_default() for predict("image.jpg") to run on your own files.

use ultralytics_inference::YOLOModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut model = YOLOModel::load("yolo26n.onnx")?;
    let results = model.predict_default()?;

    for result in &results {
        if let Some(boxes) = &result.boxes {
            println!("{} detections", boxes.len());
            let xyxy = boxes.xyxy(); // rows of [x1, y1, x2, y2]
            for i in 0..boxes.len() {
                let class_id = boxes.cls()[i] as usize;
                let name = result.names.get(&class_id).map_or("unknown", |s| s.as_str());
                println!("  {name} {:.2} {:?}", boxes.conf()[i], xyxy.row(i).to_vec());
            }
        }
    }

    Ok(())
}
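Downstream code often wants boxes in center format rather than the corner format printed above. The helpers below are a small sketch of that conversion; `xyxy_to_xywh` and `box_area` are hypothetical names for this illustration, not crate functions, and they assume each row has been copied into a plain `[f32; 4]`.

```rust
/// Convert a corner-format box [x1, y1, x2, y2] into center format
/// [cx, cy, w, h].
pub fn xyxy_to_xywh(b: [f32; 4]) -> [f32; 4] {
    let w = b[2] - b[0];
    let h = b[3] - b[1];
    [b[0] + w / 2.0, b[1] + h / 2.0, w, h]
}

/// Area of a corner-format box in square pixels; degenerate boxes give 0.
pub fn box_area(b: [f32; 4]) -> f32 {
    (b[2] - b[0]).max(0.0) * (b[3] - b[1]).max(0.0)
}
```

For example, the box `[10.0, 20.0, 50.0, 100.0]` has its center at (30, 60), a width of 40, and a height of 80.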

Supported tasks

All Ultralytics tasks are supported. When --model is omitted, the matching nano model for the selected task is downloaded automatically.

| Task | --task | Output | Default model |
|------|--------|--------|---------------|
| Detection | detect | Bounding boxes and classes | yolo26n.onnx |
| Instance segmentation | segment | Boxes plus per-instance masks | yolo26n-seg.onnx |
| Pose | pose | Boxes plus keypoints | yolo26n-pose.onnx |
| Oriented boxes | obb | Rotated bounding boxes | yolo26n-obb.onnx |
| Classification | classify | Class probabilities | yolo26n-cls.onnx |
| Semantic segmentation | semantic | Per-pixel class map | yolo26n-sem.onnx |
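The task-to-model mapping in the table can be written as a simple lookup. This is only a sketch of the documented defaults; `default_model_for_task` is a hypothetical helper, not a function exported by the crate.

```rust
/// Return the default nano model for a `--task` value, following the table
/// of supported tasks; unknown task names give `None`.
pub fn default_model_for_task(task: &str) -> Option<&'static str> {
    match task {
        "detect" => Some("yolo26n.onnx"),
        "segment" => Some("yolo26n-seg.onnx"),
        "pose" => Some("yolo26n-pose.onnx"),
        "obb" => Some("yolo26n-obb.onnx"),
        "classify" => Some("yolo26n-cls.onnx"),
        "semantic" => Some("yolo26n-sem.onnx"),
        _ => None,
    }
}
```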

Model compatibility

Any Ultralytics model exported to ONNX can be loaded from a local file. Auto-download is available for standard YOLO26, YOLO11, and YOLOv8 model names in sizes n, s, m, l, and x:

| Model family | Auto-downloadable variants |
|--------------|----------------------------|
| YOLO26 | yolo26{n,s,m,l,x}.onnx, -seg, -pose, -obb, -cls, and -sem |
| YOLO11 | yolo11{n,s,m,l,x}.onnx, -seg, -pose, -obb, and -cls |
| YOLOv8 | yolov8{n,s,m,l,x}.onnx, -seg, -pose, -obb, and -cls |

Semantic segmentation (-sem) is YOLO26-only.
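The naming rules above (family, size letter, optional task suffix, with -sem limited to YOLO26) can be checked mechanically. The function below is a sketch of those rules as documented here; `is_auto_downloadable` is a hypothetical name, and the crate's actual lookup may be implemented differently.

```rust
/// Check whether a file name follows the documented auto-download pattern:
/// `<family><size>[-<task>].onnx`, where `-sem` is accepted for yolo26 only.
pub fn is_auto_downloadable(name: &str) -> bool {
    let Some(stem) = name.strip_suffix(".onnx") else {
        return false;
    };
    // Split an optional task suffix such as "seg" from the base name.
    let (base, suffix) = match stem.split_once('-') {
        Some((b, s)) => (b, Some(s)),
        None => (stem, None),
    };
    // The base is a family prefix followed by a single size letter.
    let families = ["yolo26", "yolo11", "yolov8"];
    let Some(family) = families.iter().find(|f| base.starts_with(**f)) else {
        return false;
    };
    let size = &base[family.len()..];
    if !matches!(size, "n" | "s" | "m" | "l" | "x") {
        return false;
    }
    match suffix {
        None => true,
        Some("seg" | "pose" | "obb" | "cls") => true,
        Some("sem") => *family == "yolo26",
        Some(_) => false,
    }
}
```

A name that fails this check is not necessarily unusable: any exported ONNX file can still be loaded from a local path.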

Input sources

The --source argument (and the Source type in the library) accepts many input kinds, auto-detected from the string:

| Source | Example | Notes |
|--------|---------|-------|
| Image | image.jpg | Single file. |
| Directory | images/ | All images in the folder. |
| Glob | images/*.jpg | Shell-style pattern. |
| Video | video.mp4 | Requires the video feature. |
| Webcam | 0 | Requires the video feature. |
| Stream | rtsp://... | Requires the video feature. |
| URL | https://example.com/image.jpg | Remote image download. |
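As an illustration of how a source string could be auto-detected, the sketch below classifies the example values from the table. `SourceKind` and `classify_source` are hypothetical and are not the crate's `Source` type. The video extension list and the trailing-slash directory test are simplifying assumptions; a real implementation would likely also consult the filesystem.

```rust
/// Hypothetical classification of a `--source` string.
#[derive(Debug, PartialEq, Eq)]
pub enum SourceKind {
    Webcam(u32),
    Stream,
    Url,
    Glob,
    Directory,
    Video,
    Image,
}

/// Classify a source string using only its text, checked from the most
/// specific form to the least specific.
pub fn classify_source(s: &str) -> SourceKind {
    let lower = s.to_ascii_lowercase();
    // A bare non-negative integer is treated as a webcam index.
    if let Ok(index) = s.parse::<u32>() {
        return SourceKind::Webcam(index);
    }
    if lower.starts_with("rtsp://") || lower.starts_with("rtmp://") {
        return SourceKind::Stream;
    }
    if lower.starts_with("http://") || lower.starts_with("https://") {
        return SourceKind::Url;
    }
    if s.contains('*') || s.contains('?') {
        return SourceKind::Glob;
    }
    if s.ends_with('/') || s.ends_with('\\') {
        return SourceKind::Directory;
    }
    // Assumed extension list for this sketch only.
    const VIDEO_EXTS: [&str; 4] = ["mp4", "avi", "mov", "mkv"];
    match std::path::Path::new(&lower).extension().and_then(|e| e.to_str()) {
        Some(ext) if VIDEO_EXTS.contains(&ext) => SourceKind::Video,
        _ => SourceKind::Image,
    }
}
```

The URL check runs before the glob check so that a query string containing `?` is not mistaken for a pattern.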

Devices and execution providers

Inference runs on CPU by default. GPU and accelerator backends are compiled in as Cargo features and selected at runtime with --device (CLI) or Device (library).

| Device string | Device variant | Build feature | Hardware |
|---------------|----------------|---------------|----------|
| cpu | Device::Cpu | built in | Any CPU |
| cuda:0 | Device::Cuda(0) | cuda | NVIDIA GPU |
| tensorrt:0 | Device::TensorRt(0) | tensorrt | NVIDIA GPU, optimized |
| coreml | Device::CoreMl | coreml | Apple Silicon / macOS |
| openvino | Device::OpenVino | openvino | Intel CPU / iGPU |
| directml:0 | Device::DirectMl(0) | directml | Windows GPU |
| rocm:0 | Device::Rocm(0) | rocm | AMD GPU |
| xnnpack | Device::Xnnpack | xnnpack | Optimized CPU |

# Build the CLI with the providers you need
cargo install ultralytics-inference --features cuda,tensorrt
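The device strings in the table follow a `name` or `name:index` pattern. The sketch below parses that pattern into a stand-in enum. `DeviceSpec` and `parse_device` are hypothetical and exist only for this illustration; the crate's own `Device` type is the one to use in real code. Defaulting a missing index to 0 is an assumption.

```rust
/// Hypothetical stand-in for the crate's `Device` enum.
#[derive(Debug, PartialEq, Eq)]
pub enum DeviceSpec {
    Cpu,
    Cuda(u32),
    TensorRt(u32),
    CoreMl,
    OpenVino,
    DirectMl(u32),
    Rocm(u32),
    Xnnpack,
}

/// Parse strings such as "cpu", "cuda:0", or "tensorrt:1".
pub fn parse_device(s: &str) -> Result<DeviceSpec, String> {
    // Split off an optional ":index" part and validate it as an integer.
    let (name, index) = match s.split_once(':') {
        Some((n, i)) => {
            let idx = i
                .parse::<u32>()
                .map_err(|e| format!("bad device index in {s:?}: {e}"))?;
            (n, Some(idx))
        }
        None => (s, None),
    };
    match (name, index) {
        ("cpu", None) => Ok(DeviceSpec::Cpu),
        ("coreml", None) => Ok(DeviceSpec::CoreMl),
        ("openvino", None) => Ok(DeviceSpec::OpenVino),
        ("xnnpack", None) => Ok(DeviceSpec::Xnnpack),
        // Assumption: an omitted index means device 0.
        ("cuda", i) => Ok(DeviceSpec::Cuda(i.unwrap_or(0))),
        ("tensorrt", i) => Ok(DeviceSpec::TensorRt(i.unwrap_or(0))),
        ("directml", i) => Ok(DeviceSpec::DirectMl(i.unwrap_or(0))),
        ("rocm", i) => Ok(DeviceSpec::Rocm(i.unwrap_or(0))),
        _ => Err(format!("unknown device string {s:?}")),
    }
}
```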

GPU acceleration and CUDA preprocessing

On NVIDIA hardware, the cuda feature enables the CUDA execution provider, and tensorrt adds the TensorRT provider for further optimization. For the lowest possible latency, the cuda-preprocess feature moves preprocessing onto the GPU.

cuda-preprocess runs letterbox resizing, normalization, and the HWC-to-CHW layout conversion as a single fused CUDA kernel, then feeds the result to the model as a zero-copy device tensor. This removes the per-image CPU preprocessing cost and the host-to-device copy, which matters most for high-throughput batches and real-time streams.

# Build with fused GPU preprocessing (implies cuda + tensorrt)
cargo build --release --features cuda-preprocess
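For reference, the three steps that the fused kernel combines can be sketched on the CPU. This is not the crate's kernel, only a plain Rust illustration of the arithmetic. `Letterbox`, `letterbox_params`, and `hwc_to_chw_normalized` are hypothetical names, and dividing by 255 and centering the resized image are assumptions based on common YOLO preprocessing.

```rust
/// Letterbox geometry for fitting a source image into a target canvas
/// while preserving aspect ratio.
#[derive(Debug, PartialEq)]
pub struct Letterbox {
    pub scale: f32,
    pub new_w: usize,
    pub new_h: usize,
    pub pad_left: usize,
    pub pad_top: usize,
}

/// Compute the resize scale and the padding that centers the resized image.
pub fn letterbox_params(src_w: usize, src_h: usize, dst_w: usize, dst_h: usize) -> Letterbox {
    let scale = (dst_w as f32 / src_w as f32).min(dst_h as f32 / src_h as f32);
    let new_w = ((src_w as f32 * scale).round() as usize).min(dst_w);
    let new_h = ((src_h as f32 * scale).round() as usize).min(dst_h);
    Letterbox {
        scale,
        new_w,
        new_h,
        pad_left: (dst_w - new_w) / 2,
        pad_top: (dst_h - new_h) / 2,
    }
}

/// Convert interleaved 8-bit RGB (HWC) into planar f32 (CHW) scaled to [0, 1].
pub fn hwc_to_chw_normalized(hwc: &[u8], width: usize, height: usize) -> Vec<f32> {
    assert_eq!(hwc.len(), width * height * 3, "expected 3 channels");
    let plane = width * height;
    let mut chw = vec![0.0f32; plane * 3];
    for (pixel, rgb) in hwc.chunks_exact(3).enumerate() {
        for (c, &v) in rgb.iter().enumerate() {
            // Channel c of this pixel lands in plane c of the output.
            chw[c * plane + pixel] = v as f32 / 255.0;
        }
    }
    chw
}
```

A 1280x720 frame letterboxed to 640x640 is scaled by 0.5 to 640x360 and padded by 140 rows at the top and bottom. Doing this per frame on the CPU, followed by a host-to-device copy, is the cost the fused GPU path is designed to avoid.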

The fast path is used automatically, with no API change, when all of the following hold: the cuda-preprocess feature is compiled in, the device is CUDA or TensorRT, the task is detect, segment, pose, OBB, or semantic segmentation, and the model uses FP32 input. Once compiled in, it is on by default and can be turned off per model:

use ultralytics_inference::{Device, InferenceConfig};

let config = InferenceConfig::new()
    .with_device(Device::TensorRt(0))
    .with_cuda_preprocess(false); // force CPU preprocessing

Match your CUDA toolkit

cuda-preprocess requires a matching CUDA toolkit at build time and uses NVRTC at runtime for the fused preprocessing kernel. See the CUDA and TensorRT acceleration guide for version requirements and troubleshooting.
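The conditions under which the fused preprocessing path is taken can be summarized as a single predicate. The sketch below restates the documented rules in code. `Task`, `Backend`, and `uses_gpu_preprocess` are hypothetical names for this illustration and do not mirror the crate's internals.

```rust
/// Hypothetical task and backend enums for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Task {
    Detect,
    Segment,
    Pose,
    Obb,
    Classify,
    Semantic,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    Cuda,
    TensorRt,
    Other,
}

/// True only when every documented condition for the fast path holds.
pub fn uses_gpu_preprocess(
    feature_compiled: bool,
    enabled_in_config: bool,
    backend: Backend,
    task: Task,
    input_is_fp32: bool,
) -> bool {
    feature_compiled
        && enabled_in_config
        && matches!(backend, Backend::Cuda | Backend::TensorRt)
        // Classification is the one task not listed for the fast path.
        && matches!(
            task,
            Task::Detect | Task::Segment | Task::Pose | Task::Obb | Task::Semantic
        )
        && input_is_fp32
}
```

If any one condition fails, for example a classification model or a non-FP32 input, preprocessing falls back to the CPU path.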

Cargo features

Features are enabled at build time. The defaults cover annotation and live display.

| Feature | Default | Purpose |
|---------|---------|---------|
| annotate | yes | Draw boxes, masks, keypoints, and labels; required for --save. |
| visualize | yes | Real-time window display for --show. |
| video | no | Read and write video files (requires FFmpeg 7+). |
| cuda | no | NVIDIA CUDA execution provider. |
| tensorrt | no | NVIDIA TensorRT execution provider. |
| cuda-preprocess | no | Fused GPU preprocessing with zero-copy input (implies cuda, tensorrt). |
| coreml | no | Apple CoreML execution provider. |
| openvino | no | Intel OpenVINO execution provider. |
| rocm | no | AMD ROCm execution provider. |
| directml | no | Windows DirectML execution provider. |

Convenience groups bundle related providers: nvidia (cuda, tensorrt), amd (rocm, migraphx), intel (openvino, onednn), mobile (nnapi, coreml, qnn), and all (annotate, visualize, video). Additional providers such as nnapi, qnn, xnnpack, webgpu, and others are also available.

Enable features when installing the CLI or adding the library:

cargo install ultralytics-inference --features video
cargo install ultralytics-inference --features cuda,tensorrt

# Cargo.toml
[dependencies]
ultralytics-inference = { version = "0.0.18", features = ["video"] }

Output and saving

By default, predictions are annotated and saved to an auto-incrementing run directory:

runs/
└── detect/
    └── predict/          # then predict2, predict3, ...
        └── image.jpg     # annotated result

The subfolder matches the task (runs/segment/, runs/pose/, and so on). For video sources the annotated output is written as a video file; pass --save-frames to write individual frames instead. For the semantic task, --save-json writes per-pixel class-map PNGs under a results/ subfolder. Annotated image and video saving require the annotate feature; semantic class-map PNG export does not. Video input and output require the video feature.
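The auto-incrementing directory naming shown in the tree (predict, then predict2, predict3, and so on) can be sketched as follows. `next_run_dir` is a hypothetical helper, not the crate's implementation. The existence check is passed in as a closure, which is a design choice made here so the logic can be exercised without touching the disk.

```rust
use std::path::{Path, PathBuf};

/// Pick the first unused run directory under `<root>/<task>/`:
/// `predict`, then `predict2`, `predict3`, and so on.
pub fn next_run_dir(root: &Path, task: &str, exists: impl Fn(&Path) -> bool) -> PathBuf {
    let base = root.join(task);
    let first = base.join("predict");
    if !exists(first.as_path()) {
        return first;
    }
    // Numbering starts at 2 because the unnumbered name counts as the first run.
    (2u32..)
        .map(|n| base.join(format!("predict{n}")))
        .find(|candidate| !exists(candidate.as_path()))
        .expect("ran out of run directory numbers")
}
```

In a real program the closure would typically be `|p| p.exists()`, for example `next_run_dir(Path::new("runs"), "detect", |p| p.exists())`.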

FAQ

Do I need Python installed?

No. The crate runs exported ONNX models directly through ONNX Runtime. Python is only needed if you train or export models with the Ultralytics package beforehand.

Which models can I run?

Any Ultralytics YOLO model exported to ONNX, including YOLO26, YOLO11, and YOLOv8. Known model names download automatically; you can also point --model at any local .onnx file.

How do I get a model file?

Export from the Python package, for example with the ONNX integration, or let the CLI download a standard nano model for the chosen task on first run.

Is video supported?

Yes, with the video feature enabled and FFmpeg 7+ installed on the system. This covers video files, webcams, and RTSP/RTMP/HTTP streams.

What do the annotate and visualize features do?

Both are enabled by default. annotate draws boxes, masks, keypoints, and class labels onto the image and is required for --save to write annotated results. visualize opens a live window for --show. For a smaller, headless build that only returns results programmatically, disable them with cargo build --no-default-features (add back individual features as needed).

Where is the full API reference?

This page is a high-level overview. The complete, type-by-type API reference for every public struct, method, and configuration option is published on docs.rs, generated directly from the source.
