Hailo Export for Ultralytics YOLO Models
Hailo HEF is not currently supported as a direct Ultralytics model.export(format="hailo") target. The workflow below is a manual fallback that exports to ONNX first, then uses Hailo's external Dataflow Compiler toolchain to produce a .hef file. A seamless Ultralytics workflow should expose Hailo through the same Python and CLI export API as other hardware formats.
The Hailo toolchain uses HEF files for embedded platforms including the Raspberry Pi AI Kit and AI HAT+, industrial cameras, edge gateways, and AI PCs.
This guide walks through exporting selected Ultralytics YOLO models to Hailo's HEF (Hailo Executable Format) using the Hailo Dataflow Compiler (DFC) SDK. The workflow starts from a YOLO .pt model, exports to ONNX, compiles with Hailo tools, and produces a .hef file ready for supported Hailo accelerators.
When to Use Hailo HEF
HEF is the compiled artifact consumed by HailoRT on Hailo target devices. Use this manual guide only when your deployment hardware specifically requires Hailo HEF before direct Ultralytics Hailo export support is available.
HEF is similar in deployment role to hardware-specific formats such as RKNN for Rockchip NPUs, IMX500 for Raspberry Pi AI Cameras, and Qualcomm QNN for Snapdragon NPUs, but it is not currently generated directly by Ultralytics.
This workflow is relevant when you need:
- Raspberry Pi AI Kit compatibility: Hailo-8L is used in the official Raspberry Pi AI Kit and AI HAT+.
- HailoRT post-processing: HailoRT can include YOLO non-maximum suppression in the compiled inference pipeline.
- INT8 compilation: The Hailo DFC quantizes the model with representative calibration images to produce an INT8 graph for Hailo hardware. Learn more about model quantization.
Hailo HEF Export Format
HEF is a hardware-specific executable generated by the Hailo Dataflow Compiler. It contains the quantized model graph, memory allocation, scheduling, and optional post-processing configured for a target Hailo architecture. Unlike standard YOLO Export mode formats that are produced directly by model.export(format=...), HEF compilation currently uses a two-stage flow:
- Export YOLO to ONNX with Ultralytics.
- Use Hailo DFC tools to parse, optimize, quantize, and compile the ONNX model into HEF.
The full workflow expands into the following pipeline:
YOLO (.pt) -> ONNX -> HAR (parse) -> HAR (optimize/quantize) -> HEF (compile)
- Export to ONNX using Ultralytics Export mode
- Parse the ONNX model into Hailo's intermediate HAR format
- Load a model script (.alls) with normalization and post-processing directives
- Calibrate and quantize using representative images
- Compile to a deployable HEF file
Supported Tasks
The current manual example focuses on YOLO11 object detection because the Hailo model script and post-processing configuration are detection-head specific. A future direct model.export(format="hailo") implementation should make Hailo export feel like every other Ultralytics export format, with task support gated by the model head and Hailo compiler compatibility rather than by external workflow steps.
| Task | Direct Hailo Export Target | Notes |
|---|---|---|
| Object Detection | ✅ Primary target | YOLOv8, YOLO11, and YOLO26 detection should be the first direct-export path. |
| Instance Segmentation | ✅ Target | YOLOv8, YOLO11, and YOLO26 segmentation require task-specific mask output handling and validation. |
| Semantic Segmentation | ⚠️ Validate | YOLO26 semantic segmentation needs a dedicated compiler and output validation path. |
| Pose Estimation | ⚠️ Validate | Pose requires keypoint output handling beyond the detection NMS path. |
| OBB Detection | ⚠️ Validate | OBB requires rotated-box output handling beyond the standard detection NMS path. |
| Classification | ⚠️ Validate | Classification has a simpler output head, but still needs Hailo compile and runtime validation. |
Until direct Hailo export is implemented in Ultralytics, only the manual ONNX-to-HEF workflow below is documented.
Hailo SDK Versions
Direct Hailo export must account for Hailo's hardware and SDK generation split:
- Hailo-8 and Hailo-8L: use Hailo Dataflow Compiler v3.x. This is the relevant path for Raspberry Pi AI Kit and 13 TOPS AI HAT+ deployments.
- Hailo-10 and Hailo-15: use Hailo Dataflow Compiler v5.x.
This version split affects compiler APIs, supported architectures, generated HEF compatibility, and which hw_arch values a direct exporter should expose. Task support on one Hailo hardware generation should not be treated as support on another without validating the target DFC version and hw_arch.
Compatibility Notes
Hailo export compatibility depends on the model head, input image size, class count, Hailo architecture, generated model script (.alls), and post-processing configuration. Static configs are not universal templates. For example, an NMS JSON created for a COCO 80-class YOLO11n model is not correct for a custom 3-class model or for a different fixed imgsz.
| Scope | Expected Support | Notes |
|---|---|---|
| YOLOv8 / YOLO11 detection | ✅ Good | Shared decoupled detection head; .alls directives, end nodes, and NMS config still need to match the exported graph and fixed imgsz. |
| Custom YOLOv8 / YOLO11 detection | ✅ Possible | Requires per-model NMS configuration generated from class count, strides, and detection-head layout; static JSON will not match. |
| YOLO26 detection | ✅ Target | NMS-free architecture needs a separate compiler/post-processing path; do not reuse the YOLO11/YOLOv8 NMS workflow below for YOLO26. |
| YOLO26 instance segmentation | ✅ Target | Needs YOLO26 segmentation-specific mask output handling and accuracy validation. |
| YOLO26 semantic, pose, OBB, classification | ⚠️ Research | These tasks need dedicated compiler and runtime validation before they can be advertised as directly supported. |
| Dynamic or arbitrary image sizes | ❌ Not supported | Hailo compilation uses a fixed input shape; .alls and post-processing settings must match the exported imgsz. |
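To make the dependence on class count and fixed imgsz concrete, the sketch below computes the tensor shapes that the six YOLOv8/YOLO11 detection-head end nodes would be expected to produce. It assumes the default Ultralytics head layout (strides 8, 16, 32 and 16 DFL bins per box side) and an illustrative (height, width, channels) ordering; `head_output_shapes` and `total_proposals` are hypothetical helper names, not part of Ultralytics or the Hailo SDK.

```python
from typing import Dict, Tuple

REG_MAX = 16  # assumption: default DFL bins per box side in the YOLOv8/YOLO11 head


def head_output_shapes(
    imgsz: int, num_classes: int, strides: Tuple[int, ...] = (8, 16, 32)
) -> Dict[int, Dict[str, Tuple[int, int, int]]]:
    """Return expected (H, W, C) shapes of the box and class branches per stride."""
    shapes = {}
    for stride in strides:
        if imgsz % stride:
            raise ValueError(f"imgsz={imgsz} is not divisible by stride {stride}")
        grid = imgsz // stride
        shapes[stride] = {
            "box": (grid, grid, 4 * REG_MAX),  # cv2 branch: box distribution
            "cls": (grid, grid, num_classes),  # cv3 branch: class logits
        }
    return shapes


def total_proposals(imgsz: int, strides: Tuple[int, ...] = (8, 16, 32)) -> int:
    """Number of candidate boxes the NMS stage receives for a square input."""
    return sum((imgsz // s) ** 2 for s in strides)


coco = head_output_shapes(640, 80)
custom = head_output_shapes(320, 3)
print(coco[8]["cls"], custom[8]["cls"], total_proposals(640))
```

A config written for the first case describes 80 class channels on an 80x80 grid at stride 8, while the 3-class, 320-pixel model has 3 channels on a 40x40 grid, which is why a static NMS JSON cannot be shared between them.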
Installation
Step 1: Install Ultralytics
pip install ultralytics
Step 2: Install Hailo DFC SDK
The Hailo DFC is required for parsing, optimization, and compilation. Download the Python wheel from the Hailo Developer Zone (free registration required) and install it:
pip install /path/to/hailo_dataflow_compiler-*.whl
The Hailo DFC SDK requires a Linux x86_64 machine. Export and compilation cannot be performed on ARM devices such as Raspberry Pi. Copy the resulting .hef file to your Hailo-powered device for deployment with HailoRT.
YOLO11n HEF Export Example
The script below compiles a YOLO11n detection model from .pt to .hef at a fixed 640-pixel input size. It exports to ONNX using Ultralytics, then compiles with Hailo DFC using COCO128 as a small calibration dataset.
Before running the script, provide a Hailo NMS JSON that matches the exact YOLO11n export graph, class count, strides, and fixed input size. Reuse this script as a known YOLO11n starting point; custom models need matching end nodes, .alls directives, and NMS settings.
YOLO26 models are NMS-free. A direct Ultralytics Hailo exporter needs a dedicated YOLO26 compile and post-processing path for detection or instance segmentation instead of the YOLO11 NMS example below.
import ast
import random
from pathlib import Path

import numpy as np
import onnx
from hailo_sdk_client import ClientRunner
from PIL import Image

from ultralytics import YOLO
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import DATASETS_DIR, YAML

# Configuration
MODEL = "yolo11n"
HW_ARCH = "hailo8"  # hailo8 | hailo8l | hailo15h
IMGSZ = 640
CALIB_IMAGES = 128
NMS_CONFIG = "yolo11n_nms_config.json"  # Download or generate for your exact model.
OUT_DIR = Path(f"{MODEL}_hailo_model")  # deploy folder (mirrors Ultralytics <model>_<format>_model exports)
OUT_DIR.mkdir(exist_ok=True)

# YOLO11 detection head end nodes. See "Supported Models and End Nodes" for YOLOv8 and other families.
END_NODES = [
    "/model.23/cv2.0/cv2.0.2/Conv",
    "/model.23/cv3.0/cv3.0.2/Conv",
    "/model.23/cv2.1/cv2.1.2/Conv",
    "/model.23/cv3.1/cv3.1.2/Conv",
    "/model.23/cv2.2/cv2.2.2/Conv",
    "/model.23/cv3.2/cv3.2.2/Conv",
]

# Step 1: Export to ONNX, then move it into the deploy folder to keep the working directory tidy
model = YOLO(f"{MODEL}.pt")
onnx_path = Path(model.export(format="onnx", imgsz=IMGSZ, opset=11))
onnx_path = onnx_path.rename(OUT_DIR / onnx_path.name)

# Copy the metadata Ultralytics embedded in the ONNX into the standard metadata.yaml sidecar.
# The HEF stores no class names, so inference reads them from this file.
meta = {p.key: p.value for p in onnx.load(onnx_path, load_external_data=False).metadata_props}
for k in ("stride", "batch", "channels"):
    if k in meta:
        meta[k] = int(meta[k])
for k in ("imgsz", "names", "args", "end2end"):
    if k in meta:
        meta[k] = ast.literal_eval(meta[k])
YAML.save(OUT_DIR / "metadata.yaml", meta)

# Step 2: Parse ONNX with Hailo DFC
# The DFC prints the detected end nodes after parsing; use them if unsure.
runner = ClientRunner(hw_arch=HW_ARCH)
runner.translate_onnx_model(str(onnx_path), end_node_names=END_NODES)

# Step 3: Load model script (normalization + HailoRT NMS)
# The conv layer names are generated by DFC and can change for other model sizes/families.
model_script = (
    "normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])\n"
    "change_output_activation(conv54, sigmoid)\n"
    "change_output_activation(conv65, sigmoid)\n"
    "change_output_activation(conv80, sigmoid)\n"
    f'nms_postprocess("{NMS_CONFIG}", meta_arch=yolov8, engine=cpu)\n'
    "allocator_param(width_splitter_defuse=disabled)"
)
runner.load_model_script(model_script)

# Step 4: Build calibration dataset (auto-downloads COCO128)
check_det_dataset("coco128.yaml")
calib_dir = DATASETS_DIR / "coco128" / "images" / "train2017"
image_files = list(calib_dir.glob("*.jpg")) + list(calib_dir.glob("*.png"))
if not image_files:
    raise FileNotFoundError(f"No calibration images found in {calib_dir}")
calibset = np.zeros((CALIB_IMAGES, IMGSZ, IMGSZ, 3), dtype=np.float32)
for i in range(CALIB_IMAGES):
    img = Image.open(random.choice(image_files)).convert("RGB").resize((IMGSZ, IMGSZ))
    calibset[i] = np.array(img, dtype=np.float32)

# Step 5: Optimize and quantize
runner.optimize(calibset)
runner.save_har(str(OUT_DIR / f"{MODEL}.o.har"))  # optional intermediate HAR

# Step 6: Compile to HEF
hef = runner.compile()
hef_path = OUT_DIR / f"{MODEL}.hef"
with open(hef_path, "wb") as f:
    f.write(hef)

# Note: the Hailo SDK writes *.log files (acceleras.log, allocator.log, hailo_sdk.client.log,
# hailo_sdk.core.log) to the working directory. They are diagnostic scratch, safe to ignore or delete.
print(f"Compiled HEF saved to: {hef_path}")
The export script organizes artifacts and logs as follows:
- Deployment Folder: Artifacts are saved to yolo11n_hailo_model/, mirroring the standard <model>_<format>_model/ layout used by other Ultralytics exports.
- Required Files: The two files needed for deployment are the compiled yolo11n.hef and the metadata.yaml sidecar.
- Metadata: The metadata.yaml contains essential fields (names, imgsz, task, stride, etc.) extracted from the ONNX metadata. Inference scripts load class names from this file since the HEF format does not store them.
- Intermediate Files: The export folder also contains the intermediate yolo11n.onnx and yolo11n.o.har checkpoints.
- Log Files: The Hailo SDK generates several diagnostic logs (e.g., acceleras.log, allocator.log, hailo_sdk.client.log, and hailo_sdk.core.log) in the working directory; these can be safely ignored or deleted.
- Raspberry Pi AI Kit: For this specific hardware, ensure you set HW_ARCH = "hailo8l" before running the compilation step.
Step-by-Step Breakdown
The full script above runs end to end. This section explains what each stage does and the model-specific details to watch when adapting it to your own model.
Step 1: Export to ONNX and Save Metadata
Ultralytics exports your trained model to ONNX format, which the Hailo DFC ingests as input. opset=11 gives broad DFC compatibility, and the ONNX is moved into a yolo11n_hailo_model/ deploy folder (mirroring the <model>_<format>_model/ layout of other Ultralytics exports) to keep the working directory tidy.
The HEF stores no class names, so the metadata Ultralytics embeds in the ONNX is copied into a standard metadata.yaml sidecar next to it. This is the same metadata.yaml other export formats produce (names, imgsz, task, stride, and more), and inference reads the class names from it, so the workflow works for custom models without hardcoding any labels.
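ONNX metadata properties are stored as strings, which is why the export script converts them before writing metadata.yaml. The sketch below isolates that conversion so it can be tried without the Hailo SDK. The `coerce_metadata` helper and the sample values are illustrative assumptions; only the key groups are taken from the script above.

```python
import ast

INT_KEYS = ("stride", "batch", "channels")
LITERAL_KEYS = ("imgsz", "names", "args", "end2end")


def coerce_metadata(raw: dict) -> dict:
    """Convert string-valued ONNX metadata into Python types, leaving other keys untouched."""
    meta = dict(raw)
    for key in INT_KEYS:
        if key in meta:
            meta[key] = int(meta[key])
    for key in LITERAL_KEYS:
        if key in meta:
            # literal_eval parses lists, dicts, and booleans without executing code
            meta[key] = ast.literal_eval(meta[key])
    return meta


# Hypothetical sample resembling what an exported ONNX file might carry
example = {
    "stride": "32",
    "imgsz": "[640, 640]",
    "names": "{0: 'person', 1: 'bicycle'}",
    "task": "detect",
}
meta = coerce_metadata(example)
print(meta["names"][1], meta["imgsz"])
```

Using `ast.literal_eval` rather than `eval` keeps the conversion limited to Python literals, which matters when the ONNX file comes from an untrusted source.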
Step 2: Parse the ONNX Model
runner.translate_onnx_model(...) converts the ONNX graph into Hailo's intermediate HAR representation. The end_node_names list tells the DFC where to cut the graph before NMS so Hailo can attach its own hardware post-processing.
The DFC prints a suggestion after parsing:
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: ...
Copy those node names if you are unsure which ones to use, or if you are working with a custom or less common architecture.
Step 3: Load the Model Script
The model script (.alls) configures input normalization, output activation, and NMS post-processing. The meta_arch=yolov8 setting applies to both YOLOv8 and YOLO11 since they share the same detection head layout.
MODEL = "yolo11n"
NMS_CONFIG = "yolo11n_nms_config.json"
model_script = (
    "normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])\n"
    "change_output_activation(conv54, sigmoid)\n"
    "change_output_activation(conv65, sigmoid)\n"
    "change_output_activation(conv80, sigmoid)\n"
    f'nms_postprocess("{NMS_CONFIG}", meta_arch=yolov8, engine=cpu)\n'
    "allocator_param(width_splitter_defuse=disabled)"
)
runner.load_model_script(model_script)
The change_output_activation layer names (conv54, conv65, conv80) are assigned by the DFC during parsing and are model-specific. If you are compiling a different model size or architecture, check the DFC output for the correct names or generate the .alls directives from the exported graph.
The NMS_CONFIG file is also model-specific. Use a config that matches your exported model exactly.
engine=cpu runs NMS through HailoRT on the host CPU. Use engine=nn_core only for model/script combinations that Hailo documents as supported by the target hardware and SDK version.
Remove the nms_postprocess line if you prefer to run NMS fully in your application code. If you do this, update the inference parser because the HEF will output raw detection-head tensors instead of grouped NMS detections.
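Because the layer names, NMS config, and engine all vary per model, it can help to assemble the script from parameters instead of editing a string by hand. The following is a minimal sketch using only the directives shown above; `build_model_script` is a hypothetical helper, and the layer names you pass in must come from your own DFC parse output.

```python
from typing import Optional, Sequence


def build_model_script(
    cls_layers: Sequence[str],
    nms_config: Optional[str],
    engine: str = "cpu",
    meta_arch: str = "yolov8",
) -> str:
    """Assemble a model script string; pass nms_config=None to omit NMS post-processing."""
    if engine not in ("cpu", "nn_core"):
        raise ValueError(f"Unexpected NMS engine: {engine!r}")
    lines = ["normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])"]
    # One sigmoid directive per class-branch output layer reported by the DFC
    lines += [f"change_output_activation({name}, sigmoid)" for name in cls_layers]
    if nms_config is not None:
        lines.append(f'nms_postprocess("{nms_config}", meta_arch={meta_arch}, engine={engine})')
    lines.append("allocator_param(width_splitter_defuse=disabled)")
    return "\n".join(lines)


script = build_model_script(["conv54", "conv65", "conv80"], "yolo11n_nms_config.json")
raw_script = build_model_script(["conv54", "conv65", "conv80"], None)
print(script)
```

The first call reproduces the YOLO11n script from this guide, and the second drops the `nms_postprocess` line for the case where NMS runs in application code.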
Step 4: Build the Calibration Dataset
INT8 quantization requires a representative set of images stacked into a (N, imgsz, imgsz, 3) float32 array. The script uses COCO128, which Ultralytics downloads automatically via check_det_dataset.
Use at least 64 images for calibration. More images generally improve quantization quality. For best results, use images from your deployment domain rather than COCO128.
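The export script picks calibration images with `random.choice`, which can select the same file more than once. If you would rather guarantee distinct images and a reproducible selection, one option is to sample without replacement, as sketched below. `pick_calibration_files` and its seed parameter are hypothetical, and the 64-image floor simply encodes the recommendation above.

```python
import random
from typing import List, Sequence

MIN_CALIB_IMAGES = 64  # recommendation from this guide, not an SDK constant


def pick_calibration_files(files: Sequence[str], count: int, seed: int = 0) -> List[str]:
    """Select `count` distinct files in a reproducible order."""
    unique = sorted(set(files))  # sort so the result does not depend on input order
    if count < MIN_CALIB_IMAGES:
        raise ValueError(f"Use at least {MIN_CALIB_IMAGES} calibration images, got {count}")
    if len(unique) < count:
        raise ValueError(f"Requested {count} images but only {len(unique)} are available")
    return random.Random(seed).sample(unique, count)


candidates = [f"img_{i:03d}.jpg" for i in range(128)]
picked = pick_calibration_files(candidates, 64)
print(len(picked), len(set(picked)))
```

The selected paths can then be loaded and resized into the (N, imgsz, imgsz, 3) float32 array in the same way as the loop in the export script.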
Step 5: Optimize and Quantize
runner.optimize(calibset) applies quantization-aware fine-tuning and layer noise analysis, then runner.save_har(...) writes an optional intermediate checkpoint. A GPU is strongly recommended; without one, this step can take several hours.
Step 6: Compile to HEF
runner.compile() produces the final HEF, written to yolo11n_hailo_model/yolo11n.hef. It now sits alongside its metadata.yaml, ready to copy to the device for inference.
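Before copying the folder to the device, a quick check that both required files are present can catch an interrupted compile. This is a small sketch with a hypothetical `missing_deploy_files` helper; the temporary directory stands in for a real deploy folder.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_deploy_files(folder, model: str = "yolo11n") -> List[str]:
    """Return the required deployment files that are absent from `folder`."""
    folder = Path(folder)
    required = [f"{model}.hef", "metadata.yaml"]
    return [name for name in required if not (folder / name).is_file()]


# Demonstration on a throwaway folder (placeholder file contents, not a real HEF)
with tempfile.TemporaryDirectory() as tmp:
    deploy = Path(tmp) / "yolo11n_hailo_model"
    deploy.mkdir()
    (deploy / "yolo11n.hef").write_bytes(b"")
    before = missing_deploy_files(deploy)
    (deploy / "metadata.yaml").write_text("names: {0: person}\n")
    after = missing_deploy_files(deploy)

print(before, after)
```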
Supported Models and End Nodes
For detection models, end_node_names identifies the ONNX detection-head outputs that Hailo should compile before attaching its NMS post-processing. These names vary by architecture and can change when the exported graph changes.
The end-node examples below apply to YOLOv8 and YOLO11 detection models that use Hailo's YOLOv8-style NMS post-processing. YOLO26 is NMS-free and does not use this YOLO11 NMS configuration.
YOLO11 and YOLOv8
YOLO11 and YOLOv8 share the same decoupled detection head. The layer index differs by one between the two families:
| Model Family | Detection Head Layer | End Node Pattern |
|---|---|---|
| YOLO11 (all) | model.23 | /model.23/cv2.0/cv2.0.2/Conv (6 nodes) |
| YOLOv8 (all) | model.22 | /model.22/cv2.0/cv2.0.2/Conv (6 nodes) |
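Since the two families differ only in the head layer index, the six names can be generated from the pattern in the table. The sketch below does that; `detection_end_nodes` and `HEAD_INDEX` are hypothetical names, and the output should still be compared against the nodes the DFC reports for your exported graph.

```python
from typing import List

HEAD_INDEX = {"yolov8": 22, "yolo11": 23}  # detection head layer per family, from the table above


def detection_end_nodes(family: str) -> List[str]:
    """Build the six end-node names: box (cv2) and class (cv3) branches for three scales."""
    idx = HEAD_INDEX[family]
    return [
        f"/model.{idx}/cv{branch}.{scale}/cv{branch}.{scale}.2/Conv"
        for scale in range(3)
        for branch in (2, 3)
    ]


yolo11_nodes = detection_end_nodes("yolo11")
yolov8_nodes = detection_end_nodes("yolov8")
print("\n".join(yolo11_nodes))
```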
YOLO11 end nodes (all sizes: n, s, m, l, x):
END_NODES = [
    "/model.23/cv2.0/cv2.0.2/Conv",
    "/model.23/cv3.0/cv3.0.2/Conv",
    "/model.23/cv2.1/cv2.1.2/Conv",
    "/model.23/cv3.1/cv3.1.2/Conv",
    "/model.23/cv2.2/cv2.2.2/Conv",
    "/model.23/cv3.2/cv3.2.2/Conv",
]
YOLOv8 end nodes (all sizes: n, s, m, l, x):
END_NODES = [
    "/model.22/cv2.0/cv2.0.2/Conv",
    "/model.22/cv3.0/cv3.0.2/Conv",
    "/model.22/cv2.1/cv2.1.2/Conv",
    "/model.22/cv3.1/cv3.1.2/Conv",
    "/model.22/cv2.2/cv2.2.2/Conv",
    "/model.22/cv3.2/cv3.2.2/Conv",
]
Other Architectures
For other detection architectures, run the parse step without end_node_names first, read the suggested nodes from the DFC log output, then re-run with those nodes:
# First pass: let the DFC suggest end nodes
runner = ClientRunner(hw_arch=HW_ARCH)
runner.translate_onnx_model(f"{MODEL}.onnx")
# Check the printed log for: "[info] In order to use HailoRT post-processing..."
For direct Ultralytics support, these .alls directives and post-processing settings should be generated or selected by the exporter instead of requiring users to assemble them manually.
Supported Hardware Architectures
| Architecture | Device | Peak Compute (Vendor Spec) | Common Use Case |
|---|---|---|---|
| hailo8 | Hailo-8 | 26 TOPS | Hailo accelerator card |
| hailo8l | Hailo-8L | 13 TOPS | Raspberry Pi AI Kit |
| hailo15h | Hailo-15H | 20 TOPS | Hailo-15 target devices |
Set HW_ARCH in the script to match your target device before compiling.
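A typo in HW_ARCH only surfaces once the DFC starts, so a small guard at the top of the script can fail earlier. The sketch below encodes the table above as a lookup; `normalize_hw_arch` and `HAILO_ARCHS` are hypothetical names, and the table covers only the three architectures listed in this guide.

```python
HAILO_ARCHS = {
    # hw_arch: (device, vendor peak TOPS), copied from the table above
    "hailo8": ("Hailo-8", 26),
    "hailo8l": ("Hailo-8L", 13),
    "hailo15h": ("Hailo-15H", 20),
}


def normalize_hw_arch(value: str) -> str:
    """Lowercase and validate an architecture string against the documented set."""
    arch = value.strip().lower()
    if arch not in HAILO_ARCHS:
        known = ", ".join(sorted(HAILO_ARCHS))
        raise ValueError(f"Unknown hw_arch {value!r}; expected one of: {known}")
    return arch


hw_arch = normalize_hw_arch(" Hailo8L ")
device, tops = HAILO_ARCHS[hw_arch]
print(hw_arch, device, tops)
```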
Running Inference on Hailo Hardware
Once compilation finishes, copy the whole yolo11n_hailo_model/ folder (the .hef plus its metadata.yaml) to your Hailo-powered device and run inference using either the HailoRT Python API (hailo_platform package) or, on Raspberry Pi, the picamera2 Hailo helper (a HailoRT wrapper). Both are shown in the tabs below. Keeping the two files together lets the scripts below read the class names from metadata.yaml next to the HEF. Unlike the DFC export steps, inference runs directly on the edge device.
The inference code below runs on the Hailo-powered device (e.g. Raspberry Pi + AI Kit), not on the x86 machine used for compilation.
Step 1: Install HailoRT on the Device
On the target device, install HailoRT and the Python bindings. For Raspberry Pi AI Kit and AI HAT+ users, the official Raspberry Pi AI software guide installs HailoRT, the device driver, and Python bindings with:
sudo apt install dkms
sudo apt install hailo-all
sudo reboot
For non-Raspberry Pi Hailo devices, install the HailoRT package that matches your device, driver, and SDK version from the Hailo Developer Zone.
AI HAT+ 2 devices use a different Raspberry Pi package (hailo-h10-all) and Hailo-10H workflow. Follow the Raspberry Pi AI software guide for that hardware generation.
Step 2: Quick Sanity Check
Before running Python inference, confirm the Hailo device is recognized:
hailortcli fw-control identify
You should see the device type, firmware version, and serial number printed.
Executing on device: 0001:01:00.0
Identifying board
Control Protocol Version: 2
Firmware Version: 4.23.0 (release,app,extended context switch buffer)
Logger Version: 0
Board Name: Hailo-8
Device Architecture: HAILO8
Step 3: Run Inference
The scripts below run object detection with the compiled HEF file. Both tabs accept the same --source inputs (an image, a video, a USB webcam index, or csi for the Raspberry Pi Camera Module) and differ only in the inference API: the Hailo SDK tab uses the low-level hailo_platform API (portable, minimal dependencies), while the picamera2 tab uses the Raspberry Pi picamera2 Hailo helper. Images and videos are written to an annotated file; webcam and CSI streams display in a live window.
The vendor-native HailoRT path runs on any platform with a Hailo device and needs no extra dependencies. Pass --source an image path, a video path, a webcam index (e.g. 0) for live USB/V4L2 capture, or csi for the Raspberry Pi Camera Module. The CSI option requires picamera2 to be installed, since modern Raspberry Pi OS routes the camera through libcamera rather than a plain V4L2 device.
import argparse
from pathlib import Path

import cv2
import numpy as np
import yaml
from hailo_platform import (
    HEF,
    ConfigureParams,
    FormatType,
    HailoStreamInterface,
    InferVStreams,
    InputVStreamParams,
    OutputVStreamParams,
    VDevice,
)
from tqdm import tqdm

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp", ".tif", ".tiff"}


def parse_and_draw(per_class, frame, conf, names):
    """Draw HailoRT NMS detections (grouped by class, normalized [0, 1] coords) onto a BGR frame."""
    h, w = frame.shape[:2]
    for cls_idx, cls_dets in enumerate(per_class):
        for det in cls_dets:
            score = float(det[4])
            if score < conf:
                continue
            # HailoRT NMS returns normalized [0, 1] coords as (y1, x1, y2, x2)
            y1, x1, y2, x2 = det[:4]
            x1, y1, x2, y2 = int(x1 * w), int(y1 * h), int(x2 * w), int(y2 * h)
            label = f"{names[cls_idx]} {score:.2f}"
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, label, (x1 + 2, y1 + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1, cv2.LINE_AA)


def preprocess(frame, imgsz):
    """BGR frame -> (1, imgsz, imgsz, 3) float32 in 0-255 (HEF normalizes internally)."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (imgsz, imgsz))
    return np.expand_dims(resized.astype(np.float32), axis=0)


def csi_frames(width=1280, height=720):
    """Yield BGR frames from the Pi CSI Camera Module via picamera2."""
    from picamera2 import Picamera2

    picam2 = Picamera2()
    # picamera2 "RGB888" is BGR-ordered in memory, so it drops straight into OpenCV
    picam2.configure(picam2.create_preview_configuration(main={"size": (width, height), "format": "RGB888"}))
    picam2.start()
    try:
        while True:
            yield picam2.capture_array("main")  # BGR
    finally:
        picam2.stop()
        picam2.close()


def cv2_frames(src):
    """Yield BGR frames from a video file or USB/V4L2 webcam via OpenCV."""
    cap = cv2.VideoCapture(src)
    if not cap.isOpened():
        raise RuntimeError(f"Could not open source {src}")
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # 0 for live webcams
    pbar = tqdm(total=total, desc="Processing video", unit="frame") if total > 0 else None
    try:
        while True:
            ok, frame = cap.read()  # BGR
            if not ok:
                break
            yield frame
            if pbar is not None:
                pbar.update(1)
    finally:
        if pbar is not None:
            pbar.close()
        cap.release()


def open_source(source):
    """Yield (frame, kind) pairs where kind is 'image', 'video', or 'stream'."""
    if source == "csi":
        yield from ((f, "stream") for f in csi_frames())
    elif source.isdigit():
        yield from ((f, "stream") for f in cv2_frames(int(source)))
    elif Path(source).suffix.lower() in IMAGE_EXTS:
        frame = cv2.imread(source)
        if frame is None:
            raise FileNotFoundError(f"Could not read image {source}")
        yield frame, "image"
    else:
        yield from ((f, "video") for f in cv2_frames(source))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Hailo YOLO inference (image, video, webcam, or CSI camera)")
    parser.add_argument("-m", "--model", default="yolo11n_hailo_model/yolo11n.hef", help="Path to the HEF model.")
    parser.add_argument("--source", default="0", help="Image/video path, webcam index (e.g. 0), or 'csi'.")
    parser.add_argument("--imgsz", type=int, default=640)
    parser.add_argument("--conf", type=float, default=0.25)
    args = parser.parse_args()

    # Load class names from metadata.yaml saved next to the HEF during compilation (keyed by class index)
    with open(Path(args.model).parent / "metadata.yaml") as f:
        names = yaml.safe_load(f)["names"]

    # Configure the device and network group ONCE
    hef = HEF(args.model)
    target = VDevice(VDevice.create_params())
    configure_params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
    network_group = target.configure(hef, configure_params)[0]
    network_group_params = network_group.create_params()
    input_vstreams_params = InputVStreamParams.make(network_group, quantized=False, format_type=FormatType.FLOAT32)
    output_vstreams_params = OutputVStreamParams.make(network_group, quantized=False, format_type=FormatType.FLOAT32)
    input_name = hef.get_input_vstream_infos()[0].name

    writer = None  # lazily created for video output
    # Keep the pipeline and activation OPEN across frames (re-opening per frame is slow)
    with InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as pipeline:
        with network_group.activate(network_group_params):
            try:
                for frame, kind in open_source(args.source):
                    raw = pipeline.infer({input_name: preprocess(frame, args.imgsz)})
                    parse_and_draw(raw[next(iter(raw.keys()))][0], frame, args.conf, names)
                    if kind == "image":
                        cv2.imwrite("output.jpg", frame)
                        print("Saved output.jpg")
                    elif kind == "video":
                        if writer is None:
                            h, w = frame.shape[:2]
                            writer = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 30, (w, h))
                        writer.write(frame)
                    else:  # live stream
                        cv2.imshow("Hailo YOLO", frame)
                        if cv2.waitKey(1) & 0xFF == ord("q"):
                            break
            finally:
                if writer is not None:
                    writer.release()
                    print("Saved output.mp4")
                cv2.destroyAllWindows()
Run it against any source (images save output.jpg, videos save output.mp4, live streams display in a window, press q to quit):
python hailo_infer.py --source bus.jpg # single image
python hailo_infer.py --source clip.mp4 # video file
python hailo_infer.py --source 0 # USB webcam, live
python hailo_infer.py --source csi # Raspberry Pi Camera Module
The detection output format assumes the HEF was compiled with nms_postprocess in the .alls script. If you compiled without NMS, the raw outputs are the 6 detection head tensors and you must run NMS in your application separately.
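For the no-NMS case, the suppression step itself is straightforward once the head tensors have been decoded into boxes and scores. The sketch below shows a plain-Python greedy NMS under stated assumptions: boxes are already decoded to (x1, y1, x2, y2), suppression is class-agnostic, and the thresholds are illustrative. Decoding the raw head outputs (distribution-to-box conversion and stride scaling) is not shown, and `iou` and `greedy_nms` are hypothetical helper names.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thres: float = 0.5, conf: float = 0.25) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any already kept box
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = greedy_nms(boxes, scores)
print(kept)
```

In a multi-class detector you would typically run this per class, and for real workloads a vectorized implementation is a better fit than nested Python loops.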
Video Inference with TAPPAS
For high-throughput video pipelines, TAPPAS provides GStreamer elements that stream video through the Hailo chip in real time:
MODEL=yolo11n
gst-launch-1.0 filesrc location=video.mp4 ! decodebin ! \
    hailonet hef-path=${MODEL}.hef ! \
    hailofilter function-name=yolov8 ! \
    hailooverlay ! autovideosink
See the TAPPAS documentation for full pipeline configuration options.
Summary
This guide covered the complete workflow to export Ultralytics YOLO detection models to Hailo HEF format:
- Export to ONNX with Ultralytics (model.export(format="onnx")).
- Parse the ONNX model with the Hailo DFC and specify detection head end nodes.
- Configure normalization and NMS via a model script.
- Quantize with a calibration dataset (COCO128 via Ultralytics).
- Compile to a .hef file ready for Hailo-8, Hailo-8L, or Hailo-15.
For further details, see the Hailo Developer Zone and Hailo documentation. For other Ultralytics export targets, see the related ONNX, OpenVINO, TensorRT, NCNN, TFLite Edge TPU, RKNN, Sony IMX500, and Qualcomm QNN guides. To compare exported model speed and accuracy across formats, use Benchmark mode. For the full list of formats and options, visit the Export mode documentation and the integrations guide page.
FAQ
What Hailo devices are supported?
The Hailo DFC supports Hailo-8 (hailo8), Hailo-8L (hailo8l), and Hailo-15H (hailo15h). See the Supported Hardware Architectures table for the matching HW_ARCH value.
Which Ultralytics models can be exported?
This guide focuses on detection models. See Supported Tasks for task-level scope, Compatibility Notes for model compatibility limits, and Supported Models and End Nodes for YOLO11 and YOLOv8 end-node examples.
Why does the model script use meta_arch=yolov8 for YOLO11?
YOLO11 uses the same decoupled detection head architecture as YOLOv8. The Hailo DFC uses meta_arch=yolov8 for NMS configuration for both model families.
Do I need a GPU for the optimization step?
A GPU is strongly recommended for the quantization-aware fine-tuning in runner.optimize(). Without one, the process still works but is significantly slower (several hours vs. about 10-20 minutes with a GPU).
How do I find the correct end nodes for my model?
Run runner.translate_onnx_model(...) without specifying end_node_names, then use the suggested detection-head nodes printed by the DFC. See Other Architectures for the example command.
Where can I get the Hailo DFC SDK?
The Hailo DFC SDK Python wheel is available from the Hailo Developer Zone. For a direct Ultralytics Hailo exporter, the model script and post-processing configuration should be generated or selected inside the export workflow.