Hailo Export for Ultralytics YOLO Models
Hailo HEF is not officially supported as a direct Ultralytics model.export(format="hailo") target. The workflow below exports to ONNX first, then uses Hailo's external Dataflow Compiler toolchain to produce a .hef file. For better performance per watt than older Hailo HEF deployments, use newer direct Ultralytics export formats such as Axelera AI or DeepX instead.
The Hailo toolchain uses HEF files for embedded platforms including the Raspberry Pi AI Kit and AI HAT+, industrial cameras, edge gateways, and AI PCs.
This guide walks through exporting Ultralytics YOLO detection models to Hailo's HEF (Hailo Executable Format) using the Hailo Dataflow Compiler (DFC) SDK. The workflow starts from a YOLO .pt model, exports to ONNX, compiles with Hailo tools, and produces a .hef file ready for Hailo-8, Hailo-8L, and Hailo-15 accelerators.
When to Use Hailo HEF
HEF is the compiled artifact consumed by HailoRT on Hailo target devices. Use this guide only when your deployment hardware specifically requires Hailo HEF. If you are still choosing edge hardware or export targets, start with newer direct Ultralytics export formats such as Axelera AI or DeepX, which provide a supported model.export(...) workflow and better performance-per-watt options than older Hailo deployments.
HEF is similar in deployment role to hardware-specific formats such as RKNN for Rockchip NPUs, IMX500 for Raspberry Pi AI Cameras, and Qualcomm QNN for Snapdragon NPUs, but it is not currently generated directly by Ultralytics.
This workflow is relevant when you need:
- Raspberry Pi AI Kit compatibility: Hailo-8L is used in the official Raspberry Pi AI Kit and AI HAT+.
- HailoRT post-processing: HailoRT can include YOLO non-maximum suppression in the compiled inference pipeline.
- INT8 compilation: The Hailo DFC quantizes the model with representative calibration images to produce an INT8 graph for Hailo hardware. Learn more about model quantization.
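The INT8 idea can be shown with a minimal sketch of per-tensor min/max calibration. This is the textbook way to map float values to 8-bit integers using a scale and a zero point. It is a generic illustration under that assumption, not the Hailo DFC's actual quantization algorithm, which applies its own calibration and fine-tuning. The function names are hypothetical.

```python
def calibrate_int8(values):
    """Derive an asymmetric INT8 scale and zero point from calibration values (min/max method)."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    qmin, qmax = -128, 127
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against an all-zero range
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float to INT8, clamping values outside the calibrated range."""
    return max(-128, min(127, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an INT8 value back to an approximate float."""
    return (q - zero_point) * scale


# Example: calibrate on a handful of observed activation values, then round-trip them.
activations = [-1.5, -0.2, 0.0, 0.7, 3.0]
scale, zp = calibrate_int8(activations)
print(scale, zp, [dequantize(quantize(v, scale, zp), scale, zp) for v in activations])
```

Values inside the calibrated range come back within half a quantization step. Values outside it are clamped, which is why representative calibration images matter.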
Hailo HEF Export Format
HEF is a hardware-specific executable generated by the Hailo Dataflow Compiler. It contains the quantized model graph, memory allocation, scheduling, and optional post-processing configured for a target Hailo architecture. Unlike standard YOLO Export mode formats that are produced directly by model.export(format=...), HEF compilation currently uses a two-stage flow:
- Export YOLO to ONNX with Ultralytics.
- Use Hailo DFC tools to parse, optimize, quantize, and compile the ONNX model into HEF.
The full workflow expands into the following pipeline:
YOLO (.pt) -> ONNX -> HAR (parse) -> HAR (optimize/quantize) -> HEF (compile)
- Export to ONNX using Ultralytics Export mode
- Parse the ONNX model into Hailo's intermediate HAR format
- Load a model script (.alls) with normalization and post-processing directives
- Calibrate and quantize using representative images
- Compile to a deployable HEF file
Supported Tasks
This guide focuses on Ultralytics YOLO object detection models, because the Hailo model script and NMS configuration are detection-head specific.
| Task | Supported |
|---|---|
| Object Detection | ✅ Yes |
| Instance Segmentation | ❌ No |
| Semantic Segmentation | ❌ No |
| Pose Estimation | ❌ No |
| OBB Detection | ❌ No |
| Classification | ❌ No |
For instance segmentation, semantic segmentation, pose, OBB, and classification deployments, compare other edge formats in the Export mode table or use a generic ONNX pipeline where your target runtime supports the task.
Compatibility Notes
Hailo export compatibility depends on the model head, input image size, class count, Hailo architecture, model script (.alls), and NMS configuration. Static files from the Hailo Model Zoo are useful references, but they are not universal templates. For example, an NMS JSON created for a COCO 80-class YOLO11n model is not correct for a custom 3-class model or for a different fixed imgsz.
| Scope | Expected Support | Notes |
|---|---|---|
| YOLOv8 / YOLO11 detection, stock models | ✅ Good | Shared decoupled detection head; .alls, end nodes, and NMS config still need to match the exported graph and fixed imgsz. |
| Custom YOLOv8 / YOLO11 detection | ✅ Possible | Requires per-model NMS configuration generated from class count, strides, and detection-head layout; static Model Zoo JSON will not match. |
| YOLOv9 detection | ⚠️ Validate | Similar detection-head pattern, but compile and output parsing should be tested before treating it as supported. |
| YOLOv10 / YOLO26 end-to-end detection | ❌ Not supported | End-to-end/NMS-free exports do not match the Hailo NMS post-processing path; use a traditional detection head if testing manually. |
| Dynamic or arbitrary image sizes | ❌ Not supported | Hailo compilation uses a fixed input shape; .alls and NMS settings must match the exported imgsz. |
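The fixed-imgsz requirement follows from the layout of the YOLOv8/YOLO11 detection head. The sketch below assumes the standard three output scales with strides 8, 16, and 32. It computes the feature-map grid at each stride and the total number of candidate boxes the head emits. Changing imgsz changes all of these values, which is why a compiled HEF and its NMS config are tied to one input size. `detection_grid` is a hypothetical helper.

```python
def detection_grid(imgsz, strides=(8, 16, 32)):
    """Return the grid size at each stride and the total number of candidate boxes."""
    if any(imgsz % s for s in strides):
        raise ValueError(f"imgsz={imgsz} must be divisible by every stride in {strides}")
    grids = [imgsz // s for s in strides]  # cells per side of each feature map
    return grids, sum(g * g for g in grids)  # one candidate per grid cell (anchor-free head)


for size in (640, 320):
    grids, candidates = detection_grid(size)
    print(f"imgsz={size}: grids={grids}, candidates={candidates}")
```

At imgsz=640 this gives grids of 80, 40, and 20 cells (8400 candidates). At 320 it gives 40, 20, and 10 cells (2100 candidates).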
Installation
Step 1: Install Ultralytics
pip install ultralytics
Step 2: Install Hailo DFC SDK
The Hailo DFC is required for parsing, optimization, and compilation. Download the Python wheel from the Hailo Developer Zone (free registration required) and install it:
pip install /path/to/hailo_sdk_client-*.whl
The Hailo DFC SDK requires a Linux x86_64 machine. The DFC parsing, optimization, and compilation steps cannot run on ARM devices such as Raspberry Pi. Copy the resulting .hef file to your Hailo-powered device for deployment with HailoRT.
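The DFC steps only run on Linux x86_64, so a small guard at the top of a compile script can fail fast on the wrong machine. This is a minimal sketch that uses Python's standard `platform` module. `can_run_dfc` is a hypothetical helper. It checks only the OS and CPU architecture, not other DFC requirements such as the supported Python version.

```python
import platform


def can_run_dfc(system=None, machine=None):
    """Return True if this host looks like Linux x86_64, the platform the DFC requires."""
    system = system if system is not None else platform.system()
    machine = (machine if machine is not None else platform.machine()).lower()
    return system == "Linux" and machine in {"x86_64", "amd64"}


if not can_run_dfc():
    print(f"Hailo DFC needs Linux x86_64; this host is {platform.system()} {platform.machine()}.")
```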
YOLO11n HEF Export Example
The script below compiles a YOLO11n detection model from .pt to .hef at a fixed 640-pixel input size. It exports to ONNX using Ultralytics, then compiles with Hailo DFC using COCO128 as a small calibration dataset.
Before running the script, download the matching YOLO11n NMS config file from the Hailo Model Zoo or create your own Hailo NMS JSON for the model. Reuse this script as a known YOLO11n starting point; custom models need matching end nodes, .alls directives, and NMS settings.
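If you start from a Model Zoo NMS file for the closest model, the fields most likely to change for a custom detector are the class count and the input dimensions. The sketch below assumes a YOLOv8-style Hailo NMS JSON with top-level `classes` and `image_dims` keys. Verify those key names against your actual file. `patch_nms_config` is hypothetical. It does not touch layer names or bbox decoder entries, which must still match the graph the DFC parsed for your model.

```python
import copy


def patch_nms_config(cfg, num_classes, imgsz):
    """Return a copy of a Hailo NMS config dict with the class count and input size updated.

    Assumes YOLOv8-style keys "classes" and "image_dims"; layer names are left as-is.
    """
    missing = [key for key in ("classes", "image_dims") if key not in cfg]
    if missing:
        raise KeyError(f"NMS config is missing expected keys: {missing}")
    patched = copy.deepcopy(cfg)  # leave the original Model Zoo config untouched
    patched["classes"] = num_classes
    patched["image_dims"] = [imgsz, imgsz]  # square input matching the fixed export imgsz
    return patched


base = {"classes": 80, "image_dims": [640, 640], "nms_iou_th": 0.7}
print(patch_nms_config(base, num_classes=3, imgsz=640))
```

To use it on a real file, load the Model Zoo JSON with `json.load`, pass the resulting dict through `patch_nms_config`, and write it back with `json.dump` under the file name you set as NMS_CONFIG.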
import random
import numpy as np
from hailo_sdk_client import ClientRunner
from PIL import Image
from ultralytics import YOLO
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import DATASETS_DIR
# Configuration
MODEL = "yolo11n"
HW_ARCH = "hailo8" # hailo8 | hailo8l | hailo15h
IMGSZ = 640
CALIB_IMAGES = 128
NMS_CONFIG = "yolo11n_nms_config.json" # Download or generate for your exact model.
# YOLO11 detection head end nodes. See "Supported Models and End Nodes" for YOLOv8 and other families.
END_NODES = [
"/model.23/cv2.0/cv2.0.2/Conv",
"/model.23/cv3.0/cv3.0.2/Conv",
"/model.23/cv2.1/cv2.1.2/Conv",
"/model.23/cv3.1/cv3.1.2/Conv",
"/model.23/cv2.2/cv2.2.2/Conv",
"/model.23/cv3.2/cv3.2.2/Conv",
]
# Step 1: Export to ONNX
model = YOLO(f"{MODEL}.pt")
model.export(format="onnx", imgsz=IMGSZ, opset=11) # creates an ONNX file named after MODEL
# Step 2: Parse ONNX with Hailo DFC
# The DFC prints the detected end nodes after parsing; use them if unsure.
runner = ClientRunner(hw_arch=HW_ARCH)
runner.translate_onnx_model(f"{MODEL}.onnx", end_node_names=END_NODES)
# Step 3: Load model script (normalization + HailoRT NMS)
# The conv layer names are generated by DFC and can change for other model sizes/families.
alls = (
"normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])\n"
"change_output_activation(conv54, sigmoid)\n"
"change_output_activation(conv65, sigmoid)\n"
"change_output_activation(conv80, sigmoid)\n"
f'nms_postprocess("{NMS_CONFIG}", meta_arch=yolov8, engine=cpu)\n'
"allocator_param(width_splitter_defuse=disabled)"
)
runner.load_model_script(alls)
# Step 4: Build calibration dataset (auto-downloads COCO128)
check_det_dataset("coco128.yaml")
calib_dir = DATASETS_DIR / "coco128" / "images" / "train2017"
image_files = list(calib_dir.glob("*.jpg")) + list(calib_dir.glob("*.png"))
if not image_files:
    raise FileNotFoundError(f"No calibration images found in {calib_dir}")
calibset = np.zeros((CALIB_IMAGES, IMGSZ, IMGSZ, 3), dtype=np.float32)
for i in range(CALIB_IMAGES):
    img = Image.open(random.choice(image_files)).convert("RGB").resize((IMGSZ, IMGSZ))
    calibset[i] = np.array(img, dtype=np.float32)  # raw 0-255 values; the .alls normalization rescales them
# Step 5: Optimize and quantize
runner.optimize(calibset)
runner.save_har(f"{MODEL}.o.har") # optional intermediate HAR
# Step 6: Compile to HEF
hef = runner.compile()
with open(f"{MODEL}.hef", "wb") as f:
    f.write(hef)
print(f"Compiled HEF saved to: {MODEL}.hef")
The resulting HEF file, such as yolo11n.hef, is ready to deploy on a compatible Hailo device. If you are compiling for the Raspberry Pi AI Kit, set HW_ARCH = "hailo8l" before running the script, because the target architecture is fixed when ClientRunner is created.
Step-by-Step Breakdown
Step 1: Export to ONNX
Ultralytics exports your trained model to ONNX format, which the Hailo DFC ingests as input. Set opset=11 for broad DFC compatibility.
from ultralytics import YOLO
MODEL = "yolo11n"
model = YOLO(f"{MODEL}.pt")
model.export(format="onnx", imgsz=640, opset=11)
Step 2: Parse the ONNX Model
The translate_onnx_model call converts the ONNX graph into Hailo's intermediate HAR representation. The end_node_names list tells the DFC where to cut the graph before NMS so Hailo can attach its own hardware post-processing.
from hailo_sdk_client import ClientRunner
runner = ClientRunner(hw_arch="hailo8")
runner.translate_onnx_model(f"{MODEL}.onnx", end_node_names=END_NODES)
The DFC prints a suggestion after parsing:
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: ...
Copy those node names if you are unsure which ones to use, or if you are working with a custom or less common architecture.
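If you save the DFC console output to a text file, you can extract the suggested names programmatically instead of copying them by hand. This sketch assumes the names follow the quoted log prefix on the same line, separated by spaces or commas and possibly wrapped in brackets or quotes. Check that assumption against the actual output of your DFC version. `parse_suggested_end_nodes` is a hypothetical helper.

```python
import re

MARKER = "these end node names should be used:"


def parse_suggested_end_nodes(log_text):
    """Return the end node names suggested on the DFC post-processing log line, or []."""
    for line in log_text.splitlines():
        if MARKER in line:
            tail = line.split(MARKER, 1)[1]
            # Assumed format: names separated by spaces/commas, possibly in brackets or quotes.
            names = (token.strip("[]'\"") for token in re.split(r"[\s,]+", tail))
            return [name for name in names if name]
    return []
```

For example, reading a saved log (the file name is up to you) and passing its contents to `parse_suggested_end_nodes` returns a list you can assign to END_NODES for the second parse.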
Step 3: Load the Model Script
The model script (.alls) configures input normalization, output activation, and NMS post-processing. The meta_arch=yolov8 setting applies to both YOLOv8 and YOLO11 since they share the same detection head layout.
MODEL = "yolo11n"
NMS_CONFIG = "yolo11n_nms_config.json"
alls = (
"normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])\n"
"change_output_activation(conv54, sigmoid)\n"
"change_output_activation(conv65, sigmoid)\n"
"change_output_activation(conv80, sigmoid)\n"
f'nms_postprocess("{NMS_CONFIG}", meta_arch=yolov8, engine=cpu)\n'
"allocator_param(width_splitter_defuse=disabled)"
)
runner.load_model_script(alls)
The change_output_activation layer names (conv54, conv65, conv80) are assigned by the DFC during parsing and are model-specific. If you are compiling a different model size or architecture, check the DFC output for the correct names, or use a predefined .alls file from the Hailo Model Zoo.
The NMS_CONFIG file is also model-specific. Use the config that matches your exported model, or start from the Hailo Model Zoo configuration for the closest YOLO variant.
engine=cpu runs NMS through HailoRT on the host CPU. Use engine=nn_core only for model/script combinations that Hailo documents as supported by the target hardware and SDK version.
Remove the nms_postprocess line if you prefer to run NMS fully in your application code. If you do this, update the inference parser because the HEF will output raw detection-head tensors instead of grouped NMS detections.
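The conv layer names and the NMS line are the parts that change between models. Building the model script from those inputs can be easier than editing a string literal. The sketch below is a hypothetical helper that reproduces the directives used in this guide. It only formats text and does not validate names against the DFC, so take the sigmoid conv names from your own parse output.

```python
def build_alls(sigmoid_convs, nms_config=None, engine="cpu"):
    """Assemble a Hailo model script (.alls) from model-specific pieces."""
    # normalization(mean, std): divide 0-255 input by 255 so the network sees 0-1 values.
    lines = ["normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])"]
    lines += [f"change_output_activation({conv}, sigmoid)" for conv in sigmoid_convs]
    if nms_config is not None:  # omit to get raw detection-head tensors instead
        lines.append(f'nms_postprocess("{nms_config}", meta_arch=yolov8, engine={engine})')
    lines.append("allocator_param(width_splitter_defuse=disabled)")
    return "\n".join(lines)


print(build_alls(["conv54", "conv65", "conv80"], "yolo11n_nms_config.json"))
```

With the YOLO11n names, this produces the same script as the example above. Passing `nms_config=None` drops the nms_postprocess line.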
Step 4: Build the Calibration Dataset
INT8 quantization requires a representative set of images. The script below uses COCO128, which Ultralytics downloads automatically via check_det_dataset:
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import DATASETS_DIR
check_det_dataset("coco128.yaml") # downloads to DATASETS_DIR if not present
calib_dir = DATASETS_DIR / "coco128" / "images" / "train2017"
Use at least 64 images for calibration. More images generally improve quantization quality. For best results, use images from your deployment domain rather than COCO128.
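The example script draws calibration images with `random.choice`, which samples with replacement. As a result, 128 draws from COCO128's 128 images will usually repeat some images and skip others. The minimal alternative below, `pick_calibration_files` (a hypothetical helper), samples without replacement. It uses a fixed seed so the calibration set is reproducible between compiles. The 64-image minimum mirrors the recommendation above.

```python
import random


def pick_calibration_files(files, n, seed=0, min_images=64):
    """Choose n calibration files reproducibly, avoiding duplicates when possible."""
    files = sorted(files)  # fixed order so the same seed gives the same selection
    if len(files) < min_images:
        raise ValueError(f"Need at least {min_images} calibration images, found {len(files)}")
    rng = random.Random(seed)
    if len(files) >= n:
        return rng.sample(files, n)  # sample without replacement
    # Fewer files than requested: use every file once, then top up with repeats.
    return files + rng.choices(files, k=n - len(files))
```

In the example script, you could loop over `pick_calibration_files(image_files, CALIB_IMAGES)` and fill calibset in that order instead of calling `random.choice` per slot.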
Step 5: Optimize and Quantize
runner.optimize(calibset)
runner.save_har(f"{MODEL}.o.har")  # optional intermediate checkpoint
This step applies quantization-aware fine-tuning and layer noise analysis. A GPU is strongly recommended; without one, this step can take several hours.
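Before starting a long `optimize()` run, you may want to confirm that an NVIDIA GPU is visible at all. The sketch below is only a heuristic. It calls `nvidia-smi -L` if that tool is on PATH and looks for listed GPUs. It does not prove that the DFC's own backend will use the GPU, so treat the result as a quick hint rather than a guarantee. `nvidia_gpu_visible` is a hypothetical helper.

```python
import shutil
import subprocess


def nvidia_gpu_visible(timeout=10):
    """Heuristic: True if nvidia-smi is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    # nvidia-smi -L prints one "GPU <index>: <name> ..." line per device.
    return result.returncode == 0 and "GPU " in result.stdout


if not nvidia_gpu_visible():
    print("No NVIDIA GPU detected; runner.optimize() may take several hours.")
```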
Step 6: Compile to HEF
hef = runner.compile()
with open(f"{MODEL}.hef", "wb") as f:
    f.write(hef)
Supported Models and End Nodes
For detection models, end_node_names identifies the ONNX detection-head outputs that Hailo should compile before attaching its NMS post-processing. These names vary by architecture and can change when the exported graph changes.
YOLO11 and YOLOv8
YOLO11 and YOLOv8 share the same decoupled detection head. The layer index differs by one between the two families:
| Model Family | Detection Head Layer | End Node Pattern |
|---|---|---|
| YOLO11 (all) | model.23 | /model.23/cv2.0/cv2.0.2/Conv (6 nodes) |
| YOLOv8 (all) | model.22 | /model.22/cv2.0/cv2.0.2/Conv (6 nodes) |
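Only the head layer index differs between the two families, so you can generate the six end nodes from the pattern in the table. The sketch below assumes the standard three-scale detection head, where cv2 is the box-regression branch and cv3 is the classification branch at each scale. The HEAD_LAYER mapping and the function name are hypothetical. For any model you have modified, confirm the result against the DFC's suggested end nodes.

```python
HEAD_LAYER = {"yolo11": 23, "yolov8": 22}  # detection head index from the table above


def detection_end_nodes(family, num_scales=3):
    """Build the ONNX end node names for a standard YOLOv8/YOLO11 detection head."""
    idx = HEAD_LAYER[family]
    nodes = []
    for i in range(num_scales):
        nodes.append(f"/model.{idx}/cv2.{i}/cv2.{i}.2/Conv")  # box-regression branch
        nodes.append(f"/model.{idx}/cv3.{i}/cv3.{i}.2/Conv")  # classification branch
    return nodes


print(detection_end_nodes("yolov8"))
```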
YOLO11 end nodes (all sizes: n, s, m, l, x):
END_NODES = [
"/model.23/cv2.0/cv2.0.2/Conv",
"/model.23/cv3.0/cv3.0.2/Conv",
"/model.23/cv2.1/cv2.1.2/Conv",
"/model.23/cv3.1/cv3.1.2/Conv",
"/model.23/cv2.2/cv2.2.2/Conv",
"/model.23/cv3.2/cv3.2.2/Conv",
]
YOLOv8 end nodes (all sizes: n, s, m, l, x):
END_NODES = [
"/model.22/cv2.0/cv2.0.2/Conv",
"/model.22/cv3.0/cv3.0.2/Conv",
"/model.22/cv2.1/cv2.1.2/Conv",
"/model.22/cv3.1/cv3.1.2/Conv",
"/model.22/cv2.2/cv2.2.2/Conv",
"/model.22/cv3.2/cv3.2.2/Conv",
]
Other Architectures
For other detection architectures, run the parse step without end_node_names first, read the suggested nodes from the DFC log output, then re-run with those nodes:
# First pass: let the DFC suggest end nodes
runner = ClientRunner(hw_arch=HW_ARCH)
runner.translate_onnx_model(f"{MODEL}.onnx")
# Check the printed log for: "[info] In order to use HailoRT post-processing..."
Predefined .alls scripts and NMS config files for many YOLO variants are available in the Hailo Model Zoo.
Supported Hardware Architectures
| Architecture | Device | Peak Compute (Vendor Spec) | Common Use Case |
|---|---|---|---|
| hailo8 | Hailo-8 | 26 TOPS | Hailo accelerator card |
| hailo8l | Hailo-8L | 13 TOPS | Raspberry Pi AI Kit |
| hailo15h | Hailo-15H | 20 TOPS | Hailo-15 target devices |
Set HW_ARCH in the script to match your target device before compiling.
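A small lookup keeps HW_ARCH consistent with the table and rejects typos before a long compile. The mapping below restates the table rows, plus the Raspberry Pi AI Kit alias from this guide. `hw_arch_for` is a hypothetical helper.

```python
HW_ARCH_BY_DEVICE = {
    "hailo-8": "hailo8",
    "hailo-8l": "hailo8l",
    "hailo-15h": "hailo15h",
    "raspberry pi ai kit": "hailo8l",  # the AI Kit uses Hailo-8L
}


def hw_arch_for(device):
    """Return the DFC hw_arch string for a device name (case-insensitive)."""
    key = device.strip().lower()
    if key not in HW_ARCH_BY_DEVICE:
        raise ValueError(f"Unknown device {device!r}; expected one of {sorted(HW_ARCH_BY_DEVICE)}")
    return HW_ARCH_BY_DEVICE[key]


print(hw_arch_for("Raspberry Pi AI Kit"))
```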
Running Inference on Hailo Hardware
Once you have the .hef file, copy it to your Hailo-powered device and run inference using the HailoRT Python API (hailo_platform package). Unlike the DFC export steps, inference runs directly on the edge device.
The inference code below runs on the Hailo-powered device (e.g. Raspberry Pi + AI Kit), not on the x86 machine used for compilation.
Step 1: Install HailoRT on the Device
On the target device, install HailoRT and the Python bindings. For Raspberry Pi AI Kit and AI HAT+ users, the official Raspberry Pi AI software guide installs HailoRT, the device driver, and Python bindings with:
sudo apt install dkms
sudo apt install hailo-all
For non-Raspberry Pi Hailo devices, install the HailoRT package that matches your device, driver, and SDK version from the Hailo Developer Zone.
AI HAT+ 2 devices use a different Raspberry Pi package (hailo-h10-all) and Hailo-10H workflow. Follow the Raspberry Pi AI software guide for that hardware generation.
Step 2: Quick Sanity Check
Before running Python inference, confirm the Hailo device is recognized:
hailortcli fw-control identify
You should see the device type, firmware version, and serial number printed.
Step 3: Run Inference
The script below runs object detection on a single image using the compiled HEF file and the hailo_platform Python API. It handles preprocessing, inference, and drawing bounding boxes from the HailoRT NMS output.
import numpy as np
from hailo_platform import (
HEF,
ConfigureParams,
FormatType,
HailoStreamInterface,
InferVStreams,
InputVStreamParams,
OutputVStreamParams,
VDevice,
)
from PIL import Image, ImageDraw
# Configuration
MODEL = "yolo11n"
HEF_PATH = f"{MODEL}.hef" # path to your compiled HEF file
SOURCE = "bus.jpg" # image path
IMGSZ = 640
CONF = 0.25
COCO_NAMES = [
"person",
"bicycle",
"car",
"motorcycle",
"airplane",
"bus",
"train",
"truck",
"boat",
"traffic light",
"fire hydrant",
"stop sign",
"parking meter",
"bench",
"bird",
"cat",
"dog",
"horse",
"sheep",
"cow",
"elephant",
"bear",
"zebra",
"giraffe",
"backpack",
"umbrella",
"handbag",
"tie",
"suitcase",
"frisbee",
"skis",
"snowboard",
"sports ball",
"kite",
"baseball bat",
"baseball glove",
"skateboard",
"surfboard",
"tennis racket",
"bottle",
"wine glass",
"cup",
"fork",
"knife",
"spoon",
"bowl",
"banana",
"apple",
"sandwich",
"orange",
"broccoli",
"carrot",
"hot dog",
"pizza",
"donut",
"cake",
"chair",
"couch",
"potted plant",
"bed",
"dining table",
"toilet",
"tv",
"laptop",
"mouse",
"remote",
"keyboard",
"cell phone",
"microwave",
"oven",
"toaster",
"sink",
"refrigerator",
"book",
"clock",
"vase",
"scissors",
"teddy bear",
"hair drier",
"toothbrush",
]
# Load HEF and connect to device
hef = HEF(HEF_PATH)
params = VDevice.create_params()
target = VDevice(params)
configure_params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
network_groups = target.configure(hef, configure_params)
network_group = network_groups[0]
network_group_params = network_group.create_params()
# Setup I/O virtual streams
input_vstreams_params = InputVStreamParams.make(network_group, quantized=False, format_type=FormatType.FLOAT32)
output_vstreams_params = OutputVStreamParams.make(network_group, quantized=False, format_type=FormatType.FLOAT32)
# Preprocess
orig = Image.open(SOURCE).convert("RGB")
ow, oh = orig.size
resized = orig.resize((IMGSZ, IMGSZ))
input_data = np.expand_dims(np.array(resized, dtype=np.float32), axis=0) # (1,640,640,3)
input_name = hef.get_input_vstream_infos()[0].name
# Inference
with InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as pipeline:
    with network_group.activate(network_group_params):
        raw = pipeline.infer({input_name: input_data})  # dict: output vstream name -> results
# Parse HailoRT NMS output and draw results
# When compiled with nms_postprocess, the HEF outputs detections grouped by class:
# for each image, one array per class with rows [y1, x1, y2, x2, score],
# where the box coordinates are normalized to 0-1.
draw = ImageDraw.Draw(orig)
output_key = next(iter(raw.keys()))
batch_dets = raw[output_key][0]  # per-class detections for the first image
for cls_idx, cls_dets in enumerate(batch_dets):
    for det in cls_dets:
        score = float(det[4])
        if score < CONF:
            continue
        y1, x1, y2, x2 = det[:4]
        # Scale normalized coordinates back to the original image size
        x1 = int(x1 * ow)
        y1 = int(y1 * oh)
        x2 = int(x2 * ow)
        y2 = int(y2 * oh)
        label = f"{COCO_NAMES[cls_idx]} {score:.2f}"
        draw.rectangle([x1, y1, x2, y2], outline="red", width=2)
        draw.text((x1 + 2, y1 + 2), label, fill="red")
orig.save("output.jpg")
print("Saved output.jpg")
This output format assumes the HEF was compiled with nms_postprocess in the .alls script. If you compiled without NMS, the raw outputs are the 6 detection-head tensors, and your application must decode the boxes and run NMS itself.
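If you compile without nms_postprocess, NMS moves into your application. First decode the six head tensors into boxes and class scores, which is a separate step not shown here. A per-class greedy NMS like the sketch below can then filter overlapping boxes. It is a plain-Python illustration of standard greedy NMS, not HailoRT's implementation. The `iou` and `nms` helpers are hypothetical, and the default threshold of 0.7 is an assumption chosen to match the Ultralytics predict default.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thresh=0.7):
    """Greedy per-class NMS over (class_id, score, (x1, y1, x2, y2)) tuples."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):  # highest score first
        # Keep a box unless a higher-scoring box of the same class overlaps it too much.
        if all(k[0] != det[0] or iou(k[2], det[2]) < iou_thresh for k in kept):
            kept.append(det)
    return kept


dets = [(0, 0.9, (0, 0, 10, 10)), (0, 0.8, (1, 1, 10, 10)), (1, 0.7, (1, 1, 10, 10))]
print(nms(dets))
```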
Raspberry Pi AI Kit and AI HAT+
The Raspberry Pi AI Kit and 13 TOPS AI HAT+ use Hailo-8L. To use either device:
- Set HW_ARCH = "hailo8l" before compiling your HEF on the x86 machine.
- Copy the .hef to your Raspberry Pi.
- Install HailoRT by following the official Raspberry Pi AI software guide.
- Run the inference script above.
For camera-based inference on Raspberry Pi, the picamera2 Hailo examples provide ready-to-use scripts for live detection with the Camera Module. You can also compare Raspberry Pi deployment paths in the Coral Edge TPU on Raspberry Pi guide and Sony IMX500 integration guide.
Video Inference with TAPPAS
For high-throughput video pipelines, TAPPAS provides GStreamer elements that stream video through the Hailo chip in real time:
MODEL=yolo11n
gst-launch-1.0 filesrc location=video.mp4 ! decodebin ! \
hailonet hef-path=${MODEL}.hef ! \
hailofilter function-name=yolov8 ! \
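hailooverlay ! autovideosink
See the TAPPAS documentation for full pipeline configuration options. hailofilter loads its post-processing function from a shared library set with its so-path property. A working pipeline therefore also needs so-path pointing to the post-process library that matches your HEF outputs.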
Summary
This guide covered the complete workflow to export Ultralytics YOLO detection models to Hailo HEF format:
- Export to ONNX with Ultralytics (model.export(format="onnx")).
- Parse the ONNX model with the Hailo DFC and specify detection head end nodes.
- Configure normalization and NMS via a model script.
- Quantize with a calibration dataset (COCO128 via Ultralytics).
- Compile to a .hef file ready for Hailo-8, Hailo-8L, or Hailo-15.
For further details, see the Hailo Developer Zone, Hailo documentation, and the Hailo Model Zoo. For other Ultralytics export targets, see the related ONNX, OpenVINO, TensorRT, NCNN, TFLite Edge TPU, RKNN, Sony IMX500, and Qualcomm QNN guides. To compare exported model speed and accuracy across formats, use Benchmark mode. For the full list of formats and options, visit the Export mode documentation and the integrations guide page.
FAQ
What Hailo devices are supported?
The Hailo DFC supports Hailo-8 (hailo8), Hailo-8L (hailo8l), and Hailo-15H (hailo15h). See the Supported Hardware Architectures table for the matching HW_ARCH value.
Which Ultralytics models can be exported?
This guide focuses on detection models. See Supported Tasks for task-level scope, Compatibility Notes for model compatibility limits, and Supported Models and End Nodes for YOLO11 and YOLOv8 end-node examples.
Why does the model script use meta_arch=yolov8 for YOLO11?
YOLO11 uses the same decoupled detection head architecture as YOLOv8. The Hailo DFC uses meta_arch=yolov8 for NMS configuration for both model families.
Do I need a GPU for the optimization step?
A GPU is strongly recommended for the quantization-aware fine-tuning in runner.optimize(). Without one, the process still works but is significantly slower (several hours vs. about 10-20 minutes with a GPU).
How do I find the correct end nodes for my model?
Run runner.translate_onnx_model(...) without specifying end_node_names, then use the suggested detection-head nodes printed by the DFC. See Other Architectures for the example command.
Where can I get the Hailo DFC SDK and NMS config files?
The Hailo DFC SDK Python wheel is available from the Hailo Developer Zone, while predefined .alls scripts and NMS config files are available from the Hailo Model Zoo.