
CoreML Export for YOLO26 Models

Apple ships dedicated AI silicon, the Neural Engine, in every modern iPhone, iPad, and Mac, and CoreML is the only way to program it. Exporting Ultralytics YOLO26 models to CoreML turns a trained .pt checkpoint into a native .mlpackage that runs all six YOLO tasks on-device, most of them in single-digit milliseconds on the Neural Engine, with no network connection and no data leaving the device.

Run YOLO on the Apple Neural Engine today with the official mobile apps

The official Ultralytics YOLO iOS SDK and Flutter plugin run CoreML exports on the Apple Neural Engine out of the box — real-time camera inference, single-image prediction, and automatic model download for all six YOLO26 tasks. For Android NPU deployment, see the Qualcomm QNN integration.




What is CoreML?


CoreML (styled "Core ML" by Apple) is Apple's on-device machine learning framework. It loads models in the modern ML Program format — the .mlpackage bundle the Ultralytics exporter produces — and schedules them across the device's CPU, GPU, and Apple Neural Engine (ANE), the dedicated NPU in every Apple-silicon chip. Because everything runs locally, inference works offline, adds no network latency, and keeps user data on the device.

CoreML integrates directly with Apple's Vision framework, which handles image scaling and orientation on the way into the model — this is how the Ultralytics iOS SDK feeds camera frames to YOLO with effectively zero preprocessing cost.

Why Export YOLO26 to CoreML?

  • Neural Engine speed: YOLO26n detection runs end-to-end in 3.8 ms on an iPhone 17 Pro for single images, and ~16 ms/frame in sustained real-time camera use (see the table and notes below) — comfortably real-time with headroom for the rest of your app.
  • NMS-free by design: YOLO26 is end-to-end, so the exported graph needs no NMS pipeline and decode is sub-millisecond. Older models like YOLO11 can embed a CoreML NMS pipeline with nms=True.
  • Private and offline: All computation stays on the device — no cloud round-trips, no API keys, full data privacy.
  • One export, the whole ecosystem: The same .mlpackage runs on iOS, iPadOS, macOS, watchOS, tvOS, and visionOS, and powers the official Ultralytics iOS SDK and Flutter plugin.
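
To put the latencies quoted above in context, the arithmetic below converts them into frame rates and shares of a per-frame time budget. It is only a back-of-the-envelope sketch: the 60 FPS camera target is an assumption chosen for illustration, and the helper names `fps` and `budget_share` are hypothetical. The 3.8 ms and 16 ms figures are the ones quoted in the first bullet.

```python
def fps(latency_ms: float) -> float:
    """Frames per second achievable if every frame takes latency_ms."""
    return 1000.0 / latency_ms


def budget_share(latency_ms: float, target_fps: float = 60.0) -> float:
    """Fraction of the per-frame time budget used at target_fps (assumed 60 FPS)."""
    frame_budget_ms = 1000.0 / target_fps
    return latency_ms / frame_budget_ms


burst_ms = 3.8       # single-image latency quoted above
sustained_ms = 16.0  # approximate per-frame latency in live camera use

for label, ms in (("burst", burst_ms), ("sustained", sustained_ms)):
    print(f"{label}: {fps(ms):.1f} FPS, {budget_share(ms):.0%} of a 60 FPS frame budget")
```

Under that assumed 60 FPS target, the burst figure leaves most of each frame free for the rest of the app, while the sustained camera figure sits close to the full budget.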

Measured Performance

End-to-end single-image inference for the official YOLO26n INT8 CoreML models on an iPhone 17 Pro (Apple A19, iOS 26.5). Each cell shows the total time (preprocessing + inference + postprocessing, excluding annotation), followed in parentheses by the per-stage split (preprocessing / inference / postprocessing). On iOS, Vision performs input scaling inside the inference request, so preprocessing is reported as 0 and its cost is included in inference.

| Model | Task | Size (pixels) | CPU, .cpuOnly (ms) | Neural Engine, .cpuAndNeuralEngine (ms) |
|---|---|---|---|---|
| YOLO26n | Detect | 640 | 9.2 (0.0 / 9.1 / 0.0) | 3.8 (0.0 / 3.7 / 0.0) |
| YOLO26n-seg | Segment | 640 | 21.7 (0.0 / 12.0 / 9.8) | 14.1 (0.0 / 4.5 / 9.6) |
| YOLO26n-sem [1] | Semantic | 1024 | 17.2 (0.0 / 15.3 / 1.9) | 7.5 (0.0 / 5.5 / 1.9) |
| YOLO26n-cls | Classify | 224 | 2.4 (0.0 / 2.4 / 0.0) | 2.0 (0.0 / 1.9 / 0.0) |
| YOLO26n-pose | Pose | 640 | 12.1 (0.0 / 12.0 / 0.1) | 3.9 (0.0 / 3.9 / 0.1) |
| YOLO26n-obb | OBB | 1024 | 22.3 (0.0 / 22.3 / 0.0) | 7.2 (0.0 / 7.2 / 0.0) |

  • [1] Semantic CoreML exports from this release embed the ArgMax in the graph and return a compact class map instead of float logits, cutting Neural Engine end-to-end time from 10.3 ms to 7.5 ms.
  • Speed values are single-image burst latencies — the mean of 15 runs after 3 warmup runs on bus.jpg, measured through the iOS SDK's per-stage timing via the Flutter plugin's benchmark harness. Sustained real-time camera operation runs higher (full-sensor letterboxing every frame plus thermal settling): YOLO26n detect measures ~16 ms/frame in the live camera app on the same device — see the iOS SDK performance doc for steady-state profiling.
  • The matching Snapdragon CPU/GPU/NPU table is in the Qualcomm QNN integration.
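
The burst-latency protocol described in the notes above (a few untimed warmup runs, then the mean of a fixed number of timed runs) can be sketched generically as follows. This is not the iOS SDK's or the Flutter plugin's harness, which time each stage on-device; `burst_latency_ms` and its parameters are hypothetical names, and the workload in the usage lines is a stand-in rather than a model call.

```python
import statistics
import time
from typing import Callable


def burst_latency_ms(run_once: Callable[[], object], warmup: int = 3, runs: int = 15) -> float:
    """Mean wall-clock latency in ms over `runs` calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        run_once()  # warmup calls are executed but not timed

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


# Stand-in workload; replace the lambda with a call into the model under test.
print(f"{burst_latency_ms(lambda: sum(range(100_000))):.3f} ms")
```

A loop like this measures cold-device burst latency only; it does not capture the per-frame letterboxing or thermal settling that raise the sustained camera figure.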

Exporting YOLO26 Models to CoreML

Installation

To install the required package, run:

Installation
# Install the required package for YOLO26
pip install ultralytics

The coremltools converter is installed automatically on first export. Export runs on macOS or x86 Linux; for detailed instructions and best practices, check our installation guide and the Common Issues guide.
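
Because export and inference have different platform requirements, a script that may run on several machines can check the host first. The sketch below encodes only what this guide states (export on macOS or x86 Linux, inference and validation on macOS only). The function name `coreml_capabilities` is hypothetical, and treating "x86 Linux" as the machine strings `x86_64` or `amd64` is an assumption.

```python
import platform
from typing import Dict, Optional


def coreml_capabilities(system: Optional[str] = None, machine: Optional[str] = None) -> Dict[str, bool]:
    """Report which CoreML steps this guide says are available on a host."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()

    is_macos = system == "Darwin"
    # Assumption: "x86 Linux" means a 64-bit x86 machine string.
    is_x86_linux = system == "Linux" and machine in {"x86_64", "amd64"}

    return {
        "export": is_macos or is_x86_linux,  # export: macOS or x86 Linux
        "predict_val": is_macos,             # predict and val: macOS only
    }


print(coreml_capabilities())  # capabilities of the current host
```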

Usage

The CoreML format supports the Export, Predict, and Validate modes. Inference and validation with CoreML run on macOS only. Export your model, then load the exported model to run inference or validate its accuracy.

Export
from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export to CoreML (FP16 by default); int8=True matches the official app models
model.export(format="coreml", int8=True)  # creates 'yolo26n.mlpackage'

Predict
from ultralytics import YOLO

# Load the exported CoreML model (macOS)
model = YOLO("yolo26n.mlpackage")

# Run inference
results = model("https://ultralytics.com/images/bus.jpg")

Validate
from ultralytics import YOLO

# Load the exported CoreML model (macOS)
model = YOLO("yolo26n.mlpackage")

# Validate accuracy on the COCO8 dataset
metrics = model.val(data="coco8.yaml")

Export Arguments

| Argument | Type | Default | Description |
|---|---|---|---|
| format | str | 'coreml' | Target format for the exported model, defining compatibility with various deployment environments. |
| imgsz | int or tuple | 640 | Desired image size for the model input. Can be an integer for square images or a tuple (height, width) for specific dimensions. |
| half | bool | False | Enables FP16 weight quantization, halving model size with negligible accuracy impact; a good default for the Neural Engine. |
| int8 | bool | False | Enables INT8 weight quantization for the smallest models; the official Ultralytics app models ship as INT8. |
| nms | bool | False | Embeds a CoreML NMS pipeline. Not needed for NMS-free YOLO26; use for earlier models like YOLO11. |
| dynamic | bool | False | Allows dynamic input sizes, enhancing flexibility in handling varying image dimensions. |
| batch | int | 1 | Specifies export model batch inference size or the max number of images the exported model will process concurrently in predict mode. |
| device | str | None | Specifies the device for exporting: GPU (device=0), CPU (device=cpu), MPS for Apple silicon (device=mps). |

For more details about the export process, visit the Ultralytics documentation page on exporting.
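
As a sketch of how these arguments combine, the helper below builds the keyword arguments for `model.export()` from two decisions: which precision to use, and whether the model is NMS-free. The helper names `coreml_export_args` and `export_coreml` and the precision labels are hypothetical; only the argument names and defaults come from the table above. The Ultralytics import is deferred so the helper can be inspected without the package installed.

```python
from typing import Any, Dict


def coreml_export_args(end_to_end: bool, precision: str = "int8", imgsz: int = 640) -> Dict[str, Any]:
    """Build export kwargs; precision is 'default', 'fp16', or 'int8'."""
    if precision not in {"default", "fp16", "int8"}:
        raise ValueError(f"unsupported precision: {precision!r}")
    return {
        "format": "coreml",
        "imgsz": imgsz,
        "half": precision == "fp16",
        "int8": precision == "int8",
        # NMS-free models (YOLO26) need no embedded NMS pipeline;
        # earlier detection models such as YOLO11 can embed one.
        "nms": not end_to_end,
    }


def export_coreml(weights: str, **kwargs: Any) -> Any:
    """Run the export; requires the ultralytics package."""
    from ultralytics import YOLO  # imported lazily on purpose

    return YOLO(weights).export(**kwargs)


print(coreml_export_args(end_to_end=True))                     # e.g. for yolo26n.pt
print(coreml_export_args(end_to_end=False, precision="fp16"))  # e.g. for yolo11n.pt
# export_coreml("yolo26n.pt", **coreml_export_args(end_to_end=True))
```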

Targeting the Neural Engine

CoreML chooses hardware via MLModelConfiguration.computeUnits. The Ultralytics iOS SDK defaults to .cpuAndNeuralEngine on iOS 16+ rather than .all: in a real-time camera app the GPU is already busy compositing the preview and overlays, so excluding it avoids contention and frame-time jitter while the ANE does the heavy lifting. Pin .cpuOnly only for compatibility testing — the table above shows what it costs.
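
The same choice can be tried from Python on a Mac before writing any Swift. This guide only describes the Swift-side setting; the sketch below is an addition that assumes the coremltools package, whose `MLModel` constructor takes a `compute_units` argument with `ComputeUnit` members corresponding to the Swift cases. The mapping table and the function name `load_with_compute_units` are hypothetical conveniences.

```python
from typing import Any

# Swift MLComputeUnits case -> coremltools ComputeUnit member name
SWIFT_TO_COREMLTOOLS = {
    ".all": "ALL",
    ".cpuOnly": "CPU_ONLY",
    ".cpuAndGPU": "CPU_AND_GPU",
    ".cpuAndNeuralEngine": "CPU_AND_NE",
}


def load_with_compute_units(path: str, swift_name: str = ".cpuAndNeuralEngine") -> Any:
    """Load an .mlpackage restricted to the given compute units (needs coremltools)."""
    import coremltools as ct  # imported lazily; intended for use on macOS

    unit = getattr(ct.ComputeUnit, SWIFT_TO_COREMLTOOLS[swift_name])
    return ct.models.MLModel(path, compute_units=unit)


print(SWIFT_TO_COREMLTOOLS[".cpuAndNeuralEngine"])
# model = load_with_compute_units("yolo26n.mlpackage")             # CPU + Neural Engine
# model = load_with_compute_units("yolo26n.mlpackage", ".cpuOnly")  # compatibility testing
```

Timings taken this way on a Mac are not a substitute for the on-device numbers in the table, since the hardware differs.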

Deploying Exported YOLO26 CoreML Models

The fastest path is the official Ultralytics YOLO iOS SDK, the same Swift package that powers the Ultralytics iOS app and the Flutter plugin. It resolves official model names automatically, downloads and caches the .mlpackage, and returns fully decoded results:

import UltralyticsYOLO

// Loads the official INT8 model (downloaded and cached on first use), then runs inference
let yolo = YOLO("yolo26n", task: .detect) { result in
    if case .success(let model) = result {
        let results = model(uiImage)  // boxes, labels, confidences, timing
    }
}

For camera apps, drop in the SDK's YOLOView for real-time inference with native overlays, or use the Flutter plugin for cross-platform apps that share one codebase with Android.

Integrating a raw .mlpackage yourself is also straightforward with Apple's stack: load it with MLModel, wrap it in a VNCoreMLModel, pass that to a VNCoreMLRequest, and feed images through VNImageRequestHandler. Apple's Core ML and Vision documentation covers the details.

Ship the model either embedded in the app bundle (instant availability, ideal for nano/small models) or downloaded on first run and cached (smaller binary, easy model updates) — the official apps use the second approach with the GitHub release assets.

  1. Train your model with Ultralytics Train mode, or start from the official YOLO26 weights
  2. Export with model.export(format="coreml", int8=True) on macOS or x86 Linux
  3. Verify accuracy with model.val() on a Mac, and profile with an Xcode Core ML Performance Report on your target device
  4. Deploy with the iOS SDK, the Flutter plugin, or your own Vision integration, targeting .cpuAndNeuralEngine
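
Steps 2 and 3 of the workflow above can be chained in a short script. The sketch below is one possible arrangement, not an official pipeline: `export_and_validate` is a hypothetical name, the model class is passed in as a parameter so the function can be exercised without Ultralytics installed, and it assumes `export()` returns the path of the exported package.

```python
from typing import Any, Callable


def export_and_validate(weights: str, data: str, yolo_factory: Callable[[str], Any]) -> Any:
    """Export weights to CoreML with INT8, reload the package, and validate it."""
    model = yolo_factory(weights)

    # Step 2: export (macOS or x86 Linux). Assumed to return the .mlpackage path.
    package_path = model.export(format="coreml", int8=True)

    # Step 3: validate the exported package (macOS only).
    exported = yolo_factory(package_path)
    return exported.val(data=data)


# Usage on a Mac with ultralytics installed:
# from ultralytics import YOLO
# metrics = export_and_validate("yolo26n.pt", "coco8.yaml", YOLO)
```

On-device profiling with an Xcode Core ML Performance Report and the deployment step itself happen outside Python.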

Summary

In this guide, you learned how to export Ultralytics YOLO26 models to CoreML's .mlpackage format, quantize them for the Apple Neural Engine, and deploy them at low-millisecond latencies, either through the official iOS SDK and Flutter plugin or through your own Vision integration. For other deployment targets, browse the integration guide page, and compare formats with Benchmark mode.

FAQ

How do I export YOLO26 models to CoreML format?

Run model.export(format="coreml") in Python or yolo export model=yolo26n.pt format=coreml from the CLI on macOS or x86 Linux. Add int8=True to match the official app models. The export produces a yolo26n.mlpackage ML Program ready for Xcode, the iOS SDK, or the Flutter plugin.

Do I need nms=True when exporting YOLO26?

No. YOLO26 is NMS-free end-to-end, so the exported graph already emits final detections and decode costs well under a millisecond. The nms=True option exists for earlier models such as YOLO11, where it embeds a CoreML NMS pipeline so your app does not have to implement suppression.

Which precision should I use: FP16 or INT8?

The official Ultralytics app models ship as INT8, which minimizes download size and runs at the speeds in the table above. half=True (FP16) is a conservative alternative with essentially no accuracy loss. Validate your exact export with model.val() on a Mac before shipping.
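
One way to turn that validation into a go/no-go check is to compare the exported model's mAP against the original checkpoint. The sketch below is illustrative only: the helper names and the 1% tolerance are hypothetical, and the numbers in the usage lines are placeholders, not measurements. For detection models, the value to pass would typically come from `metrics.box.map` after `model.val()`.

```python
def relative_drop(baseline_map: float, exported_map: float) -> float:
    """Relative mAP loss of the export versus the baseline (0.02 means 2%)."""
    if baseline_map <= 0:
        raise ValueError("baseline_map must be positive")
    return (baseline_map - exported_map) / baseline_map


def within_tolerance(baseline_map: float, exported_map: float, max_drop: float = 0.01) -> bool:
    """True if the export loses no more than max_drop (assumed 1%) of baseline mAP."""
    return relative_drop(baseline_map, exported_map) <= max_drop


# Placeholder values for illustration, not measured results.
print(within_tolerance(baseline_map=0.500, exported_map=0.498))
print(within_tolerance(baseline_map=0.500, exported_map=0.480))
```

If an INT8 export falls outside the tolerance you choose, the FP16 export is the fallback this guide suggests.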

How do I make sure inference runs on the Neural Engine?

Set MLModelConfiguration.computeUnits = .cpuAndNeuralEngine (the iOS SDK default on iOS 16+). Avoid .all in camera apps — the GPU is busy compositing the preview, and scheduling inference there causes frame-time jitter. Confirm placement with an Xcode Core ML Performance Report.

Can I run and validate CoreML models with the Ultralytics CLI?

Yes, on macOS: yolo predict model=yolo26n.mlpackage source=image.jpg and yolo val model=yolo26n.mlpackage data=coco8.yaml work like any other format. CoreML execution requires Apple hardware, so these modes are unavailable on Linux and Windows.

What is the fastest way to get YOLO26 running in an iOS or Flutter app?

Use the official Ultralytics YOLO iOS SDK (Swift Package) or the Flutter plugin. Both load official models by name with automatic download and caching, run them on the Neural Engine, and include complete real-time camera UIs — the measured performance table above was produced with exactly this stack.
