Meet YOLO26: next-gen vision AI.

Link to this sectionA Guide on YOLO26 Model Export to TFLite for Deployment#

Deprecated — replaced by LiteRT

As of Ultralytics 8.4.83, the standalone tflite export format has been removed and replaced by the unified Google LiteRT format. LiteRT (Lite Runtime) is the next generation and new name for TensorFlow Lite, and it exports the same .tflite model — now covering mobile, embedded, edge, and browser deployment in one format.

format="tflite" still works but emits a deprecation warning and exports a LiteRT model instead. Use format="litert" going forward; for current export instructions and options, see the LiteRT export guide.

TensorFlow Lite edge deployment framework

Deploying computer vision models on edge devices or embedded devices requires a format that can ensure seamless performance.

The TensorFlow Lite or TFLite export format allows you to optimize your Ultralytics YOLO26 models for tasks like object detection and image classification in edge device-based applications. In this guide, we'll walk through the steps for converting your models to the TFLite format, making it easier for your models to perform well on various edge devices.

Link to this sectionWhy Should You Export to TFLite?#

Introduced by Google in May 2017 as part of their TensorFlow framework, TensorFlow Lite, or TFLite for short, is an open-source deep learning framework designed for on-device inference, also known as edge computing. It gives developers the necessary tools to execute their trained models on mobile, embedded, and IoT devices, as well as traditional computers.

TensorFlow Lite is compatible with a wide range of platforms, including embedded Linux, Android, iOS, and microcontrollers (MCUs). Exporting your model to TFLite makes your applications faster, more reliable, and capable of running offline.

Link to this sectionKey Features of TFLite Models#

TFLite models offer a wide range of key features that enable on-device machine learning by helping developers run their models on mobile, embedded, and edge devices:

  • On-device Optimization: TFLite optimizes for on-device ML, reducing latency by processing data locally, enhancing privacy by not transmitting personal data, and minimizing model size to save space.

  • Multiple Platform Support: TFLite offers extensive platform compatibility, supporting Android, iOS, embedded Linux, and microcontrollers.

  • Diverse Language Support: TFLite is compatible with various programming languages, including Java, Swift, Objective-C, C++, and Python.

  • High Performance: Achieves superior performance through hardware acceleration and model optimization.

Link to this sectionMeasured Performance (historical)#

Before/after reference for the format migration

These TFLite numbers are kept as a historical before/after record for the onnx2tf-TFLite → LiteRT migration: the legacy onnx2tf INT8 TFLite export below versus the new LiteRT w8a32 export (see the LiteRT Measured Performance table). They are shared with the Google LiteRT team to show where the new litert-torch format still regresses against the format it replaced — see Format regressions below.

Per-task before/after on the Adreno GPU of a Xiaomi 17 (Qualcomm Snapdragon 8 Elite Gen 5, SM8850), measured through the Ultralytics Flutter plugin: the legacy onnx2tf INT8 TFLite assets (NHWC, input images) versus the new w8a32 LiteRT assets (NCHW, input args_0), both run on LiteRT 2.x in the same back-to-back sweep at the shipped Android imgsz. Each cell is the total time (preprocessing + inference + postprocessing) with the per-stage split beneath it; both formats compiled fully on the GPU.

ModelTasksize
(pixels)
Before
onnx2tf INT8 TFLite
(ms)
After
w8a32 LiteRT
(ms)
YOLO26nDetect64018.8
7.8 / 8.3 / 2.7
17.4
6.2 / 8.6 / 2.6
YOLO26n-segSegment64045.2
7.8 / 22.9 / 14.4
43.4
7.7 / 21.9 / 13.7
YOLO26n-semSemantic64050.5
6.5 / 27.7 / 16.3
59.4
7.5 / 34.4 / 17.5
YOLO26n-clsClassify2245.9
1.5 / 2.0 / 2.5
4.9
1.7 / 2.0 / 1.3
YOLO26n-posePose64022.5
7.8 / 9.8 / 4.8
24.0
9.1 / 10.4 / 4.5
YOLO26n-obbOBB64019.0
7.7 / 8.4 / 2.9
18.4
7.7 / 8.6 / 2.0

w8a32 LiteRT matches or beats the legacy onnx2tf INT8 format on four of six tasks; semantic is the clear regression (+9 ms, from the NCHW FP32-activation inference and host-side argmax) and pose is ~1.5 ms slower. The legacy onnx2tf models run unchanged on LiteRT 2.x alongside the new NCHW exports.

Link to this sectionFormat regressions vs LiteRT#

Same-device YOLO26n detect on the Adreno GPU — legacy onnx2tf INT8 TFLite versus the four LiteRT quantization formats, all measured in one sustained run (so inference is the comparable, format-dependent metric):

Android formatGPU inference (ms)GPU-compiles
onnx2tf INT8 (legacy TFLite)8.6yes
LiteRT w8a32 (new official)8.4yes
LiteRT INT8 (quantize=8)11.0yes
LiteRT FP328.8yes
LiteRT w8a16 (quantize="w8a16")(CPU fallback)no — fails

Issues for the Google LiteRT / litert-torch team, surfaced migrating production Android assets from onnx2tf TFLite to LiteRT:

  1. NCHW layout adds a per-frame transpose. litert-torch traces the PyTorch model and emits NCHW [1,3,H,W] with a float input, whereas the onnx2tf TFLite export was NHWC [1,H,W,3] — matching the camera/bitmap layout. The new format forces a host-side HWC→CHW transpose every frame that the old format did not need.
  2. quantize="w8a16" does not compile on the GPU (OpenCL) delegate and silently falls back to a CPU path that is ~40× slower (~660 ms vs ~17 ms), making the int16-activation format unusable for GPU deployment.
  3. Static INT8 (quantize=8) is the slowest GPU format — ~11 ms vs ~8.6 ms for the equivalent legacy onnx2tf INT8 model, i.e. LiteRT's own INT8 path regresses against the format it replaced. Dynamic-range w8a32 is the only LiteRT format that matches the old INT8 speed, which is why it is now shipped.
  4. Semantic models export as raw NCHW logits with no in-graph ArgMax option, forcing a cache-unfriendly host-side argmax over [1, C, H, W] (each class plane is a full H×W apart). The onnx2tf, CoreML, and QNN paths can emit a compact class map instead.
  5. Output tensors were renamed output_0, output_1, … (vs onnx2tf Identity, Identity_1, …), which silently broke runtime output-shape lookup until the consumer added the new names.

The corresponding LiteRT w8a32 numbers (the format now shipped) are on the LiteRT page.

Link to this sectionDeployment Options in TFLite#

Before we look at the code for exporting YOLO26 models to the TFLite format, let's understand how TFLite models are normally used.

TFLite offers various on-device deployment options for machine learning models, including:

  • Deploying with Android and iOS: Both Android and iOS applications with TFLite can analyze edge-based camera feeds and sensors to detect and identify objects. TFLite also offers native iOS libraries written in Swift and Objective-C. The architecture diagram below shows the process of deploying a trained model onto Android and iOS platforms using TensorFlow Lite.

TensorFlow Lite deployment architecture for mobile

  • Implementing with Embedded Linux: If running inferences on a Raspberry Pi using the Ultralytics Guide does not meet the speed requirements for your use case, you can use an exported TFLite model to accelerate inference times. Additionally, it's possible to further improve performance by utilizing a Coral Edge TPU device.

  • Deploying with Microcontrollers: TFLite models can also be deployed on microcontrollers and other devices with only a few kilobytes of memory. The core runtime just fits in 16 KB on an Arm Cortex M3 and can run many basic models. It doesn't require operating system support, any standard C or C++ libraries, or dynamic memory allocation.

Link to this sectionExport to TFLite: Converting Your YOLO26 Model#

You can improve on-device model execution efficiency and optimize performance by converting your models to TFLite format.

Link to this sectionInstallation#

To install the required packages, run:

Installation
# Install the required package for YOLO26
pip install ultralytics

For detailed instructions and best practices related to the installation process, check our Ultralytics Installation guide. While installing the required packages for YOLO26, if you encounter any difficulties, consult our Common Issues guide for solutions and tips.

Link to this sectionUsage#

All Ultralytics YOLO26 models are designed to support export out of the box, making it easy to integrate them into your preferred deployment workflow. You can view the full list of supported export formats and configuration options to choose the best setup for your application.

The TFLite format supports the Export, Predict, and Validate modes. Export your model, then load the exported model to run inference or validate its accuracy.

Export
from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to TFLite format
model.export(format="litert")  # creates 'yolo26n.tflite'
Predict
from ultralytics import YOLO

# Load the exported TFLite model
model = YOLO("yolo26n.tflite")

# Run inference
results = model("https://ultralytics.com/images/bus.jpg")
Validate
from ultralytics import YOLO

# Load the exported TFLite model
model = YOLO("yolo26n.tflite")

# Validate accuracy on the COCO8 dataset
metrics = model.val(data="coco8.yaml")

Link to this sectionExport Arguments#

ArgumentTypeDefaultDescription
formatstr'tflite'Target format for the exported model, defining compatibility with various deployment environments.
imgszint or tuple640Desired image size for the model input. Can be an integer for square images or a tuple (height, width) for specific dimensions.
quantizeint or strNoneQuantization precision: 8 (static INT8, int8 weights + int8 activations; needs calibration data/fraction), 'w8a16' (static, int8 weights + int16 activations; needs calibration data/fraction), 'w8a32' (dynamic INT8, int8 weights + FP32 activations; no calibration needed), or 32/unset (FP32). FP16 is not exported separately — an FP32 model runs in FP16 automatically on GPU delegates. Replaces the deprecated half/int8 flags.
batchint1Specifies export model batch inference size or the max number of images the exported model will process concurrently in predict mode.
datastr'coco8.yaml'Path to the dataset configuration file (default: coco8.yaml), essential for quantization.
fractionfloat1.0Specifies the fraction of the dataset to use for INT8 quantization calibration. Allows for calibrating on a subset of the full dataset, useful for experiments or when resources are limited. If not specified with INT8 enabled, the full dataset will be used.
devicestrNoneSpecifies the device for exporting: CPU (device=cpu), MPS for Apple silicon (device=mps).

For more details about the export process, visit the Ultralytics documentation page on exporting.

Link to this sectionDeploying Exported YOLO26 TFLite Models#

After successfully exporting your Ultralytics YOLO26 models to TFLite format, you can now deploy them. The primary and recommended first step for running a TFLite model is to use the YOLO("model.tflite") method, as outlined in the previous usage code snippet. However, for in-depth instructions on deploying your TFLite models in various other settings, take a look at the following resources:

  • Android: A quick start guide for integrating TensorFlow Lite into Android applications, providing easy-to-follow steps for setting up and running machine learning models.

  • iOS: Check out this detailed guide for developers on integrating and deploying TensorFlow Lite models in iOS applications, offering step-by-step instructions and resources.

  • End-To-End Examples: This page provides an overview of various TensorFlow Lite examples, showcasing practical applications and tutorials designed to help developers implement TensorFlow Lite in their machine learning projects on mobile and edge devices.

Link to this sectionSummary#

In this guide, we focused on how to export to TFLite format. By converting your Ultralytics YOLO26 models to TFLite model format, you can improve the efficiency and speed of YOLO26 models, making them more effective and suitable for edge computing environments.

For further details on usage, visit the TFLite official documentation.

Also, if you're curious about other Ultralytics YOLO26 integrations, check out our integration guide page. You'll find plenty of helpful information and insights there.

Link to this sectionFAQ#

Link to this sectionHow do I export a YOLO26 model to TFLite format?#

To export a YOLO26 model to TFLite format, you can use the Ultralytics library. First, install the required package using:

pip install ultralytics

Then, use the following code snippet to export your model:

from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to TFLite format
model.export(format="litert")  # creates 'yolo26n.tflite'

For CLI users, you can achieve this with:

yolo export model=yolo26n.pt format=litert # creates 'yolo26n.tflite'

For more details, visit the Ultralytics export guide.

Link to this sectionWhat are the benefits of using TensorFlow Lite for YOLO26 model deployment?#

TensorFlow Lite (TFLite) is an open-source deep learning framework designed for on-device inference, making it ideal for deploying YOLO26 models on mobile, embedded, and IoT devices. Key benefits include:

  • On-device optimization: Minimize latency and enhance privacy by processing data locally.
  • Platform compatibility: Supports Android, iOS, embedded Linux, and MCU.
  • Performance: Utilizes hardware acceleration to optimize model speed and efficiency.

To learn more, check out the TFLite guide.

Link to this sectionIs it possible to run YOLO26 TFLite models on Raspberry Pi?#

Yes, you can run YOLO26 TFLite models on Raspberry Pi to improve inference speeds. First, export your model to TFLite format as explained above. Then, use a tool like TensorFlow Lite Interpreter to execute the model on your Raspberry Pi.

For further optimizations, you might consider using Coral Edge TPU. For detailed steps, refer to our Raspberry Pi deployment guide and the Edge TPU integration guide.

Link to this sectionCan I use TFLite models on microcontrollers for YOLO26 predictions?#

Yes, TFLite supports deployment on microcontrollers with limited resources. TFLite's core runtime requires only 16 KB of memory on an Arm Cortex M3 and can run basic YOLO26 models. This makes it suitable for deployment on devices with minimal computational power and memory.

To get started, visit the TFLite Micro for Microcontrollers guide.

Link to this sectionWhat platforms are compatible with TFLite exported YOLO26 models?#

TensorFlow Lite provides extensive platform compatibility, allowing you to deploy YOLO26 models on a wide range of devices, including:

  • Android and iOS: Native support through TFLite Android and iOS libraries.
  • Embedded Linux: Ideal for single-board computers such as Raspberry Pi.
  • Microcontrollers: Suitable for MCUs with constrained resources.

For more information on deployment options, see our detailed deployment guide.

Link to this sectionHow do I troubleshoot common issues during YOLO26 model export to TFLite?#

If you encounter errors while exporting YOLO26 models to TFLite, common solutions include:

  • Check package compatibility: Ensure you're using compatible versions of Ultralytics and TensorFlow. Refer to our installation guide.
  • Model support: Verify that the specific YOLO26 model supports TFLite export by checking the Ultralytics export documentation page.
  • Quantization issues: When using INT8 quantization, make sure your dataset path is correctly specified in the data parameter.

For additional troubleshooting tips, visit our Common Issues guide.

Comments