Export YOLO Models to LiteRT for Edge and Web Deployment
LiteRT (short for Lite Runtime) is Google's high-performance runtime for on-device AI. It is the successor to, and the new name for, TensorFlow Lite (TFLite), and it runs the same .tflite model format. With LiteRT, a single exported Ultralytics YOLO model deploys across mobile, embedded, edge, and browser targets, covering everything that the older tflite and tfjs export formats handled separately.
The LiteRT export format optimizes your models for tasks like object detection, segmentation, pose estimation, and classification so they run fast and offline on a wide range of devices.
Why Should You Export to LiteRT?
LiteRT is an open-source framework designed for on-device inference, a form of edge computing. It gives developers the tools to execute trained models on mobile, embedded, and IoT devices, on traditional computers, and, through LiteRT.js, directly in web browsers and Node.js.
One model format, every target:
- Mobile & Embedded: Android, iOS, embedded Linux, and microcontrollers (MCUs).
- Edge accelerators: Compatible with the Coral Edge TPU for further acceleration.
- Browser & Node.js: LiteRT.js runs the same .tflite model on the web with WebGPU/WASM acceleration, replacing the need for a separate TensorFlow.js export.
Key Features of LiteRT Models
- On-device Optimization: Reduces latency by processing data locally, enhances privacy by not transmitting personal data, and minimizes model size to save space.
- Multiple Platform Support: Runs on Android, iOS, embedded Linux, microcontrollers, and modern web browsers.
- Hardware Acceleration: Leverages XNNPACK on CPU, and GPU acceleration via OpenCL, Metal, and WebGPU. The GPU delegate runs in FP16 by default for additional speed.
- Quantization: Supports FP32, static INT8 (quantize=8, int8 weights + int8 activations), static INT16-activation (quantize="w8a16", int8 weights + int16 activations for higher accuracy), and dynamic INT8 (quantize="w8a32", int8 weights + FP32 activations, no calibration data needed) to compress models and speed up inference with minimal accuracy loss.
- Diverse Language Support: Compatible with Java/Kotlin, Swift, Objective-C, C++, Python, and JavaScript.
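To make the static INT8 idea more concrete, the following is a minimal sketch of affine (scale and zero-point) quantization in plain Python. It is an illustration only, not the LiteRT or Ultralytics implementation; the function names and the sample value range are assumptions. It also hints at why the static modes need calibration data: the scale and zero point are derived from an observed range of values.

```python
def affine_params(x_min, x_max, q_min=-128, q_max=127):
    """Derive a scale and zero point from an observed float range (hypothetical helper)."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0.0:  # degenerate all-zero range
        scale = 1.0
    zero_point = int(round(q_min - x_min / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """Map a float to an int8 code, clamping values outside the calibrated range."""
    q = int(round(x / scale)) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Map an int8 code back to an approximate float."""
    return (q - zero_point) * scale


# Assumed calibration result: activations observed in the range [0.0, 2.55]
scale, zero_point = affine_params(0.0, 2.55)
code = quantize(1.0, scale, zero_point)
approx = dequantize(code, scale, zero_point)
```

Values outside the calibrated range are clamped, which is why a calibration set that resembles the deployment data generally matters for static quantization.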
Export to LiteRT: Converting Your YOLO Model
You can improve on-device execution efficiency and broaden deployment options by converting your models to the LiteRT format.
Installation
To install the required package, run:
# Install the required package for YOLO
pip install ultralytics

For detailed instructions and best practices, check our Ultralytics Installation guide. If you encounter any difficulties, consult our Common Issues guide.
LiteRT export is currently supported on Linux x86_64 and macOS. The exported .tflite model itself runs on all LiteRT-supported platforms (mobile, embedded, edge, and the browser).
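A quick pre-flight check can catch an unsupported host before an export is attempted. The sketch below uses only the standard library; the helper name is hypothetical, and the rule it encodes is simply the platform list stated above, not a check that Ultralytics performs in this form.

```python
import platform


def litert_export_supported(system=None, machine=None):
    """Return True if the host matches the export platforms listed in this guide."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin":  # macOS
        return True
    # Linux is listed only for the x86_64 architecture
    return system == "Linux" and machine.lower() == "x86_64"


if not litert_export_supported():
    print("LiteRT export is not listed as supported on this platform.")
```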
Usage
All Ultralytics YOLO models support export out of the box. The LiteRT format supports the Export, Predict, and Validate modes, so you can export a model, then load it to run inference or validate its accuracy locally.
from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to LiteRT format
model.export(format="litert")  # creates 'yolo26n.tflite'

from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# Dynamic INT8: int8 weights, FP32 activations - no calibration data needed
model.export(format="litert", quantize="w8a32")  # creates 'yolo26n_w8a32.tflite'

# Static INT8: int8 weights + int8 activations - needs calibration data
model.export(format="litert", quantize=8, data="coco8.yaml")  # creates 'yolo26n_int8.tflite'

# Static w8a16: int8 weights + int16 activations (higher accuracy) - needs calibration data
model.export(format="litert", quantize="w8a16", data="coco8.yaml")  # creates 'yolo26n_w8a16.tflite'

from ultralytics import YOLO

# Load the exported LiteRT model
model = YOLO("yolo26n.tflite")

# Run inference
results = model("https://ultralytics.com/images/bus.jpg")

from ultralytics import YOLO

# Load the exported LiteRT model
model = YOLO("yolo26n.tflite")

# Validate accuracy on the COCO8 dataset
metrics = model.val(data="coco8.yaml")

Export Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| format | str | 'litert' | Target format for the exported model, defining compatibility with various deployment environments. |
| imgsz | int or tuple | 640 | Desired image size for the model input. Can be an integer for square images or a tuple (height, width) for specific dimensions. |
| quantize | int or str | None | Quantization precision: 8 (static INT8, int8 weights + int8 activations; needs calibration data/fraction), 'w8a16' (static, int8 weights + int16 activations; needs calibration data/fraction), 'w8a32' (dynamic INT8, int8 weights + FP32 activations; no calibration needed), or 32/unset (FP32). FP16 is not exported separately (see note below). Replaces the deprecated half/int8 flags. |
| batch | int | 1 | Specifies export model batch inference size or the max number of images the exported model will process concurrently in predict mode. |
| data | str | 'coco8.yaml' | Dataset YAML used for INT8 calibration. If omitted with quantize=8, Ultralytics selects the default calibration dataset for the model task. |
| device | str | None | Specifies the device for exporting. LiteRT export runs on CPU (device=cpu). |
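The quantize and data arguments interact, so it can help to assemble them in one place. The sketch below is a hypothetical helper, not part of the Ultralytics API: it maps each quantize value from the table to the filename suffix shown in the usage examples and to whether calibration data applies. Treating quantize=32 as producing the same filename as an unset value is an assumption.

```python
from pathlib import Path

# quantize value -> (filename suffix, needs calibration data), per the table above
QUANT_MODES = {
    None: ("", False),         # FP32 (unset)
    32: ("", False),           # FP32 (assumed to use the same filename as unset)
    "w8a32": ("_w8a32", False),
    8: ("_int8", True),
    "w8a16": ("_w8a16", True),
}


def plan_litert_export(weights, quantize=None, data=None, imgsz=640):
    """Build export keyword arguments and the expected output filename."""
    if quantize not in QUANT_MODES:
        raise ValueError(f"Unsupported quantize value: {quantize!r}")
    suffix, needs_calibration = QUANT_MODES[quantize]
    kwargs = {"format": "litert", "imgsz": imgsz}
    if quantize not in (None, 32):
        kwargs["quantize"] = quantize
    # Only the static modes use a calibration dataset
    if needs_calibration and data is not None:
        kwargs["data"] = data
    expected_name = f"{Path(weights).stem}{suffix}.tflite"
    return kwargs, expected_name


kwargs, expected_name = plan_litert_export("yolo26n.pt", quantize=8, data="coco8.yaml")
```

The resulting dictionary could then be passed to the export call, for example YOLO("yolo26n.pt").export(**kwargs), and expected_name compared with the file the export reports.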
Unlike the legacy tflite export, LiteRT does not require a separate FP16 export. An FP32 .tflite model runs in half precision at runtime when using a GPU delegate (WebGPU, OpenCL, Metal) — this is the official LiteRT approach to FP16 inference.
For more details about the export process, visit the Ultralytics documentation page on exporting.
Deploying Exported YOLO LiteRT Models
After exporting your Ultralytics YOLO model to LiteRT, you can deploy it across platforms. The quickest way to verify it locally is the YOLO("yolo26n.tflite") method shown above. For deployment in other environments, see the following resources:
Mobile & Embedded
- Android: A quick-start guide for integrating LiteRT into Android applications.
- iOS: A guide for integrating and deploying LiteRT models in iOS applications.
- Embedded Linux & Raspberry Pi: Run LiteRT models on single-board computers, optionally accelerated with a Coral Edge TPU.
- Microcontrollers: Deploy on MCUs with only a few kilobytes of memory — the core runtime fits in roughly 16 KB on an Arm Cortex-M3.
Browser & Node.js (LiteRT.js)
- LiteRT.js overview: Run the same .tflite model directly in the browser with WebGPU/WASM acceleration, eliminating server-side computation and keeping data on the user's device.
- End-to-End Examples: Practical examples and tutorials for implementing LiteRT across mobile, edge, and web.
Summary
In this guide, we covered how to export Ultralytics YOLO models to the LiteRT format. By consolidating mobile/edge (formerly TFLite) and browser (formerly TF.js) deployment into a single .tflite model, LiteRT makes your YOLO models faster, smaller, and portable across virtually every on-device target.
For further details, visit the LiteRT official documentation.
Also, if you're curious about other Ultralytics YOLO integrations, check out our integration guide page for plenty of helpful resources.
FAQ
How do I export a YOLO model to LiteRT format?
Use the Ultralytics library to export a YOLO model to LiteRT (.tflite). First, install the package:
pip install ultralytics

Then export your model:
from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to LiteRT format
model.export(format="litert")  # creates 'yolo26n.tflite'

For CLI users:
yolo export model=yolo26n.pt format=litert  # creates 'yolo26n.tflite'

For more details, visit the Ultralytics export guide.
What is the difference between LiteRT, TFLite, and TF.js?
LiteRT is the new name for TensorFlow Lite: same .tflite model format, same runtime lineage, rebranded by Google. In Ultralytics, the single litert export format now covers both use cases that previously required two separate formats:
- The old tflite format → mobile, embedded, and edge deployment.
- The old tfjs format → browser and Node.js deployment, now handled by LiteRT.js running the same .tflite file.
If you have an existing .tflite file, you can load it directly with YOLO("model.tflite") and it will run through the LiteRT backend.
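Before loading an existing file, a cheap sanity check is to look for the FlatBuffer file identifier that the TensorFlow Lite schema declares ("TFL3", stored at bytes 4 to 8 of the file). The helper names below are hypothetical, and the check is only a heuristic: it does not validate the model or confirm that it came from a YOLO export.

```python
def has_tflite_identifier(header):
    """Return True if the first 8 bytes carry the 'TFL3' FlatBuffer identifier."""
    return len(header) >= 8 and header[4:8] == b"TFL3"


def looks_like_tflite(path):
    """Read the file header and check it for the .tflite identifier."""
    with open(path, "rb") as f:
        return has_tflite_identifier(f.read(8))
```

For example, looks_like_tflite("model.tflite") could be called before YOLO("model.tflite") to fail early on a truncated download or a mislabeled file.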
Can I run YOLO LiteRT models on a Raspberry Pi?
Yes. Export your model to LiteRT format, then run it on a Raspberry Pi to improve inference speeds. For further optimization, consider a Coral Edge TPU. For detailed steps, refer to our Raspberry Pi deployment guide.
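To check whether an exported model actually improves speed on a given board, a small timing harness is often enough. The sketch below is generic Python and makes no assumptions about LiteRT itself; predict_fn is a hypothetical stand-in for any callable that runs one inference, and the warmup and run counts are arbitrary defaults.

```python
import statistics
import time


def benchmark(predict_fn, sample, warmup=3, runs=20):
    """Time repeated calls of predict_fn(sample) and summarize latency in milliseconds."""
    # Warm-up calls are excluded so one-time setup costs do not skew the result
    for _ in range(warmup):
        predict_fn(sample)
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }
```

Since a loaded model is callable on an image source, as in the usage examples, something like benchmark(YOLO("yolo26n.tflite"), "bus.jpg") could be used to compare the FP32 and quantized exports on the same device.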
Can I run YOLO models in the browser with LiteRT?
Yes. LiteRT.js runs the same exported .tflite model directly in a web browser or Node.js application, with WebGPU/WASM acceleration. This replaces the previous TensorFlow.js workflow: there is no separate browser export, and you deploy the same LiteRT model with the LiteRT.js runtime.
Does LiteRT support FP16 (half-precision) inference?
Yes — at runtime. An FP32 LiteRT model automatically runs in FP16 when executed on a GPU delegate (WebGPU, OpenCL, or Metal), which is the official LiteRT approach. You therefore don't need a dedicated FP16 export; for further compression, use INT8 quantization with quantize=8.
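For a sense of what half precision means numerically, the sketch below rounds Python floats to IEEE 754 FP16 using the standard library's struct module. It only illustrates the size of the rounding error; it says nothing about how a GPU delegate performs its arithmetic internally, and the helper name is hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def relative_error(x):
    """Relative error introduced by storing x in half precision."""
    return abs(to_fp16(x) - x) / abs(x)


# Values such as 0.5 are exact in FP16, while 0.1 is only approximated
exact = to_fp16(0.5)
error = relative_error(0.1)
```

For values in FP16's normal range the relative error stays within 2**-11, which is one reason half-precision inference usually costs little accuracy.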
How do I troubleshoot common issues during LiteRT export?
If you encounter errors while exporting YOLO models to LiteRT, common solutions include:
- Check platform: LiteRT export is supported on Linux x86_64 and macOS. Verify your environment matches.
- Check package compatibility: Ensure you're using a compatible version of Ultralytics. Refer to our installation guide.
- Quantization issues: When using INT8 quantization, make sure your dataset path is correctly specified in the data parameter.
For additional troubleshooting tips, visit our Common Issues guide.