Export YOLO Models to LiteRT for Edge and Web Deployment


LiteRT (short for Lite Runtime) is Google's high-performance runtime for on-device AI. It is the next generation and the new name for TensorFlow Lite (TFLite), and it runs the same .tflite model format. With LiteRT, a single exported Ultralytics YOLO model deploys across mobile, embedded, edge, and the browser — covering everything that the older tflite and tfjs export formats handled separately, now under one umbrella.

The LiteRT export format optimizes your models for tasks like object detection, segmentation, pose estimation, and classification so they run fast and offline on a wide range of devices.

Run YOLO on Android with LiteRT today via the official Flutter plugin

The official Ultralytics YOLO Flutter plugin runs LiteRT .tflite exports on Android out of the box — real-time camera inference, single-image prediction, GPU acceleration, and automatic model download for all seven YOLO26 tasks, including Depth. For Apple devices use the CoreML export; for Qualcomm Snapdragon NPUs see the Qualcomm QNN integration.

Run YOLO on Web with LiteRT.js today via the official @ultralytics/yolo npm package

The official Ultralytics YOLO npm package runs LiteRT .tflite exports directly in the browser via LiteRT.js, with no server or Python required. It supports real-time webcam inference, single-image prediction, and WebGPU acceleration (with automatic CPU/WASM fallback) across all six YOLO26 tasks (detect, segment, pose, OBB, classify, semantic). On WebGPU it is often about 2× faster than ONNX Runtime Web.

npm i @ultralytics/yolo @litertjs/core

Why Should You Export to LiteRT?

LiteRT is an open-source framework designed for on-device inference, a form of edge computing in which models run locally on the device instead of on a remote server. It gives developers the tools to execute trained models on mobile, embedded, and IoT devices, traditional computers, and, through LiteRT.js, directly in web browsers and Node.js.

One model format, every target:

Mobile & Embedded: Android, iOS, embedded Linux, and microcontrollers (MCUs).
Edge accelerators: Compatible with the Coral Edge TPU for further acceleration.
Browser & Node.js: LiteRT.js runs the same .tflite model on the web with WebGPU/WASM acceleration — replacing the need for a separate TensorFlow.js export.

Key Features of LiteRT Models

On-device Optimization: Reduces latency by processing data locally, enhances privacy by not transmitting personal data, and minimizes model size to save space.
Multiple Platform Support: Runs on Android, iOS, embedded Linux, microcontrollers, and modern web browsers.
Hardware Acceleration: Uses the XNNPACK delegate on CPU and GPU delegates based on OpenCL, Metal, and WebGPU. The GPU delegate runs in FP16 by default for additional speed.
Quantization: Supports FP32, static INT8 (quantize=8, int8 weights + int8 activations), static INT16-activation (quantize="w8a16", int8 weights + int16 activations for higher accuracy), and dynamic INT8 (quantize="w8a32", int8 weights + FP32 activations, no calibration data needed) to compress models and speed up inference with minimal accuracy loss.
Diverse Language Support: Compatible with Java/Kotlin, Swift, Objective-C, C++, Python, and JavaScript.
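The precision options in the Quantization item can be restated as a small lookup, which is handy when scripting several exports. This is a sketch, not part of the Ultralytics API: `QUANT_MODES` and `needs_calibration` are hypothetical names, and the table only mirrors the modes listed above.

```python
from typing import Dict, Optional, Union

# Hypothetical lookup mirroring the quantize options listed above.
# Keys are values you would pass as `quantize=`; None stands for unset (FP32).
QUANT_MODES: Dict[Optional[Union[int, str]], Dict[str, object]] = {
    None: {"weights": "fp32", "activations": "fp32", "calibration": False},
    8: {"weights": "int8", "activations": "int8", "calibration": True},
    "w8a16": {"weights": "int8", "activations": "int16", "calibration": True},
    "w8a32": {"weights": "int8", "activations": "fp32", "calibration": False},
}


def needs_calibration(quantize: Optional[Union[int, str]]) -> bool:
    """Return True for static modes that need calibration data (`data`/`fraction`)."""
    if quantize == 32:
        quantize = None  # 32 is documented as equivalent to unset (FP32)
    if quantize not in QUANT_MODES:
        raise ValueError(f"unsupported quantize value: {quantize!r}")
    return bool(QUANT_MODES[quantize]["calibration"])
```

For example, `needs_calibration("w8a32")` returns `False`, so that export can run without a `data` argument.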

Measured Performance

The table below reports end-to-end single-image inference latency for the official YOLO26n Android LiteRT assets (w8a32: int8 weights, FP32 activations) on a Xiaomi 17 phone powered by the Qualcomm Snapdragon 8 Elite Gen 5 (SM8850), measured through the Ultralytics Flutter plugin 0.6.10. Each cell shows the total time in milliseconds (preprocessing + inference + postprocessing, excluding annotation), followed in parentheses by the per-stage split (pre / inference / post). CPU runs the LiteRT XNNPACK delegate; GPU runs the LiteRT OpenCL/GL delegate (FP16).

Model	Task	Size (pixels)	CPU w8a32 LiteRT (ms)	GPU (Adreno) w8a32 LiteRT (ms)
YOLO26n	Detect	640	52.4 (1.8 / 48.2 / 2.4)	13.5 (1.9 / 8.1 / 3.5)
YOLO26n-seg	Segment	640	72.8 (1.8 / 65.3 / 5.7)	28.6 (1.8 / 20.1 / 6.7)
YOLO26n-sem	Semantic	640	60.3 (1.8 / 50.4 / 8.1)	32.9 (1.8 / 23.0 / 8.2)
YOLO26n-depth	Depth	640	325.1 (5.1 / 300.9 / 19.2)	23.0 (2.0 / 12.9 / 8.2)
YOLO26n-cls	Classify	224	10.5 (0.9 / 9.6 / 0.1)	3.2 (1.0 / 2.2 / 0.1)
YOLO26n-pose	Pose	640	56.9 (1.8 / 53.9 / 1.2)	14.0 (1.9 / 9.3 / 2.8)
YOLO26n-obb	OBB	640	50.5 (1.8 / 47.3 / 1.4)	13.0 (2.9 / 7.9 / 2.3)

Speed values are single-image burst latencies — the mean of 15 runs after 3 warmup runs on bus.jpg, measured with the Flutter plugin's on-device benchmark harness in profile mode. The full task suite runs back-to-back, so the CPU-bound preprocessing stage reflects sustained operation (a thermally rested single-task measurement is lower); the GPU/CPU inference stage is the steady-state compute cost.
The LiteRT export traces the PyTorch model directly, producing an NCHW .tflite with a float input. The GPU delegate compiles the whole graph (all seven tasks run on the Adreno GPU here), and w8a32 needs no calibration data. The official Android assets are hosted on the yolo-flutter-app v0.6.6 release, with the detailed benchmark record in the Flutter performance doc.
The matching Snapdragon Hexagon NPU numbers (and the INT8 TFLite CPU/GPU baseline) are in the Qualcomm QNN integration.
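The burst-latency protocol described above (3 discarded warmup runs, then the mean of 15 timed runs, reported per stage) can be approximated on a desktop. This is a hedged sketch rather than the Flutter plugin's harness: `burst_latency` is a hypothetical helper, and it assumes each call returns a dict of per-stage times in milliseconds, like the `speed` attribute of an Ultralytics `Results` object.

```python
import statistics
from typing import Callable, Dict


def burst_latency(
    run_once: Callable[[], Dict[str, float]],
    warmup: int = 3,
    runs: int = 15,
) -> Dict[str, float]:
    """Average per-stage latencies (ms) over `runs` calls after `warmup` discarded calls."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    # Warmup calls absorb one-time costs (e.g. delegate setup); their timings are discarded.
    for _ in range(warmup):
        run_once()
    samples = [run_once() for _ in range(runs)]
    stages = list(samples[0])
    # Mean of each stage across the timed runs.
    summary = {stage: statistics.mean(s[stage] for s in samples) for stage in stages}
    # End-to-end total = sum of the stage means (equal to the mean of per-run totals).
    summary["total"] = sum(summary[stage] for stage in stages)
    return summary
```

For example, with a local copy of bus.jpg: `model = YOLO("yolo26n.tflite")`, then `burst_latency(lambda: model("bus.jpg", verbose=False)[0].speed)`. Desktop numbers will not match the phone measurements in the table.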

Export to LiteRT: Converting Your YOLO Model

You can improve on-device execution efficiency and broaden deployment options by converting your models to the LiteRT format.

Installation

To install the required package, run:

Installation

# Install the required package for YOLO
pip install ultralytics

For detailed instructions and best practices, check our Ultralytics Installation guide. If you encounter any difficulties, consult our Common Issues guide.

Platform support

LiteRT export is currently supported on Linux x86_64 and macOS. The exported .tflite model itself runs on all LiteRT-supported platforms (mobile, embedded, edge, and the browser).

Usage

All Ultralytics YOLO models support export out of the box. The LiteRT format supports the Export, Predict, and Validate modes, so you can export a model, then load it to run inference or validate its accuracy locally.

Export

from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to LiteRT format
model.export(format="litert")  # creates 'yolo26n.tflite'

Quantized export

from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# Dynamic INT8: int8 weights, FP32 activations - no calibration data needed
model.export(format="litert", quantize="w8a32")  # creates 'yolo26n_w8a32.tflite'

# Static INT8: int8 weights + int8 activations - needs calibration data
model.export(format="litert", quantize=8, data="coco8.yaml")  # creates 'yolo26n_int8.tflite'

# Static w8a16: int8 weights + int16 activations (higher accuracy) - needs calibration data
model.export(format="litert", quantize="w8a16", data="coco8.yaml")  # creates 'yolo26n_w8a16.tflite'

Predict

from ultralytics import YOLO

# Load the exported LiteRT model
model = YOLO("yolo26n.tflite")

# Run inference
results = model("https://ultralytics.com/images/bus.jpg")

Validate

from ultralytics import YOLO

# Load the exported LiteRT model
model = YOLO("yolo26n.tflite")

# Validate accuracy on the COCO8 dataset
metrics = model.val(data="coco8.yaml")
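A common reason to validate an exported file is to measure how much accuracy a quantized variant gives up relative to the original weights. The sketch below assumes you have already collected mAP50-95 values (for example from `metrics.box.map` for a detection model); `map_drop` is a hypothetical helper that reports the drop in percentage points.

```python
from typing import Dict


def map_drop(baseline_map: float, variant_maps: Dict[str, float]) -> Dict[str, float]:
    """Return each variant's mAP drop versus the baseline, in percentage points."""
    # Ultralytics reports mAP as a fraction, so reject values outside [0, 1].
    if not 0.0 <= baseline_map <= 1.0:
        raise ValueError("mAP is expected as a fraction in [0, 1]")
    return {name: round((baseline_map - value) * 100, 2) for name, value in variant_maps.items()}
```

For instance, pass `YOLO("yolo26n.pt").val(data="coco8.yaml").box.map` as the baseline and the corresponding value for `yolo26n_w8a32.tflite` as a variant. COCO8 contains only a handful of images, so use a larger validation set for a meaningful comparison.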

Export Arguments

Argument	Type	Default	Description
`format`	`str`	`'litert'`	Target format for the exported model, defining compatibility with various deployment environments.
`imgsz`	`int` or `tuple`	`640`	Desired image size for the model input. Can be an integer for square images or a tuple `(height, width)` for specific dimensions.
`quantize`	`int` or `str`	`None`	Quantization precision: `8` (static INT8, int8 weights + int8 activations; needs calibration `data`/`fraction`), `'w8a16'` (static, int8 weights + int16 activations; needs calibration `data`/`fraction`), `'w8a32'` (dynamic INT8, int8 weights + FP32 activations; no calibration needed), or `32`/unset (FP32). FP16 is not exported separately (see note below). Replaces the deprecated `half`/`int8` flags.
`batch`	`int`	`1`	Batch size of the exported model: the maximum number of images it processes in a single inference call in `predict` mode.
`data`	`str`	`'coco8.yaml'`	Dataset YAML used for calibration in static quantization (`quantize=8` or `'w8a16'`). If omitted with `quantize=8`, Ultralytics selects the default calibration dataset for the model task.
`device`	`str`	`None`	Specifies the device for exporting. LiteRT export runs on CPU (`device=cpu`).
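Each `quantize` setting writes its own file (for example `yolo26n_w8a32.tflite` or `yolo26n_int8.tflite` in the Usage examples), so comparing file sizes is a quick sanity check on compression. A minimal sketch using only the standard library; `size_report` is a hypothetical helper.

```python
import os
from typing import Dict, Iterable, Tuple

BYTES_PER_MB = 1024 * 1024


def size_report(baseline: str, variants: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Map each variant's file name to (size in MB, size relative to the baseline file)."""
    base_bytes = os.path.getsize(baseline)
    if base_bytes == 0:
        raise ValueError(f"baseline file is empty: {baseline}")
    report = {}
    for path in variants:
        size = os.path.getsize(path)
        report[os.path.basename(path)] = (size / BYTES_PER_MB, size / base_bytes)
    return report
```

Assuming `model.export(...)` returns the exported file path, you could call `size_report(model.export(format="litert"), [model.export(format="litert", quantize="w8a32")])`.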

FP16 precision

Unlike the legacy tflite export, LiteRT does not require a separate FP16 export. An FP32 .tflite model runs in half precision at runtime when using a GPU delegate (WebGPU, OpenCL, Metal) — this is the official LiteRT approach to FP16 inference.

For more details about the export process, visit the Ultralytics documentation page on exporting.

Deploying Exported YOLO LiteRT Models

After exporting your Ultralytics YOLO model to LiteRT, you can deploy it across platforms. The quickest way to verify it locally is the YOLO("yolo26n.tflite") method shown above. For deployment in other environments, see the following resources:

Mobile & Embedded

Android: A quick-start guide for integrating LiteRT into Android applications.
iOS: A guide for integrating and deploying LiteRT models in iOS applications.
Embedded Linux & Raspberry Pi: Run LiteRT models on single-board computers, optionally accelerated with a Coral Edge TPU.
Microcontrollers: Deploy on MCUs with only a few kilobytes of memory — the core runtime fits in roughly 16 KB on an Arm Cortex-M3.
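On embedded Linux you might run the .tflite file with the LiteRT interpreter directly instead of the full Ultralytics package. Per the performance notes above, the export produces an NCHW model with a float input, so an HWC camera frame has to be transposed into a 1×3×H×W float tensor. This sketch assumes RGB pixels scaled to [0, 1] (the scaling Ultralytics uses) and a frame already resized or letterboxed to the model's input size; `to_nchw_float` is a hypothetical helper.

```python
import numpy as np


def to_nchw_float(image_hwc: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image into a contiguous 1x3xHxW float32 tensor in [0, 1]."""
    if image_hwc.ndim != 3 or image_hwc.shape[2] != 3:
        raise ValueError(f"expected an HxWx3 image, got shape {image_hwc.shape}")
    chw = np.transpose(image_hwc, (2, 0, 1))  # HWC -> CHW
    tensor = chw.astype(np.float32) / 255.0  # uint8 [0, 255] -> float32 [0, 1]
    return np.ascontiguousarray(tensor[np.newaxis, ...])  # add the batch dimension
```

Assuming the `ai-edge-litert` pip package, the tensor could then be fed through `Interpreter(model_path="yolo26n.tflite")` from `ai_edge_litert.interpreter`, followed by `allocate_tensors()`, `set_tensor(get_input_details()[0]["index"], x)`, and `invoke()`. Check `get_input_details()[0]["shape"]` first to confirm the layout, and note that the raw outputs still need task-specific post-processing.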

Browser & Node.js (LiteRT.js)

LiteRT.js overview: Run the same .tflite model directly in the browser with WebGPU/WASM acceleration, eliminating server-side computation and keeping data on the user's device.
End-to-End Examples: Practical examples and tutorials for implementing LiteRT across mobile, edge, and web.

Summary

In this guide, we covered how to export Ultralytics YOLO models to the LiteRT format. By consolidating mobile/edge (formerly TFLite) and browser (formerly TF.js) deployment into a single .tflite model, LiteRT lets one export serve virtually every on-device target, with optional quantization to make the model smaller and faster.

For further details, visit the LiteRT official documentation.

Also, if you're curious about other Ultralytics YOLO integrations, check out our integration guide page for plenty of helpful resources.

FAQ

How do I export a YOLO model to LiteRT format?

Use the Ultralytics library to export a YOLO model to LiteRT (.tflite). First, install the package:

pip install ultralytics

Then export your model:

from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to LiteRT format
model.export(format="litert")  # creates 'yolo26n.tflite'

For CLI users:

yolo export model=yolo26n.pt format=litert # creates 'yolo26n.tflite'

For more details, visit the Ultralytics export guide.

What is the difference between LiteRT, TFLite, and TF.js?

LiteRT is the new name for TensorFlow Lite — same .tflite model format, same runtime lineage, rebranded by Google. In Ultralytics, the single litert export format now covers both use cases that previously required two separate formats:

The old tflite format → mobile, embedded, and edge deployment.
The old tfjs format → browser and Node.js deployment, now handled by LiteRT.js running the same .tflite file.

If you have an existing .tflite file, you can load it directly with YOLO("model.tflite") and it will run through the LiteRT backend.

Can I run YOLO LiteRT models on a Raspberry Pi?

Yes. LiteRT export itself runs on Linux x86_64 and macOS, so export the model on a supported machine, then copy the .tflite file to the Raspberry Pi and run it there. For additional acceleration, consider a Coral Edge TPU. For detailed steps, refer to our Raspberry Pi deployment guide.

Can I run YOLO models in the browser with LiteRT?

Yes. LiteRT.js runs the same exported .tflite model directly in a web browser or Node.js application, with WebGPU/WASM acceleration. This replaces the previous TensorFlow.js workflow — there is no separate browser export, just deploy your LiteRT model with the LiteRT.js runtime.

Does LiteRT support FP16 (half-precision) inference?

Yes — at runtime. An FP32 LiteRT model automatically runs in FP16 when executed on a GPU delegate (WebGPU, OpenCL, or Metal), which is the official LiteRT approach. You therefore don't need a dedicated FP16 export; for further compression, use INT8 quantization with quantize=8.

How do I troubleshoot common issues during LiteRT export?

If you encounter errors while exporting YOLO models to LiteRT, common solutions include:

Check platform: LiteRT export is supported on Linux x86_64 and macOS. Verify your environment matches.
Check package compatibility: Ensure you're using a compatible version of Ultralytics. Refer to our installation guide.
Quantization issues: When using INT8 quantization, make sure your dataset path is correctly specified in the data parameter.
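The platform check can be done programmatically before starting an export. A minimal sketch using Python's standard `platform` module; `litert_export_supported` is a hypothetical helper that only encodes the rule stated above (Linux x86_64 or macOS), so keep it in sync if that list changes.

```python
import platform
from typing import Optional


def litert_export_supported(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Return True if the host matches the documented LiteRT export platforms."""
    system = system or platform.system()  # e.g. "Linux", "Darwin", "Windows"
    machine = (machine or platform.machine()).lower()  # e.g. "x86_64", "arm64"
    if system == "Darwin":  # macOS is supported regardless of CPU architecture
        return True
    return system == "Linux" and machine == "x86_64"
```

If it returns `False`, export on a supported machine and copy the resulting .tflite file to the target device, since the exported model runs on all LiteRT-supported platforms.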

For additional troubleshooting tips, visit our Common Issues guide.

Contributors

glenn-jocher, onuralpszr, ambitious-octopus

Created 2 weeks ago; updated 4 days ago