Export YOLO Models to LiteRT for Edge and Web Deployment

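LiteRT (short for Lite Runtime) is Google's high-performance runtime for on-device AI. It is the successor to, and new name for, TensorFlow Lite (TFLite), and it runs the same .tflite model format. With LiteRT, a single exported Ultralytics YOLO model can be deployed on mobile, embedded, and edge devices and in browsers. It covers what previously required the separate tflite and tfjs export formats, now unified under one architecture.

The LiteRT export format optimizes your models for tasks such as object detection, segmentation, pose estimation, and classification so that they run quickly and offline on a wide range of devices.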

Run YOLO with LiteRT on Android today using the official Flutter plugin.

The official Ultralytics YOLO Flutter plugin runs LiteRT .tflite exports directly on Android, with real-time camera inference, single-image prediction, GPU acceleration, and automatic model download for all seven YOLO26 tasks (including Depth). For Apple devices, use the CoreML export; for Qualcomm Snapdragon NPUs, see the Qualcomm QNN integration.

Run YOLO with LiteRT.js on the web today using the official @ultralytics/yolo npm package.

The official Ultralytics YOLO npm package runs LiteRT .tflite exports directly in the browser via LiteRT.js, with no server or Python required. It offers real-time webcam inference, single-image prediction, and WebGPU acceleration (with automatic CPU/WASM fallback) across all six YOLO26 tasks (detect, segment, pose, OBB, classify, semantic). On WebGPU it is often about 2× faster than ONNX Runtime Web.

npm i @ultralytics/yolo @litertjs/core

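Why Should You Export to LiteRT?

LiteRT is an open-source framework designed for on-device inference, also known as edge computing. It gives developers the tools to run trained models on mobile, embedded, and IoT devices, on traditional computers, and, through LiteRT.js, directly in web browsers and Node.js.

One model format for every target:

Mobile and embedded: Android, iOS, embedded Linux, and microcontrollers (MCUs).
Edge accelerators: Compatible with the Coral Edge TPU for further acceleration.
Browser and Node.js: LiteRT.js runs the same .tflite model on the web with WebGPU/WASM acceleration, with no separate TensorFlow.js export required.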

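Key Features of LiteRT Models

On-device optimization: Reduces latency by processing data locally, enhances privacy by not transmitting personal data, and minimizes model size to save space.
Multi-platform support: Runs on Android, iOS, embedded Linux, microcontrollers, and modern web browsers.
Hardware acceleration: Uses XNNPACK on CPU, and GPU acceleration through OpenCL, Metal, and WebGPU. The GPU delegate runs in FP16 by default for extra speed.
Quantization: Supports FP32, static INT8 (quantize=8, int8 weights + int8 activations), static INT16 activations (quantize="w8a16", int8 weights + int16 activations, for higher accuracy), and dynamic INT8 (quantize="w8a32", int8 weights + FP32 activations, no calibration data needed) to compress models and speed up inference with minimal accuracy loss.
Multi-language support: Compatible with Java/Kotlin, Swift, Objective-C, C++, Python, and JavaScript.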
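
To make the int8 weight modes in the list above more concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an illustration only: the helper names `quantize_int8` and `dequantize` are hypothetical, and the exact scheme the LiteRT converter applies (per-channel scales, zero points, and so on) is not described in this guide and may differ.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


# Illustrative weights, not taken from any real model.
weights = [0.52, -1.27, 0.003, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 codes:", quantized)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

Each int8 code takes one byte instead of the four bytes of an FP32 value, which is where the size reduction comes from. In this sketch the round-trip error per weight stays within half the scale.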

Benchmarks

The table below reports end-to-end single-image inference latency for the official YOLO26n Android LiteRT assets (w8a32: int8 weights, FP32 activations) on a Xiaomi 17 phone powered by the Qualcomm Snapdragon 8 Elite Gen 5 (SM8850), measured through the Ultralytics Flutter plugin 0.6.10. Each cell shows the total time (preprocessing + inference + postprocessing, excluding annotation), followed in parentheses by the per-stage split. CPU runs the LiteRT XNNPACK delegate; GPU runs the LiteRT OpenCL/GL delegate (FP16).

Model	Task	Size (pixels)	CPU, w8a32 LiteRT (ms)	GPU Adreno, w8a32 LiteRT (ms)
YOLO26n	Detect	640	52.4 (1.8 / 48.2 / 2.4)	13.5 (1.9 / 8.1 / 3.5)
YOLO26n-seg	Segment	640	72.8 (1.8 / 65.3 / 5.7)	28.6 (1.8 / 20.1 / 6.7)
YOLO26n-sem	Semantic	640	60.3 (1.8 / 50.4 / 8.1)	32.9 (1.8 / 23.0 / 8.2)
YOLO26n-depth	Depth	640	325.1 (5.1 / 300.9 / 19.2)	23.0 (2.0 / 12.9 / 8.2)
YOLO26n-cls	Classify	224	10.5 (0.9 / 9.6 / 0.1)	3.2 (1.0 / 2.2 / 0.1)
YOLO26n-pose	Pose	640	56.9 (1.8 / 53.9 / 1.2)	14.0 (1.9 / 9.3 / 2.8)
YOLO26n-obb	OBB	640	50.5 (1.8 / 47.3 / 1.4)	13.0 (2.9 / 7.9 / 2.3)
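
As a quick reading aid for the table, the following sketch computes the CPU-to-GPU speedup per task from the total latencies listed above. The numbers are copied from the table; the dictionary names and task keys are illustrative only.

```python
# Total latency in milliseconds per task, copied from the benchmark table.
cpu_ms = {
    "detect": 52.4, "segment": 72.8, "semantic": 60.3, "depth": 325.1,
    "classify": 10.5, "pose": 56.9, "obb": 50.5,
}
gpu_ms = {
    "detect": 13.5, "segment": 28.6, "semantic": 32.9, "depth": 23.0,
    "classify": 3.2, "pose": 14.0, "obb": 13.0,
}

# Speedup = CPU total time divided by GPU total time for the same task.
speedup = {task: cpu_ms[task] / gpu_ms[task] for task in cpu_ms}

for task, ratio in sorted(speedup.items(), key=lambda item: item[1], reverse=True):
    print(f"{task:<9} {ratio:.2f}x faster on GPU")
```

Because the totals include preprocessing and postprocessing, these ratios describe end-to-end latency rather than the raw inference stage alone.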

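Speed values are single-image burst latencies: the mean of 15 runs after 3 warmup runs on bus.jpg, measured in profile mode with the Flutter plugin's on-device benchmark tool. The full task suite runs back to back, so the CPU-bound preprocessing stage reflects sustained operation (a thermally settled single-task measurement would be lower); the GPU/CPU inference stage is the steady-state compute cost.
The LiteRT export traces the PyTorch model directly and produces an NCHW .tflite file with floating-point inputs. The GPU delegate compiles the entire graph (all seven tasks here run on the Adreno GPU), and w8a32 needs no calibration data. The official Android assets are hosted in the yolo-flutter-app v0.6.6 release, and detailed benchmark records are in the Flutter performance documentation.
Matching Snapdragon Hexagon NPU figures (and INT8 TFLite CPU/GPU baselines) are in the Qualcomm QNN integration.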
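
The measurement protocol described in these notes (3 warmup runs, then the mean of 15 timed runs) can be sketched in Python as follows. This is not the Flutter plugin's benchmark tool: `benchmark` and `fake_inference` are hypothetical names, and the workload is a placeholder that you would replace with a real model call.

```python
import statistics
import time


def benchmark(run_once, warmup=3, runs=15):
    """Return (mean latency in ms, list of per-run latencies in ms)."""
    for _ in range(warmup):
        run_once()  # warmup calls are executed but not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), samples


def fake_inference():
    # Hypothetical stand-in workload; swap in a real inference call.
    sum(i * i for i in range(10_000))


mean_ms, samples = benchmark(fake_inference)
print(f"mean latency over {len(samples)} runs: {mean_ms:.3f} ms")
```

Discarding the warmup runs keeps one-time costs such as caching or lazy initialization out of the reported mean.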

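Exporting to LiteRT: Converting Your YOLO Model

Converting your model to LiteRT format improves on-device execution efficiency and broadens your deployment options.

Installation

To install the required package, run:

Installation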

# Install the required package for YOLO
pip install ultralytics

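For detailed instructions and best practices, see our Ultralytics installation guide. If you run into any difficulties, consult our common issues guide.

Platform support

LiteRT export is currently supported on Linux x86_64 and macOS. The exported .tflite model itself runs on every platform that supports LiteRT (mobile, embedded, edge, and browser).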
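
If you script exports across several machines, a small pre-flight check can catch an unsupported host early. The sketch below encodes only the platform rule stated above (Linux x86_64 or macOS) using the standard `platform` module; `litert_export_supported` is a hypothetical helper and not part of the Ultralytics API.

```python
import platform


def litert_export_supported(system=None, machine=None):
    """Return True if the host matches the documented export platforms."""
    system = system or platform.system()      # e.g. "Linux", "Darwin", "Windows"
    machine = machine or platform.machine()   # e.g. "x86_64", "arm64", "aarch64"
    if system == "Darwin":                    # macOS
        return True
    return system == "Linux" and machine == "x86_64"


if litert_export_supported():
    print("This host matches the documented LiteRT export platforms.")
else:
    print("LiteRT export is documented only for Linux x86_64 and macOS.")
```

The check concerns the export step only; the resulting .tflite file can still be copied to and run on other LiteRT targets.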

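Usage

All Ultralytics YOLO models support export out of the box. The LiteRT format supports the export, predict, and val modes, so you can export a model and then load it for local inference or to validate its accuracy.

Export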

from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to LiteRT format
model.export(format="litert")  # creates 'yolo26n.tflite'

Quantized export

from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# Dynamic INT8: int8 weights, FP32 activations - no calibration data needed
model.export(format="litert", quantize="w8a32")  # creates 'yolo26n_w8a32.tflite'

# Static INT8: int8 weights + int8 activations - needs calibration data
model.export(format="litert", quantize=8, data="coco8.yaml")  # creates 'yolo26n_int8.tflite'

# Static w8a16: int8 weights + int16 activations (higher accuracy) - needs calibration data
model.export(format="litert", quantize="w8a16", data="coco8.yaml")  # creates 'yolo26n_w8a16.tflite'

Predict

from ultralytics import YOLO

# Load the exported LiteRT model
model = YOLO("yolo26n.tflite")

# Run inference
results = model("https://ultralytics.com/images/bus.jpg")

Validate

from ultralytics import YOLO

# Load the exported LiteRT model
model = YOLO("yolo26n.tflite")

# Validate accuracy on the COCO8 dataset
metrics = model.val(data="coco8.yaml")

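Export Arguments

Argument	Type	Default	Description
`format`	`str`	`'litert'`	Target format for the exported model, which determines compatibility with deployment environments.
`imgsz`	`int` or `tuple`	`640`	Desired image size for the model input. Can be an integer for square images or a tuple `(height, width)` for specific dimensions.
`quantize`	`int` or `str`	`None`	Quantization precision: `8` (static INT8, int8 weights + int8 activations; requires calibration `data`/`fraction`), `'w8a16'` (static, int8 weights + int16 activations; requires calibration `data`/`fraction`), `'w8a32'` (dynamic INT8, int8 weights + FP32 activations; no calibration needed), or `32`/unset (FP32). FP16 is not exported separately (see the note below). Replaces the deprecated `half`/`int8` flags.
`batch`	`int`	`1`	Batch size for inference with the exported model, or the maximum number of images the exported model processes concurrently in `predict` mode.
`data`	`str`	`'coco8.yaml'`	Dataset YAML used for INT8 calibration. If omitted when `quantize=8`, Ultralytics selects a default calibration dataset for the model's task.
`device`	`str`	`None`	Device used for export. LiteRT export runs on CPU (`device=cpu`).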
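
The `quantize` options in the table can be summarized in a small lookup. The sketch below is a planning helper only: `describe_export` is a hypothetical name, the file suffixes are taken from the comments in the quantized export example in this guide, and treating `quantize=32` as producing the plain file name is an assumption.

```python
# Modes that use calibration data according to the arguments table.
CALIBRATED_MODES = {8, "w8a16"}

# Output suffix per quantize value; 32 -> "" is an assumption (FP32, same as unset).
SUFFIXES = {None: "", 32: "", "w8a32": "_w8a32", 8: "_int8", "w8a16": "_w8a16"}


def describe_export(stem, quantize=None):
    """Return the expected output file name and whether calibration data is used."""
    if quantize not in SUFFIXES:
        raise ValueError(f"unsupported quantize value: {quantize!r}")
    return {
        "output": f"{stem}{SUFFIXES[quantize]}.tflite",
        "needs_calibration": quantize in CALIBRATED_MODES,
    }


for mode in (None, "w8a32", 8, "w8a16"):
    print(mode, describe_export("yolo26n", mode))
```

A helper like this can be useful in a build script that decides whether a calibration dataset has to be available before calling the export.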

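FP16 precision

Unlike the legacy tflite export, LiteRT does not need a separate FP16 export. With a GPU delegate (WebGPU, OpenCL, Metal), an FP32 .tflite model runs in half precision at runtime, which is the official way to handle FP16 inference.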
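
To get a feel for what running in half precision means numerically, the sketch below rounds a few Python floats to IEEE 754 half precision with the standard `struct` module (format character `e`). It only illustrates FP16 rounding in general; it does not reproduce how a GPU delegate executes a model, and `to_fp16` is a hypothetical helper name.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Illustrative values: a small fraction, an exact value, and a large integer.
for value in (0.1, 1.0, 2049.0, 1e-3):
    rounded = to_fp16(value)
    print(f"{value!r} -> {rounded!r} (abs error {abs(value - rounded):.3e})")
```

Half precision keeps about three significant decimal digits, so small rounding differences relative to FP32 outputs are expected.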

For more details on the export process, visit the Ultralytics export documentation page.

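Deploying Exported YOLO LiteRT Models

After exporting your Ultralytics YOLO model to LiteRT, you can deploy it on different platforms. The fastest way to verify it locally is the YOLO("yolo26n.tflite") approach shown above. For deployment in other environments, see the following resources:

Mobile and Embedded

Android: Quickstart guide for integrating LiteRT into Android applications.
iOS: Guide to integrating and deploying LiteRT models in iOS applications.
Embedded Linux and Raspberry Pi: Run LiteRT models on single-board computers, optionally with Coral Edge TPU acceleration.
Microcontrollers: Deploy on MCUs with only a few kilobytes of memory; the core runtime takes about 16 KB on an Arm Cortex-M3.

Browser and Node.js (LiteRT.js)

LiteRT.js overview: Run the same .tflite model directly in the browser with WebGPU/WASM acceleration, eliminating server-side compute and keeping data on the user's device.
End-to-end examples: Practical examples and tutorials for implementing LiteRT on mobile, edge, and web.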

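Summary

In this guide, we covered how to export Ultralytics YOLO models to LiteRT format. By consolidating mobile/edge deployment (formerly TFLite) and browser deployment (formerly TF.js) into a single .tflite model, LiteRT makes your YOLO models faster, smaller, and portable across nearly every on-device target.

For more details, visit the official LiteRT documentation.

Also, if you are interested in other Ultralytics YOLO integrations, see our integration guide page for many more useful resources.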

FAQ

How do I export a YOLO model to LiteRT format?

Use the Ultralytics library to export a YOLO model to LiteRT (.tflite). First, install the package:

pip install ultralytics

Then export your model:

from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to LiteRT format
model.export(format="litert")  # creates 'yolo26n.tflite'

For CLI users:

yolo export model=yolo26n.pt format=litert # creates 'yolo26n.tflite'

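For more details, visit the Ultralytics export guide.

What is the difference between LiteRT, TFLite, and TF.js?

LiteRT is the new name for TensorFlow Lite: the same .tflite model format and the same runtime family, rebranded by Google. In Ultralytics, the single litert export format now covers use cases that previously required two separate formats:

The legacy tflite format → mobile, embedded, and edge deployment.
The legacy tfjs format → browser and Node.js deployment, now handled by LiteRT.js running the same .tflite file.

If you already have a .tflite file, you can load it directly with YOLO("model.tflite"), and it will run through the LiteRT backend.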

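Can I run YOLO LiteRT models on a Raspberry Pi?

Yes. Export your model to LiteRT format, then run it on the Raspberry Pi for faster inference. For further optimization, consider using a Coral Edge TPU. For detailed steps, see our Raspberry Pi deployment guide.

Can I run YOLO models in the browser with LiteRT?

Yes. LiteRT.js runs the same exported .tflite model directly in a web browser or Node.js application with WebGPU/WASM acceleration. This replaces the earlier TensorFlow.js workflow: no separate browser export is needed, just deploy your LiteRT model with the LiteRT.js runtime.

Does LiteRT support FP16 (half-precision) inference?

Yes, at runtime. An FP32 LiteRT model automatically runs in FP16 when executed on a GPU delegate (WebGPU, OpenCL, or Metal), which is the official LiteRT approach. You therefore do not need a dedicated FP16 export; for further compression, use INT8 quantization with quantize=8.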

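How do I troubleshoot common issues during LiteRT export?

If you hit errors while exporting a YOLO model to LiteRT, common fixes include:

Check the platform: LiteRT export is supported on Linux x86_64 and macOS. Verify that your environment matches.
Check package compatibility: Make sure you are using a compatible version of Ultralytics. See our installation guide.
Quantization issues: When using INT8 quantization, make sure the dataset path is specified correctly in the data argument.

For more troubleshooting tips, visit our common issues guide.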

Contributors

glenn-jocher, onuralpszr, ambitious-octopus

Created 2 weeks ago, updated 4 days ago