
Ultralytics YOLO11 on NVIDIA Jetson using DeepStream SDK and TensorRT



Watch: How to Run Multiple Streams with DeepStream SDK on Jetson Nano using Ultralytics YOLO11

This guide walks through deploying Ultralytics YOLO11 on NVIDIA Jetson devices using DeepStream SDK and TensorRT. TensorRT is used here to maximize inference performance on the Jetson platform.

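DeepStream on NVIDIA Jetson

Note

This guide has been tested on the Seeed Studio reComputer J4012, which is based on NVIDIA Jetson Orin NX 16GB running JetPack release JP5.1.3, and on the Seeed Studio reComputer J1020 v2, which is based on NVIDIA Jetson Nano 4GB running JetPack release JP4.6.4. It is expected to work across the whole NVIDIA Jetson hardware lineup, including the latest and legacy devices.

What is NVIDIA DeepStream?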

NVIDIA's DeepStream SDK is a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio, and image understanding. It's ideal for vision AI developers, software partners, startups, and OEMs building IVA (Intelligent Video Analytics) apps and services. You can now create stream-processing pipelines that incorporate neural networks and other complex processing tasks like tracking, video encoding/decoding, and video rendering. These pipelines enable real-time analytics on video, image, and sensor data. DeepStream's multi-platform support gives you a faster, easier way to develop vision AI applications and services on-premise, at the edge, and in the cloud.

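Prerequisites

Before you start to follow this guide:

Tip

In this guide we use the Debian package method of installing DeepStream SDK on the Jetson device. You can also visit DeepStream SDK on Jetson (Archived) to access legacy versions of DeepStream.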

DeepStream Configuration for YOLO11

Here we use the marcoslucianops/DeepStream-Yolo GitHub repository, which includes NVIDIA DeepStream SDK support for YOLO models. Thanks to marcoslucianops for the contributions!

  1. Install dependencies

    pip install cmake
    pip install onnxsim
    
  2. Clone the following repository

    git clone https://github.com/marcoslucianops/DeepStream-Yolo
    cd DeepStream-Yolo
    
  3. Download the Ultralytics YOLO11 detection model (.pt) of your choice from YOLO11 releases. Here we use yolov8s.pt.

    wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s.pt
    

    Note

    You can also use a custom trained YOLO11 model.

  4. Convert the model to ONNX

    python3 utils/export_yoloV8.py -w yolov8s.pt
    

    Pass the arguments below to the above command as needed

    For DeepStream 6.0.1, use opset 12 or lower. The default opset is 16.

    --opset 12
    

    To change the inference size (default: 640)

    -s SIZE
    --size SIZE
    -s HEIGHT WIDTH
    --size HEIGHT WIDTH
    

    Example for 1280:

    -s 1280
    or
    -s 1280 1280
    

    To simplify the ONNX model (DeepStream >= 6.0)

    --simplify
    

    To use a dynamic batch-size (DeepStream >= 6.1)

    --dynamic
    

    To use a static batch-size (example for batch-size = 4)

    --batch 4
    
  5. Set the CUDA version according to the JetPack version installed

    For JetPack 4.6.4:

    export CUDA_VER=10.2
    

    For JetPack 5.1.3:

    export CUDA_VER=11.4
    
  6. Compile the library

    make -C nvdsinfer_custom_impl_Yolo clean && make -C nvdsinfer_custom_impl_Yolo
    
  7. Edit the config_infer_primary_yoloV8.txt file according to your model (for YOLOv8s with 80 classes)

    [property]
    ...
    onnx-file=yolov8s.onnx
    ...
    num-detected-classes=80
    ...
    
  8. Edit the deepstream_app_config file

    ...
    [primary-gie]
    ...
    config-file=config_infer_primary_yoloV8.txt
    
  9. You can also change the video source in the deepstream_app_config file. Here a default video file is loaded

    ...
    [source0]
    ...
    uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
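
Steps 5 and 6 can be scripted so that the CUDA version is not typed by hand. The following is a minimal sketch, not part of DeepStream-Yolo: `cuda_ver_for_jetpack` and the `JETPACK` variable are hypothetical names, and the mapping covers only the two JetPack releases tested in this guide.

```shell
# Hypothetical helper: map a JetPack version to the CUDA_VER value used in
# this guide. Only the two releases tested above are covered.
cuda_ver_for_jetpack() {
    case "$1" in
        4.6.4) echo "10.2" ;;
        5.1.3) echo "11.4" ;;
        *)
            echo "unknown JetPack version: $1" >&2
            return 1
            ;;
    esac
}

# JETPACK is an assumed variable name; set it to the release on your device.
JETPACK="${JETPACK:-5.1.3}"
CUDA_VER="$(cuda_ver_for_jetpack "$JETPACK")" || exit 1
export CUDA_VER
echo "JetPack $JETPACK -> CUDA_VER=$CUDA_VER"

# Then build as in step 6:
# make -C nvdsinfer_custom_impl_Yolo clean && make -C nvdsinfer_custom_impl_Yolo
```

For any other JetPack release the helper stops with an error rather than guessing a CUDA version.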
    

Run Inference

deepstream-app -c deepstream_app_config.txt

Note

Generating the TensorRT engine file takes a long time, and inference starts only after it is done. Please be patient.
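
Before launching, it can help to know whether the long engine build is about to happen. The sketch below reads `model-engine-file` from the inference config and reports whether that file exists. `config_value` and `engine_status` are hypothetical helpers, and the script assumes that you run it from the DeepStream-Yolo folder and that a missing engine file means the slow build described in the note.

```shell
# Hypothetical helper: print the value of a key from an INI-style config file.
# Commented-out lines (starting with '#') do not match and are ignored.
config_value() {
    # $1 = key, $2 = config file
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

# Report whether the TensorRT engine named in the config already exists.
# Relative paths are resolved against the current directory (an assumption).
engine_status() (
    # $1 = config file
    engine="$(config_value model-engine-file "$1")"
    if [ -n "$engine" ] && [ -f "$engine" ]; then
        echo "engine found: $engine"
    else
        echo "engine missing: expect a long build on first run"
    fi
)

# Only runs when the guide's config file is present in the current directory.
if [ -f config_infer_primary_yoloV8.txt ]; then
    engine_status config_infer_primary_yoloV8.txt
fi
```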

YOLO11 with deepstream

Tip

If you want to convert the model to FP16 precision, simply set model-engine-file=model_b1_gpu0_fp16.engine and network-mode=2 inside config_infer_primary_yoloV8.txt
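
The two FP16 settings can also be applied from the command line instead of a text editor. This is a minimal sketch: `set_config_value` is a hypothetical helper that rewrites an existing, uncommented `key=value` line and refuses to append missing keys, since an appended line could land in the wrong config group.

```shell
# Hypothetical helper: replace the value of an existing, uncommented key in an
# INI-style config file. Values containing '|' or '&' are not supported.
set_config_value() {
    # $1 = key, $2 = new value, $3 = config file
    if ! grep -q "^$1=" "$3"; then
        echo "key not found: $1" >&2
        return 1
    fi
    # Write to a temporary file first so a failed sed leaves the config intact.
    sed "s|^$1=.*|$1=$2|" "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}

# Apply the FP16 settings from the tip, if the guide's config file is present.
CONFIG=config_infer_primary_yoloV8.txt
if [ -f "$CONFIG" ]; then
    set_config_value model-engine-file model_b1_gpu0_fp16.engine "$CONFIG"
    set_config_value network-mode 2 "$CONFIG"
fi
```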

INT8 Calibration

If you want to use INT8 precision for inference, follow the steps below

  1. Set the OPENCV environment variable

    export OPENCV=1
    
  2. Compile the library

    make -C nvdsinfer_custom_impl_Yolo clean && make -C nvdsinfer_custom_impl_Yolo
    
  3. For the COCO dataset, download val2017, extract it, and move it to the DeepStream-Yolo folder

  4. Make a new directory for calibration images

    mkdir calibration
    
  5. Run the following to select 1000 random images from the COCO dataset for calibration

    for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do \
        cp ${jpg} calibration/; \
    done
    

    Note

    NVIDIA recommends at least 500 images to get good accuracy. In this example, 1000 images are chosen to get better accuracy (more images = more accuracy). You can change the count in head -1000; for example, use head -2000 for 2000 images. This process can take a long time.

  6. Create the calibration.txt file with all selected images

    realpath calibration/*jpg > calibration.txt
    
  7. Set environment variables

    export INT8_CALIB_IMG_PATH=calibration.txt
    export INT8_CALIB_BATCH_SIZE=1
    

    Note

    Higher INT8_CALIB_BATCH_SIZE values give better accuracy and faster calibration. Set it according to your GPU memory.

  8. Update the config_infer_primary_yoloV8.txt file

    From

    ...
    model-engine-file=model_b1_gpu0_fp32.engine
    #int8-calib-file=calib.table
    ...
    network-mode=0
    ...
    

    To

    ...
    model-engine-file=model_b1_gpu0_int8.engine
    int8-calib-file=calib.table
    ...
    network-mode=1
    ...
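
Steps 4 to 7 can be combined into one helper. The sketch below is a variant of the loop in step 5 that avoids parsing `ls` output; it relies on `shuf` and `realpath` from GNU coreutils. `select_calibration_images` is a hypothetical name, and the directory and file names in the usage part are the ones used in the steps above.

```shell
# Hypothetical helper: copy COUNT randomly chosen .jpg files from SRC into
# DEST and write the absolute paths of the files in DEST to LIST.
select_calibration_images() {
    # $1 = source dir, $2 = destination dir, $3 = image count, $4 = list file
    mkdir -p "$2"
    find "$1" -maxdepth 1 -type f -name '*.jpg' | shuf -n "$3" |
        while IFS= read -r img; do
            cp "$img" "$2/"
        done
    # One absolute path per line, as in step 6.
    realpath "$2"/*.jpg > "$4"
}

# Only runs when the extracted COCO val2017 folder is present.
if [ -d val2017 ]; then
    select_calibration_images val2017 calibration 1000 calibration.txt
    export INT8_CALIB_IMG_PATH=calibration.txt
    export INT8_CALIB_BATCH_SIZE=1
fi
```

Start with an empty destination directory, because every .jpg already in it is also written to the list.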
    

Run Inference

deepstream-app -c deepstream_app_config.txt

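MultiStream Setup

To set up multiple streams under a single deepstream application, make the following changes to the deepstream_app_config.txt file:

  1. Change the rows and columns to build a grid display according to the number of streams you want to have. For example, for 4 streams, we can add 2 rows and 2 columns.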

    [tiled-display]
    rows=2
    columns=2
    
  2. Set num-sources=4 and add the uri of all 4 streams

    [source0]
    enable=1
    type=3
    uri=<path_to_video>
    uri=<path_to_video>
    uri=<path_to_video>
    uri=<path_to_video>
    num-sources=4
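
For stream counts other than 4, the grid in step 1 can be computed rather than picked by hand. The sketch below uses one possible heuristic, a near-square grid with columns = ceil(sqrt(N)) and rows = ceil(N / columns); this rule and the `tiled_display_for` name are assumptions for illustration, not something DeepStream prescribes.

```shell
# Hypothetical helper: print a [tiled-display] fragment whose grid is a
# near-square layout large enough for N streams.
# The body runs in a subshell so its variables do not leak into the caller.
tiled_display_for() (
    # $1 = number of streams (positive integer)
    n="$1"
    cols=1
    # Smallest cols with cols * cols >= n, i.e. ceil(sqrt(n)).
    while [ $((cols * cols)) -lt "$n" ]; do
        cols=$((cols + 1))
    done
    # Integer ceiling of n / cols.
    rows=$(( (n + cols - 1) / cols ))
    printf '[tiled-display]\nrows=%s\ncolumns=%s\n' "$rows" "$cols"
)

tiled_display_for 4
```

For 4 streams this prints rows=2 and columns=2, matching the example in step 1; for 6 streams it gives 2 rows and 3 columns.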
    

Run Inference

deepstream-app -c deepstream_app_config.txt
Multistream setup

Benchmark Results

The following table summarizes how the YOLOv8s model performs at different TensorRT precision levels with an input size of 640x640 on NVIDIA Jetson Orin NX 16GB.

Model Name   Precision   Inference Time (ms/im)   FPS
YOLOv8s      FP32        15.63                    64
             FP16        7.94                     126
             INT8        5.53                     181
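
The FPS column follows from the per-image latency as FPS = 1000 / latency in ms. The sketch below recomputes it from the measured latencies and adds the speed-up over FP32; `benchmark_summary` is a hypothetical name, and the speed-up column is derived here rather than taken from the original measurements.

```shell
# Convert per-image latency (ms) to throughput (FPS) and to speed-up over the
# FP32 baseline. Latencies are the measured values from the table above.
benchmark_summary() {
    awk 'BEGIN {
        n = split("FP32:15.63 FP16:7.94 INT8:5.53", rows, " ")
        for (i = 1; i <= n; i++) {
            split(rows[i], f, ":")
            if (i == 1) base = f[2]   # the first entry is the FP32 baseline
            printf "%s %.0f FPS %.2fx\n", f[1], 1000 / f[2], base / f[2]
        }
    }'
}

benchmark_summary
# FP32 64 FPS 1.00x
# FP16 126 FPS 1.97x
# INT8 181 FPS 2.83x
```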

Acknowledgements

This guide was initially created by our friends at Seeed Studio, Lakshantha and Elaine.

FAQ

How do I set up Ultralytics YOLO11 on an NVIDIA Jetson device?

To set up Ultralytics YOLO11 on an NVIDIA Jetson device, you first need to install the DeepStream SDK compatible with your JetPack version. Follow the step-by-step guide in our Quick Start Guide to configure your NVIDIA Jetson for YOLO11 deployment.

What is the benefit of using TensorRT with YOLO11 on NVIDIA Jetson?

Using TensorRT with YOLO11 optimizes the model for inference, significantly reducing latency and improving throughput on NVIDIA Jetson devices. TensorRT provides high-performance, low-latency deep learning inference through layer fusion, precision calibration, and kernel auto-tuning. This leads to faster and more efficient execution, particularly useful for real-time applications like video analytics and autonomous machines.

Can I run Ultralytics YOLO11 with DeepStream SDK across different NVIDIA Jetson hardware?

Yes, the guide for deploying Ultralytics YOLO11 with the DeepStream SDK and TensorRT is compatible across the entire NVIDIA Jetson lineup. This includes devices like the Jetson Orin NX 16GB with JetPack 5.1.3 and the Jetson Nano 4GB with JetPack 4.6.4. Refer to the section DeepStream Configuration for YOLO11 for detailed steps.

How can I convert a YOLO11 model to ONNX for DeepStream?

To convert a YOLO11 model to ONNX format for deployment with DeepStream, use the utils/export_yoloV8.py script from the DeepStream-Yolo repository.

Here's an example command:

python3 utils/export_yoloV8.py -w yolov8s.pt --opset 12 --simplify

For more details on model conversion, check out our model export section.

What are the performance benchmarks for YOLO on NVIDIA Jetson Orin NX?

The performance of YOLO11 models on NVIDIA Jetson Orin NX 16GB varies based on TensorRT precision levels. For example, YOLOv8s models achieve:

  • FP32 precision: 15.63 ms/im, 64 FPS
  • FP16 precision: 7.94 ms/im, 126 FPS
  • INT8 precision: 5.53 ms/im, 181 FPS

These benchmarks underscore the efficiency and capability of using TensorRT-optimized YOLO11 models on NVIDIA Jetson hardware. For further details, see our Benchmark Results section.

📅 Created 3 months ago ✏️ Updated 23 days ago
