Ultralytics YOLO11 on NVIDIA Jetson using DeepStream SDK and TensorRT
This guide walks through deploying Ultralytics YOLO11 on NVIDIA Jetson devices using DeepStream SDK and TensorRT. TensorRT is used to maximize inference performance on the Jetson platform.
Note
This guide has been tested with the NVIDIA Jetson Orin Nano Super Developer Kit running the latest stable JetPack release, JP6.1; the Seeed Studio reComputer J4012, which is based on NVIDIA Jetson Orin NX 16GB, running JetPack release JP5.1.3; and the Seeed Studio reComputer J1020 v2, which is based on NVIDIA Jetson Nano 4GB, running JetPack release JP4.6.4. It is expected to work across the entire NVIDIA Jetson hardware lineup, including both the latest and legacy devices.
What is NVIDIA DeepStream?
NVIDIA's DeepStream SDK is a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio, and image understanding. It's ideal for vision AI developers, software partners, startups, and OEMs building IVA (Intelligent Video Analytics) apps and services. You can now create stream-processing pipelines that incorporate neural networks and other complex processing tasks like tracking, video encoding/decoding, and video rendering. These pipelines enable real-time analytics on video, image, and sensor data. DeepStream's multi-platform support gives you a faster, easier way to develop vision AI applications and services on-premise, at the edge, and in the cloud.
Prerequisites
Before you start to follow this guide:
- Visit our documentation, Quick Start Guide: NVIDIA Jetson with Ultralytics YOLO11, to set up your NVIDIA Jetson device with Ultralytics YOLO11
- Install DeepStream SDK according to the JetPack version
  - For JetPack 4.6.4, install DeepStream 6.0.1
  - For JetPack 5.1.3, install DeepStream 6.3
  - For JetPack 6.1, install DeepStream 7.1
Tip
In this guide we used the Debian package method to install DeepStream SDK on the Jetson device. You can also visit the DeepStream SDK on Jetson (Archived) page to access legacy versions of DeepStream.
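The JetPack-to-DeepStream pairing listed in the prerequisites can be captured in a small lookup, for example in a setup script that checks the target device before anything is installed. This is only a sketch: the dictionary and function names are hypothetical, and the version pairs are just the three listed in this guide.

```python
# Version pairs copied from the prerequisites list in this guide.
# The names below are illustrative, not part of any NVIDIA or Ultralytics API.
DEEPSTREAM_FOR_JETPACK = {
    "4.6.4": "6.0.1",
    "5.1.3": "6.3",
    "6.1": "7.1",
}


def deepstream_version_for(jetpack: str) -> str:
    """Return the DeepStream version this guide pairs with a JetPack release."""
    try:
        return DEEPSTREAM_FOR_JETPACK[jetpack]
    except KeyError:
        known = ", ".join(sorted(DEEPSTREAM_FOR_JETPACK))
        raise ValueError(
            f"JetPack {jetpack!r} is not covered by this guide (known: {known})"
        ) from None


if __name__ == "__main__":
    for jp in DEEPSTREAM_FOR_JETPACK:
        print(f"JetPack {jp} -> DeepStream {deepstream_version_for(jp)}")
```

Failing loudly on an unlisted JetPack release is a deliberate choice here: other combinations may well work, but they were not tested in this guide.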
DeepStream Configuration for YOLO11
Here we use the marcoslucianops/DeepStream-Yolo GitHub repository, which adds NVIDIA DeepStream SDK support for YOLO models. We appreciate marcoslucianops's contributions!
- Install Ultralytics with necessary dependencies
- Clone the DeepStream-Yolo repository
- Copy the export_yoloV8.py file from the DeepStream-Yolo/utils directory to the ultralytics folder
  Note
  export_yoloV8.py works for both YOLOv8 and YOLO11 models.
- Download the Ultralytics YOLO11 detection model (.pt) of your choice from YOLO11 releases. Here we use yolo11s.pt.
  Note
  You can also use a custom trained YOLO11 model.
- Convert model to ONNX
  Pass the arguments below to the export command as needed:
  - For DeepStream 6.0.1, use opset 12 or lower. The default opset is 16.
  - To change the inference size (default: 640), for example to 1280
  - To simplify the ONNX model (DeepStream >= 6.0)
  - To use dynamic batch-size (DeepStream >= 6.1)
  - To use static batch-size (example for batch-size = 4)
- Copy the generated .onnx model file and labels.txt file to the DeepStream-Yolo folder
- Set the CUDA version according to the JetPack version installed
  For JetPack 4.6.4:
  For JetPack 5.1.3:
  For JetPack 6.1:
- Compile the library
- Edit the config_infer_primary_yoloV8.txt file according to your model (for YOLO11s with 80 classes)
- Edit the deepstream_app_config file
- You can also change the video source in the deepstream_app_config file. Here a default video file is loaded.
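Returning to the ONNX conversion step, the export options described there can be assembled programmatically, which helps when the same export is repeated for several models. The sketch below builds an argument list for export_yoloV8.py. The flag spellings (-w, --opset, -s, --simplify, --dynamic, --batch) are assumptions based on the DeepStream-Yolo documentation as I recall it and are not stated in this guide, so check them against the script's --help output. The function name is hypothetical.

```python
from typing import List, Optional


def build_export_command(
    weights: str,
    opset: Optional[int] = None,
    size: Optional[int] = None,
    simplify: bool = False,
    dynamic: bool = False,
    batch: Optional[int] = None,
) -> List[str]:
    """Build an argv list for export_yoloV8.py (flag names are assumptions)."""
    # Dynamic and static batch sizes are treated as mutually exclusive here.
    if dynamic and batch is not None:
        raise ValueError("choose either dynamic batch-size or a static batch size")

    cmd = ["python3", "export_yoloV8.py", "-w", weights]
    if opset is not None:
        # This guide asks for opset 12 or lower on DeepStream 6.0.1.
        cmd += ["--opset", str(opset)]
    if size is not None:
        # Inference size; the guide's default is 640.
        cmd += ["-s", str(size)]
    if simplify:
        cmd.append("--simplify")
    if dynamic:
        cmd.append("--dynamic")
    if batch is not None:
        cmd += ["--batch", str(batch)]
    return cmd


if __name__ == "__main__":
    # Example: yolo11s.pt exported for DeepStream 6.0.1 with a simplified graph.
    print(" ".join(build_export_command("yolo11s.pt", opset=12, simplify=True)))
```

The returned list can be passed to subprocess.run(cmd, check=True) from inside the ultralytics folder, where the script was copied earlier. Returning a list rather than a shell string avoids quoting problems with unusual file names.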
Run Inference
Note
Generating the TensorRT engine file takes a long time before inference starts, so please be patient.
Tip
If you want to convert the model to FP16 precision, simply set model-engine-file=model_b1_gpu0_fp16.engine and network-mode=2 inside config_infer_primary_yoloV8.txt
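The FP16 switch in the tip above amounts to rewriting two keys in the inference config. A minimal sketch of that edit follows; it works on the config text in memory so that comments and unrelated keys are left untouched. The function name is hypothetical, the engine file naming pattern is inferred from the single FP16 example in the tip, and the FP32 value (network-mode=0) comes from the Gst-nvinfer property documentation rather than from this guide.

```python
# network-mode values: 2 = FP16 is from this guide; 0 = FP32 is taken from the
# Gst-nvinfer documentation and is an addition to what the guide states.
NETWORK_MODE = {"fp32": 0, "fp16": 2}


def set_precision(config_text: str, precision: str, batch: int = 1, gpu: int = 0) -> str:
    """Return config text with model-engine-file and network-mode updated."""
    if precision not in NETWORK_MODE:
        raise ValueError(f"unsupported precision: {precision!r}")

    # Naming pattern assumed from the guide's example: model_b1_gpu0_fp16.engine
    updates = {
        "model-engine-file": f"model_b{batch}_gpu{gpu}_{precision}.engine",
        "network-mode": str(NETWORK_MODE[precision]),
    }

    lines = []
    for line in config_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in updates:
            # Rewrite only active (uncommented) keys that match exactly.
            line = f"{key}={updates[key]}"
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "[property]\nmodel-engine-file=model_b1_gpu0_fp32.engine\nnetwork-mode=0\n"
    print(set_precision(sample, "fp16"), end="")
```

To apply it to the real file, read config_infer_primary_yoloV8.txt, pass its contents through set_precision, and write the result back. INT8 is left out on purpose because it also needs a calibration table, which is covered in the next section.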
INT8 Calibration
If you want to use INT8 precision for inference, follow the steps below.
Note
Currently INT8 does not work with TensorRT 10.x. This section of the guide has been tested with TensorRT 8.x, which is expected to work.
- Set the OPENCV environment variable
- Compile the library
- For the COCO dataset, download val2017, extract it, and move it to the DeepStream-Yolo folder
- Make a new directory for calibration images
- Run the following to select 1000 random images from the COCO dataset for calibration
  Note
  NVIDIA recommends at least 500 images for good accuracy. In this example, 1000 images are chosen to get better accuracy (more images = more accuracy). You can change the number via head -1000. For example, for 2000 images, use head -2000. This process can take a long time.
- Create the calibration.txt file with all selected images
- Set environment variables
  Note
  Higher INT8_CALIB_BATCH_SIZE values result in higher accuracy and faster calibration. Set it according to your GPU memory.
- Update the config_infer_primary_yoloV8.txt file
  From
  To
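The image selection and calibration.txt steps above can also be done in Python instead of the shell. The sketch below assumes the val2017 images are .jpg files in a single folder; both function names are hypothetical, and the seed parameter is an addition that makes the random selection repeatable.

```python
import random
import shutil
from pathlib import Path
from typing import List, Optional


def select_calibration_images(
    src_dir, dst_dir, count: int = 1000, seed: Optional[int] = None
) -> List[Path]:
    """Copy up to `count` randomly chosen .jpg files from src_dir into dst_dir."""
    images = sorted(Path(src_dir).glob("*.jpg"))
    rng = random.Random(seed)
    # Sample without replacement; take everything if fewer images exist.
    chosen = rng.sample(images, min(count, len(images)))

    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy(img, dst)) for img in chosen]


def write_calibration_list(image_dir, list_path="calibration.txt") -> int:
    """Write one absolute image path per line and return the number written."""
    paths = sorted(p.resolve() for p in Path(image_dir).glob("*.jpg"))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

For the 1000-image example in this guide, calling select_calibration_images("val2017", "calibration", count=1000) followed by write_calibration_list("calibration") from the DeepStream-Yolo folder would mirror the two steps. Absolute paths are written so the list stays valid regardless of the working directory the application is launched from.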
Run Inference
MultiStream Setup
To set up multiple streams under a single DeepStream application, make the following changes to the deepstream_app_config.txt file:
- Change the rows and columns to build a grid display according to the number of streams you want to have. For example, for 4 streams, we can add 2 rows and 2 columns.
- Set num-sources=4 and add the uri of all 4 streams
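For stream counts other than 4, the rows and columns can be derived rather than picked by hand. The helper below is a sketch of one reasonable policy (a near-square grid with at least as many columns as rows); the function name is hypothetical and DeepStream does not require this particular layout.

```python
import math
from typing import Tuple


def tiled_grid(num_streams: int) -> Tuple[int, int]:
    """Return (rows, columns) for a near-square grid that fits all streams."""
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    # Smallest column count whose square covers every stream.
    columns = math.ceil(math.sqrt(num_streams))
    # Just enough rows to place the streams across those columns.
    rows = math.ceil(num_streams / columns)
    return rows, columns


if __name__ == "__main__":
    for n in (1, 2, 4, 6, 9):
        rows, columns = tiled_grid(n)
        print(f"{n} streams -> rows={rows}, columns={columns}")
```

With 4 streams this yields the 2 rows and 2 columns used in the example above. Grids with unused cells, such as 3 streams in a 2 by 2 layout, simply leave a tile empty.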
Run Inference
Benchmark Results
The following benchmarks summarize how YOLO11 models perform at different TensorRT precision levels with an input size of 640x640 on NVIDIA Jetson Orin NX 16GB.
Comparison Chart
Detailed Comparison Table
Performance
YOLO11n

Format | Status | Inference time (ms/im) |
---|---|---|
TensorRT (FP32) | ✅ | 8.64 |
TensorRT (FP16) | ✅ | 5.27 |
TensorRT (INT8) | ✅ | 4.54 |

YOLO11s

Format | Status | Inference time (ms/im) |
---|---|---|
TensorRT (FP32) | ✅ | 14.53 |
TensorRT (FP16) | ✅ | 7.91 |
TensorRT (INT8) | ✅ | 6.05 |

YOLO11m

Format | Status | Inference time (ms/im) |
---|---|---|
TensorRT (FP32) | ✅ | 32.05 |
TensorRT (FP16) | ✅ | 15.55 |
TensorRT (INT8) | ✅ | 10.43 |

YOLO11l

Format | Status | Inference time (ms/im) |
---|---|---|
TensorRT (FP32) | ✅ | 39.68 |
TensorRT (FP16) | ✅ | 19.88 |
TensorRT (INT8) | ✅ | 13.64 |

YOLO11x

Format | Status | Inference time (ms/im) |
---|---|---|
TensorRT (FP32) | ✅ | 80.65 |
TensorRT (FP16) | ✅ | 39.06 |
TensorRT (INT8) | ✅ | 22.83 |
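The per-image latencies in these tables convert to throughput and relative speedup with simple arithmetic: frames per second is 1000 divided by the latency in milliseconds. The sketch below uses the 14.53, 7.91, and 6.05 ms rows as sample input; the function names are hypothetical. These figures cover inference time only, so the frame rate of a full DeepStream pipeline may be lower.

```python
# Latencies in milliseconds per image, copied from one of the tables above.
LATENCY_MS = {"FP32": 14.53, "FP16": 7.91, "INT8": 6.05}


def fps(latency_ms: float) -> float:
    """Convert per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def speedup_over(baseline_ms: float, latency_ms: float) -> float:
    """Return how many times faster latency_ms is than baseline_ms."""
    return baseline_ms / latency_ms


if __name__ == "__main__":
    baseline = LATENCY_MS["FP32"]
    for precision, ms in LATENCY_MS.items():
        print(
            f"{precision}: {ms} ms/im, {fps(ms):.1f} FPS, "
            f"{speedup_over(baseline, ms):.2f}x vs FP32"
        )
```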
Acknowledgements
This guide was initially created by our friends at Seeed Studio, Lakshantha and Elaine.
FAQ
How do I set up Ultralytics YOLO11 on an NVIDIA Jetson device?
To set up Ultralytics YOLO11 on an NVIDIA Jetson device, you first need to install the DeepStream SDK compatible with your JetPack version. Follow the step-by-step guide in our Quick Start Guide to configure your NVIDIA Jetson for YOLO11 deployment.
What is the benefit of using TensorRT with YOLO11 on NVIDIA Jetson?
Using TensorRT with YOLO11 optimizes the model for inference, significantly reducing latency and improving throughput on NVIDIA Jetson devices. TensorRT provides high-performance, low-latency deep learning inference through layer fusion, precision calibration, and kernel auto-tuning. This leads to faster and more efficient execution, particularly useful for real-time applications like video analytics and autonomous machines.
Can I run Ultralytics YOLO11 with DeepStream SDK across different NVIDIA Jetson hardware?
Yes, the guide for deploying Ultralytics YOLO11 with the DeepStream SDK and TensorRT is compatible across the entire NVIDIA Jetson lineup. This includes devices like the Jetson Orin NX 16GB with JetPack 5.1.3 and the Jetson Nano 4GB with JetPack 4.6.4. Refer to the section DeepStream Configuration for YOLO11 for detailed steps.
How can I convert a YOLO11 model to ONNX for DeepStream?
To convert a YOLO11 model to ONNX format for deployment with DeepStream, use the utils/export_yoloV8.py script from the DeepStream-Yolo repository, passing your model weights along with any of the export options described in the Convert model to ONNX step.
For more details on model conversion, check out our model export section.
What are the performance benchmarks for YOLO11 on NVIDIA Jetson Orin NX?
The performance of YOLO11 models on NVIDIA Jetson Orin NX 16GB varies with the TensorRT precision level. For example, YOLO11s models achieve approximately:
- FP32 Precision: 14.6 ms/im, 68.5 FPS
- FP16 Precision: 7.94 ms/im, 126 FPS
- INT8 Precision: 5.95 ms/im, 168 FPS
These benchmarks underscore the efficiency and capability of using TensorRT-optimized YOLO11 models on NVIDIA Jetson hardware. For further details, see our Benchmark Results section.