TensorRT Export for YOLO11 Models
Deploying computer vision models in high-performance environments can require a format that maximizes speed and efficiency. This is especially true when you are deploying your model on NVIDIA GPUs.
By using the TensorRT export format, you can enhance your Ultralytics YOLO11 models for swift and efficient inference on NVIDIA hardware. This guide will give you easy-to-follow steps for the conversion process and help you make the most of NVIDIA's advanced technology in your deep learning projects.
TensorRT
TensorRT, developed by NVIDIA, is an advanced software development kit (SDK) designed for high-speed deep learning inference. It is well suited for real-time applications such as object detection.
This toolkit optimizes deep learning models for NVIDIA GPUs, resulting in faster and more efficient operation. TensorRT models undergo TensorRT optimization, which includes techniques such as layer fusion, precision calibration (INT8 and FP16), dynamic tensor memory management, and kernel auto-tuning. Converting deep learning models to the TensorRT format allows developers to fully realize the potential of NVIDIA GPUs.
TensorRT is known for its compatibility with various model formats, including TensorFlow, PyTorch, and ONNX, providing developers with a flexible solution for integrating and optimizing models from different frameworks. This versatility enables efficient model deployment across diverse hardware and software environments.
Key Features of TensorRT Models
TensorRT models offer a range of key features that contribute to their efficiency and effectiveness in high-speed deep learning inference:
- Precision Calibration: TensorRT supports precision calibration, allowing a model's numeric precision to be matched to specific accuracy requirements. This includes support for reduced-precision formats such as INT8 and FP16, which can further boost inference speed while maintaining acceptable accuracy.
- Layer Fusion: The TensorRT optimization process includes layer fusion, in which multiple layers of a neural network are combined into a single operation. This reduces computational overhead and improves inference speed by minimizing memory access and computation.
- Dynamic Tensor Memory Management: TensorRT efficiently manages tensor memory usage during inference, reducing memory overhead and optimizing memory allocation. This results in more efficient GPU memory utilization.
- Automatic Kernel Tuning: TensorRT applies automatic kernel tuning to select the most optimized GPU kernel for each layer of the model. This adaptive approach ensures that the model takes full advantage of the GPU's computational power.
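As a concrete illustration of layer fusion, the sketch below folds a batch-normalization step into the convolution that precedes it, reduced to a single channel so that the "convolution" is just a weight and a bias. This is a generic, hypothetical example of one common fusion pattern, not TensorRT's internal code; the function names are made up for illustration.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(conv(x)) into a single affine op: fused_weight * x + fused_bias."""
    scale = gamma / math.sqrt(var + eps)
    fused_weight = weight * scale
    fused_bias = (bias - mean) * scale + beta
    return fused_weight, fused_bias


def conv_then_bn(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Unfused reference: two separate operations, each producing an intermediate value."""
    y = weight * x + bias  # "convolution" for one channel
    return gamma * (y - mean) / math.sqrt(var + eps) + beta  # batch normalization


# Example parameters for one channel (illustrative values)
params = dict(weight=0.8, bias=0.1, gamma=1.5, beta=-0.2, mean=0.3, var=0.04)
fw, fb = fold_batchnorm(**params)
for x in (-1.0, 0.0, 2.5):
    # fused and unfused results agree (up to float rounding)
    print(x, fw * x + fb, conv_then_bn(x, **params))
```

In a real network the same algebra is applied per output channel; the benefit is that the fused layer reads and writes memory once instead of twice.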
Deployment Options in TensorRT
Before we look at the code for exporting YOLO11 models to the TensorRT format, let's understand where TensorRT models are normally used.
TensorRT offers several deployment options, and each option balances ease of integration, performance optimization, and flexibility differently:
- Deploying within TensorFlow: This method integrates TensorRT into TensorFlow, allowing optimized models to run in a familiar TensorFlow environment. It is useful for models with a mix of supported and unsupported layers, as TF-TRT can handle these efficiently.
- Standalone TensorRT Runtime API: Offers granular control and is ideal for performance-critical applications. It is more complex but allows custom implementation of unsupported operators.
- NVIDIA Triton Inference Server: An option that supports models from various frameworks. Particularly suited for cloud or edge inference, it provides features such as concurrent model execution and model analysis.
Exporting YOLO11 Models to TensorRT
You can improve execution efficiency and optimize performance by converting YOLO11 models to TensorRT format.
Installation
To install the required package, run:
pip install ultralytics
For detailed instructions and best practices related to the installation process, check our YOLO11 Installation guide. If you encounter any difficulties while installing the required packages for YOLO11, consult our Common Issues guide for solutions and tips.
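Before exporting, it can help to confirm which versions are installed, since the performance results later in this guide were recorded with specific ultralytics and tensorrt versions. This minimal sketch uses only the standard library. It assumes TensorRT was installed with pip under the distribution name tensorrt; on some systems (for example, JetPack images) TensorRT may be importable without pip metadata, in which case this check reports it as not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("ultralytics", "tensorrt")):
    """Return {package: version string, or None if no installed distribution is found}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```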
䜿çšæ¹æ³
䜿ãæ¹ã®èª¬æã«å ¥ãåã«ã Ultralytics ãæäŸããYOLO11 ã¢ãã«ã®ã©ã€ã³ããããã確èªãã ãããããã¯ãããªãã®ãããžã§ã¯ãã®èŠä»¶ã«æãé©ããã¢ãã«ãéžæããã®ã«åœ¹ç«ã¡ãŸãã
䜿çšæ¹æ³
from ultralytics import YOLO
# Load the YOLO11 model
model = YOLO("yolo11n.pt")
# Export the model to TensorRT format
model.export(format="engine") # creates 'yolo11n.engine'
# Load the exported TensorRT model
tensorrt_model = YOLO("yolo11n.engine")
# Run inference
results = tensorrt_model("https://ultralytics.com/images/bus.jpg")
For more details about the export process, visit the Ultralytics documentation page on exporting.
Exporting TensorRT with INT8 Quantization
Exporting Ultralytics YOLO models using TensorRT with INT8 precision executes post-training quantization (PTQ). TensorRT uses calibration for PTQ: as the YOLO model runs inference on representative input data, TensorRT measures the distribution of activations within each activation tensor and then uses that distribution to estimate a scale value for each tensor. Each activation tensor that is a candidate for quantization has an associated scale that is deduced by this calibration process.
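The calibration idea can be illustrated with a minimal sketch. It derives a per-tensor symmetric scale using simple max-absolute-value calibration; note that this is a simpler stand-in, not TensorRT's "ENTROPY_CALIBRATION_2" algorithm, which chooses the range by minimizing information loss rather than using the raw maximum. The function names and activation values are hypothetical.

```python
def max_abs_scale(activations, qmax=127):
    """Per-tensor symmetric scale that maps the largest observed |activation| to qmax."""
    peak = max(abs(a) for a in activations)
    return peak / qmax if peak > 0 else 1.0


def quantize(x, scale, qmax=127):
    """Real value -> INT8 code: round to nearest, then clamp to [-qmax, qmax]."""
    return max(-qmax, min(qmax, round(x / scale)))


def dequantize(q, scale):
    """INT8 code -> approximate real value."""
    return q * scale


# Activations recorded while running representative calibration images (made-up values)
calibration_activations = [0.02, -1.7, 0.9, 3.2, -0.4, 2.8]
scale = max_abs_scale(calibration_activations)
codes = [quantize(a, scale) for a in calibration_activations]
recovered = [dequantize(q, scale) for q in codes]
```

Values inside the calibrated range are recovered to within half a quantization step; values outside it are clipped. A single large outlier inflates a max-abs scale and wastes resolution, which is one reason entropy-based calibration may pick a tighter range.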
When processing implicitly quantized networks, TensorRT uses INT8 opportunistically to optimize layer execution time. If a layer runs faster in INT8 and has quantization scales assigned to its data inputs and outputs, a kernel with INT8 precision is assigned to that layer. Otherwise, TensorRT selects either FP32 or FP16 precision for the kernel, whichever results in faster execution for that layer.
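The opportunistic selection rule above can be restated as a toy function. The LayerTiming structure and its timings are hypothetical; TensorRT's real builder measures candidate kernels itself, so treat this only as the rule written in code, not as TensorRT's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerTiming:
    """Hypothetical per-layer kernel timings (ms) gathered while building an engine."""
    name: str
    fp32_ms: float
    fp16_ms: float
    int8_ms: Optional[float]  # None if no INT8 kernel exists for this layer
    has_int8_scales: bool     # True if calibration assigned scales to its inputs/outputs


def choose_precision(layer: LayerTiming) -> str:
    """INT8 if it is both calibrated and fastest; otherwise the faster of FP16 and FP32."""
    fastest_float = "FP16" if layer.fp16_ms < layer.fp32_ms else "FP32"
    fastest_float_ms = min(layer.fp16_ms, layer.fp32_ms)
    if layer.has_int8_scales and layer.int8_ms is not None and layer.int8_ms < fastest_float_ms:
        return "INT8"
    return fastest_float


for layer in (
    LayerTiming("conv1", fp32_ms=1.0, fp16_ms=0.6, int8_ms=0.4, has_int8_scales=True),
    LayerTiming("conv2", fp32_ms=1.0, fp16_ms=0.6, int8_ms=0.4, has_int8_scales=False),
    LayerTiming("head", fp32_ms=0.5, fp16_ms=0.7, int8_ms=None, has_int8_scales=True),
):
    print(layer.name, choose_precision(layer))
```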
Tip
It is critical to export with INT8 precision on the same device that will use the TensorRT model weights for deployment, as calibration results can vary across devices.
Configuring INT8 Export
The arguments provided when using export for an Ultralytics YOLO model will greatly influence the performance of the exported model. They also need to be selected based on the available device resources; however, the default arguments should work for most Ampere (or newer) NVIDIA discrete GPUs. The calibration algorithm used is "ENTROPY_CALIBRATION_2", and you can read more about the available options in the TensorRT Developer Guide. Ultralytics testing found that "ENTROPY_CALIBRATION_2" was the best choice, so exports are fixed to this algorithm.
- workspace: Controls the size (in GiB) of the device memory allocation while converting the model weights.
  - Adjust the workspace value according to your calibration needs and resource availability. While a larger workspace may increase calibration time, it allows TensorRT to explore a wider range of optimization tactics, potentially enhancing model performance and accuracy. Conversely, a smaller workspace can reduce calibration time but may limit the optimization strategies, affecting the quality of the quantized model.
  - The default is workspace=None, which allows TensorRT to allocate memory automatically. When configuring it manually, this value may need to be increased if calibration crashes (exits without warning).
  - TensorRT will report UNSUPPORTED_STATE during export if the value for workspace is larger than the memory available to the device, which means the value for workspace should be lowered or set to None.
  - If workspace is set to its maximum value and calibration fails or crashes, consider using None for automatic allocation, or reducing the values of imgsz and batch to lower memory requirements.
  - Remember that INT8 calibration is specific to each device; borrowing a "high-end" GPU for calibration might result in poor performance when inference is run on another device.
- batch: The maximum batch size that will be used for inference. Smaller batches can be used during inference, but inference will not accept batches larger than the one specified.
Note
During calibration, twice the provided batch size will be used. Using small batches can lead to inaccurate scaling during calibration, because the process adjusts based on the data it sees; small batches might not capture the full range of values, leading to issues with the final calibration. For this reason, the batch size is doubled automatically. If no batch size is specified (batch=1), calibration will run at batch=1 * 2 to reduce calibration scaling errors.
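The doubling rule in the note above amounts to simple arithmetic. This sketch computes the calibration batch size and how many calibration batches a dataset implies, for example the 5,000 COCO validation images with batch=8. It assumes every calibration image is used once; how the exporter handles a final partial batch is not described here, and the function names are hypothetical.

```python
import math


def calibration_batch_size(batch: int = 1) -> int:
    """Calibration runs at twice the export batch size (batch=1 -> 2)."""
    return 2 * batch


def calibration_steps(num_images: int, batch: int = 1) -> int:
    """Number of calibration batches needed to see every calibration image once."""
    return math.ceil(num_images / calibration_batch_size(batch))


print(calibration_batch_size(8), calibration_steps(5000, 8))  # 16 313
```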
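Experimentation by NVIDIA led them to recommend using at least 500 calibration images that are representative of your model's data for INT8 quantization calibration. This is a guideline and not a hard requirement, and you will need to experiment to find what is required to perform well on your dataset. Since calibration data is required for INT8 calibration with TensorRT, make certain to use the data argument when int8=True for TensorRT, for example data="my_dataset.yaml", which will use the images from validation to calibrate with. When no value is passed for data when exporting to TensorRT with INT8 quantization, the default is to use one of the "small" example datasets based on the model task instead of throwing an error.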
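As a quick sanity check against NVIDIA's 500-image guideline, the sketch below counts image files in a validation image directory. It assumes you already know the directory that the val entry of your dataset YAML points to (parsing the YAML is not shown), and the set of image suffixes is an assumption; both function names are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}  # assumed image types
RECOMMENDED_MIN = 500  # NVIDIA guideline quoted above, not a hard requirement


def count_calibration_images(val_dir) -> int:
    """Count image files (recursively) under a validation image directory."""
    return sum(1 for p in Path(val_dir).rglob("*") if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


def check_calibration_set(val_dir) -> str:
    """Return a short message comparing the image count with the guideline."""
    n = count_calibration_images(val_dir)
    if n < RECOMMENDED_MIN:
        return f"only {n} images; NVIDIA suggests at least {RECOMMENDED_MIN} representative images"
    return f"{n} images found"
```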
Example
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
model.export(
format="engine",
dynamic=True, # (1)!
batch=8, # (2)!
workspace=4, # (3)!
int8=True,
data="coco.yaml", # (4)!
)
# Load the exported TensorRT INT8 model
model = YOLO("yolov8n.engine", task="detect")
# Run inference
result = model.predict("https://ultralytics.com/images/bus.jpg")
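- (1) Exports with dynamic axes; this is enabled by default when exporting with int8=True, even when not explicitly set. See the export arguments for additional information.
- (2) Sets a maximum batch size of 8 for the exported model, which calibrates with batch = 2 * 8 to avoid scaling errors during calibration.
- (3) Allocates 4 GiB of memory instead of allocating the entire device for the conversion process.
- (4) Uses the COCO dataset for calibration, specifically the images used for validation (5,000 in total).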
# Export a YOLO11n PyTorch model to TensorRT format with INT8 quantization
yolo export model=yolo11n.pt format=engine batch=8 workspace=4 int8=True data=coco.yaml # creates 'yolo11n.engine'
# Run inference with the exported TensorRT quantized model
yolo predict model=yolo11n.engine source='https://ultralytics.com/images/bus.jpg'
Calibration Cache
TensorRT will generate a calibration .cache file that can be reused to speed up export of future model weights using the same data. However, this may result in poor calibration when the data is vastly different or when the batch value is changed drastically. In these circumstances, the existing .cache should be renamed and moved to a different directory, or deleted entirely.
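Moving stale caches aside can be scripted. This is a minimal sketch under the assumption that the calibration cache is written as a *.cache file in the same directory as the exported weights; the exact filename and location are not specified in this guide, so inspect your export directory first. Other tools may also write *.cache files (for example, dataset label caches), so point the helper only at the directory holding your exported model. The function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def archive_calibration_caches(search_dir, archive_dir) -> List[Path]:
    """Move every *.cache file in search_dir into archive_dir so the next INT8 export recalibrates."""
    search_dir, archive_dir = Path(search_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for cache in sorted(search_dir.glob("*.cache")):
        target = archive_dir / cache.name
        shutil.move(str(cache), str(target))
        moved.append(target)
    return moved
```

Run it before re-exporting with a different dataset or a very different batch value, so that the next INT8 export starts from a fresh calibration.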
Advantages of Using YOLO with TensorRT INT8
- Reduced model size: Quantization from FP32 to INT8 can reduce the model size by 4x (on disk or in memory), leading to faster download times.
- Lower power consumption: Reduced-precision operations in INT8-exported YOLO models can consume less power than FP32 models, which is especially beneficial for battery-powered devices.
- Improved inference speeds: TensorRT optimizes the model for the target hardware, potentially leading to faster inference on GPUs, embedded devices, and accelerators.
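The 4x figure follows from bytes per weight: FP32 stores 4 bytes per value and INT8 stores 1. This back-of-the-envelope sketch uses a made-up parameter count and covers raw weight storage only; a real .engine file also contains layers that TensorRT keeps at FP16 or FP32 and other data, so its size will not shrink by exactly 4x.

```python
BYTES_PER_WEIGHT = {"FP32": 4, "FP16": 2, "INT8": 1}


def weight_storage_mb(num_params: int, precision: str) -> float:
    """Raw storage for the weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_WEIGHT[precision] / 1e6


params = 3_000_000  # hypothetical parameter count, for illustration only
sizes = {p: weight_storage_mb(params, p) for p in BYTES_PER_WEIGHT}
print(sizes)  # {'FP32': 12.0, 'FP16': 6.0, 'INT8': 3.0}
```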
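Note on Inference Speeds
The first few inference calls with a model exported to TensorRT INT8 can be expected to have longer than usual preprocessing, inference, and/or postprocessing times. This may also occur when changing imgsz during inference, especially when imgsz is not the same as what was specified during export (the export imgsz is set as the TensorRT "optimal" profile).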
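Because the first calls are slow, benchmarks should discard them. This is a minimal sketch assuming any zero-argument callable that performs one inference; the warm-up count of 3 is an arbitrary choice, not an Ultralytics or TensorRT default.

```python
import time
from typing import Callable, List


def warmup_then_time(run: Callable[[], object], warmup: int = 3, iters: int = 10) -> List[float]:
    """Call `run` `warmup` times without timing, then return `iters` latencies in ms."""
    for _ in range(warmup):
        run()  # the first calls on a fresh engine are expected to be slow
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings
```

For example, with model = YOLO("yolo11n.engine") you might pass run=lambda: model.predict("path/to/image.jpg", imgsz=640, verbose=False), keeping imgsz equal to the value used at export (the image path is a placeholder).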
Drawbacks of Using YOLO with TensorRT INT8
- Decreases in evaluation metrics: Using a lower precision means that mAP, Precision, Recall, or any other metric used to evaluate model performance is likely to be somewhat worse. See the Performance results section to compare the differences in mAP50 and mAP50-95 when exporting with INT8 on a small sample of various devices.
- Increased development times: Finding the "optimal" settings for INT8 calibration for a given dataset and device can take a significant amount of testing.
- Hardware dependency: Calibration and performance gains can be highly hardware-dependent, and model weights are less transferable.
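To put "somewhat worse" in numbers, this small sketch computes the relative drop from FP32 to INT8 using the NVIDIA A100 detection (COCOval) results reported in this guide's performance tables (mAP50 0.52 vs 0.47, mAP50-95 0.37 vs 0.33). The helper name is illustrative.

```python
def relative_drop(baseline: float, quantized: float) -> float:
    """Percentage decrease of a metric relative to its FP32 baseline."""
    return (baseline - quantized) / baseline * 100.0


# NVIDIA A100, yolov8n.engine, COCOval rows
fp32 = {"mAP50": 0.52, "mAP50-95": 0.37}
int8 = {"mAP50": 0.47, "mAP50-95": 0.33}
drops = {name: round(relative_drop(fp32[name], int8[name]), 1) for name in fp32}
print(drops)  # {'mAP50': 9.6, 'mAP50-95': 10.8}
```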
Ultralytics YOLO TensorRT Export Performance
NVIDIA A100
Performance
Tested with Ubuntu 22.04.3 LTS, python 3.10.12, ultralytics==8.2.4, tensorrt==8.6.1.post1
See the Detection Docs for usage examples with these models trained on COCO, which include 80 pre-trained classes.
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test, using the pre-trained weights yolov8n.engine
Precision | Eval test | mean (ms) | min (ms) | max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels)
---|---|---|---|---|---|---|---|---
FP32 | Predict | 0.52 | 0.51 | 0.56 | | | 8 | 640
FP32 | COCOval | 0.52 | | | 0.52 | 0.37 | 1 | 640
FP16 | Predict | 0.34 | 0.34 | 0.41 | | | 8 | 640
FP16 | COCOval | 0.33 | | | 0.52 | 0.37 | 1 | 640
INT8 | Predict | 0.28 | 0.27 | 0.31 | | | 8 | 640
INT8 | COCOval | 0.29 | | | 0.47 | 0.33 | 1 | 640
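The speed side of the trade-off can be read off the same table. This sketch divides the FP32 mean "Predict" latency by each precision's mean latency for the NVIDIA A100 detection results above; it is plain arithmetic on the published numbers, and the helper name is illustrative.

```python
# Mean "Predict" latencies (ms) for yolov8n.engine on the NVIDIA A100, from the detection table
predict_mean_ms = {"FP32": 0.52, "FP16": 0.34, "INT8": 0.28}


def speedup_vs(baseline: str, timings: dict) -> dict:
    """Speedup factor of each precision relative to the baseline (higher is faster)."""
    return {p: round(timings[baseline] / t, 2) for p, t in timings.items()}


print(speedup_vs("FP32", predict_mean_ms))  # {'FP32': 1.0, 'FP16': 1.53, 'INT8': 1.86}
```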
See the Segmentation Docs for usage examples with these models trained on COCO.
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test, using the pre-trained weights yolov8n-seg.engine
Precision | Eval test | mean (ms) | min (ms) | max (ms) | mAPval 50(B) | mAPval 50-95(B) | mAPval 50(M) | mAPval 50-95(M) | batch | size (pixels)
---|---|---|---|---|---|---|---|---|---|---
FP32 | Predict | 0.62 | 0.61 | 0.68 | | | | | 8 | 640
FP32 | COCOval | 0.63 | | | 0.52 | 0.36 | 0.49 | 0.31 | 1 | 640
FP16 | Predict | 0.40 | 0.39 | 0.44 | | | | | 8 | 640
FP16 | COCOval | 0.43 | | | 0.52 | 0.36 | 0.49 | 0.30 | 1 | 640
INT8 | Predict | 0.34 | 0.33 | 0.37 | | | | | 8 | 640
INT8 | COCOval | 0.36 | | | 0.46 | 0.32 | 0.43 | 0.27 | 1 | 640
See the Classification Docs for usage examples with these models trained on ImageNet.
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test, using the pre-trained weights yolov8n-cls.engine
Precision | Eval test | mean (ms) | min (ms) | max (ms) | top-1 | top-5 | batch | size (pixels)
---|---|---|---|---|---|---|---|---
FP32 | Predict | 0.26 | 0.25 | 0.28 | | | 8 | 640
FP32 | ImageNetval | 0.26 | | | 0.35 | 0.61 | 1 | 640
FP16 | Predict | 0.18 | 0.17 | 0.19 | | | 8 | 640
FP16 | ImageNetval | 0.18 | | | 0.35 | 0.61 | 1 | 640
INT8 | Predict | 0.16 | 0.15 | 0.57 | | | 8 | 640
INT8 | ImageNetval | 0.15 | | | 0.32 | 0.59 | 1 | 640
See the Pose Estimation Docs for usage examples with these models trained on COCO.
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test, using the pre-trained weights yolov8n-pose.engine
Precision | Eval test | mean (ms) | min (ms) | max (ms) | mAPval 50(B) | mAPval 50-95(B) | mAPval 50(P) | mAPval 50-95(P) | batch | size (pixels)
---|---|---|---|---|---|---|---|---|---|---
FP32 | Predict | 0.54 | 0.53 | 0.58 | | | | | 8 | 640
FP32 | COCOval | 0.55 | | | 0.91 | 0.69 | 0.80 | 0.51 | 1 | 640
FP16 | Predict | 0.37 | 0.35 | 0.41 | | | | | 8 | 640
FP16 | COCOval | 0.36 | | | 0.91 | 0.69 | 0.80 | 0.51 | 1 | 640
INT8 | Predict | 0.29 | 0.28 | 0.33 | | | | | 8 | 640
INT8 | COCOval | 0.30 | | | 0.90 | 0.68 | 0.78 | 0.47 | 1 | 640
See the Oriented Detection Docs for usage examples with these models trained on DOTAv1.
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test, using the pre-trained weights yolov8n-obb.engine
Precision | Eval test | mean (ms) | min (ms) | max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels)
---|---|---|---|---|---|---|---|---
FP32 | Predict | 0.52 | 0.51 | 0.59 | | | 8 | 640
FP32 | DOTAv1val | 0.76 | | | 0.50 | 0.36 | 1 | 640
FP16 | Predict | 0.34 | 0.33 | 0.42 | | | 8 | 640
FP16 | DOTAv1val | 0.59 | | | 0.50 | 0.36 | 1 | 640
INT8 | Predict | 0.29 | 0.28 | 0.33 | | | 8 | 640
INT8 | DOTAv1val | 0.32 | | | 0.45 | 0.32 | 1 | 640
Consumer GPUs
Detection Performance (COCO)
Tested with Windows 10.0.19045, python 3.10.9, ultralytics==8.2.4, tensorrt==10.0.0b6
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test, using the pre-trained weights yolov8n.engine
Precision | Eval test | mean (ms) | min (ms) | max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels)
---|---|---|---|---|---|---|---|---
FP32 | Predict | 1.06 | 0.75 | 1.88 | | | 8 | 640
FP32 | COCOval | 1.37 | | | 0.52 | 0.37 | 1 | 640
FP16 | Predict | 0.62 | 0.75 | 1.13 | | | 8 | 640
FP16 | COCOval | 0.85 | | | 0.52 | 0.37 | 1 | 640
INT8 | Predict | 0.52 | 0.38 | 1.00 | | | 8 | 640
INT8 | COCOval | 0.74 | | | 0.47 | 0.33 | 1 | 640
Tested with Windows 10.0.22631, python 3.11.9, ultralytics==8.2.4, tensorrt==10.0.1
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test, using the pre-trained weights yolov8n.engine
Precision | Eval test | mean (ms) | min (ms) | max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels)
---|---|---|---|---|---|---|---|---
FP32 | Predict | 1.76 | 1.69 | 1.87 | | | 8 | 640
FP32 | COCOval | 1.94 | | | 0.52 | 0.37 | 1 | 640
FP16 | Predict | 0.86 | 0.75 | 1.00 | | | 8 | 640
FP16 | COCOval | 1.43 | | | 0.52 | 0.37 | 1 | 640
INT8 | Predict | 0.80 | 0.75 | 1.00 | | | 8 | 640
INT8 | COCOval | 1.35 | | | 0.47 | 0.33 | 1 | 640
Tested with Pop!_OS 22.04 LTS, python 3.10.12, ultralytics==8.2.4, tensorrt==8.6.1.post1
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test, using the pre-trained weights yolov8n.engine
Precision | Eval test | mean (ms) | min (ms) | max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels)
---|---|---|---|---|---|---|---|---
FP32 | Predict | 2.84 | 2.84 | 2.85 | | | 8 | 640
FP32 | COCOval | 2.94 | | | 0.52 | 0.37 | 1 | 640
FP16 | Predict | 1.09 | 1.09 | 1.10 | | | 8 | 640
FP16 | COCOval | 1.20 | | | 0.52 | 0.37 | 1 | 640
INT8 | Predict | 0.75 | 0.74 | 0.75 | | | 8 | 640
INT8 | COCOval | 0.76 | | | 0.47 | 0.33 | 1 | 640
Embedded Devices
Detection Performance (COCO)
Tested with JetPack 6.0 (L4T 36.3) Ubuntu 22.04.4 LTS, python 3.10.12, ultralytics==8.2.16, tensorrt==10.0.1
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test, using the pre-trained weights yolov8n.engine
Precision | Eval test | mean (ms) | min (ms) | max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels)
---|---|---|---|---|---|---|---|---
FP32 | Predict | 6.11 | 6.10 | 6.29 | | | 8 | 640
FP32 | COCOval | 6.17 | | | 0.52 | 0.37 | 1 | 640
FP16 | Predict | 3.18 | 3.18 | 3.20 | | | 8 | 640
FP16 | COCOval | 3.19 | | | 0.52 | 0.37 | 1 | 640
INT8 | Predict | 2.30 | 2.29 | 2.35 | | | 8 | 640
INT8 | COCOval | 2.32 | | | 0.46 | 0.32 | 1 | 640
Info
See our quickstart guide on NVIDIA Jetson with Ultralytics YOLO to learn more about setup and configuration.
Evaluation Methods
See the sections below for information on how these models were exported and tested.
Export Configurations
See export mode for details regarding export configuration arguments.
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
# TensorRT FP32
out = model.export(format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2)
# TensorRT FP16
out = model.export(format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2, half=True)
# TensorRT INT8 with calibration `data` (i.e. COCO, ImageNet, or DOTAv1 for appropriate model task)
out = model.export(
format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2, int8=True, data="coco8.yaml"
)
Predict Loop
See predict mode for additional information.
Validation Configuration
See val mode to learn more about validation configuration arguments.
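The COCOval, ImageNetval, and DOTAv1val rows were measured at batch=1 and imgsz=640. The sketch below is a minimal, hypothetical validation helper under those settings, plus a function that reads the two detection columns from the result via results.box.map50 and results.box.map (the detection metric attributes in recent Ultralytics versions). Other tasks report additional metric groups, so detection_map covers only the (B) columns; device=0 assumes the first CUDA GPU.

```python
def detection_map(results) -> dict:
    """Read mAP50 and mAP50-95 (box metrics) from a validation result object."""
    return {"mAP50": float(results.box.map50), "mAP50-95": float(results.box.map)}


def run_validation(engine_path: str, data: str, imgsz: int = 640):
    """Validate an exported engine at batch=1, as in the validation rows (needs ultralytics + GPU)."""
    from ultralytics import YOLO  # imported lazily so detection_map works without it

    model = YOLO(engine_path)
    return model.val(data=data, batch=1, imgsz=imgsz, verbose=False, device=0)
```

For example, detection_map(run_validation("yolov8n.engine", "coco.yaml")) returns a dictionary that can be compared with the COCOval rows above.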
Deploying Exported YOLO11 TensorRT Models
Having successfully exported your Ultralytics YOLO11 models to TensorRT format, you're now ready to deploy them. For in-depth instructions on deploying your TensorRT models in various settings, take a look at the following resources:
- Deploy Ultralytics with a Triton Server: Our guide on how to use NVIDIA's Triton Inference (formerly TensorRT Inference) Server specifically with Ultralytics YOLO models.
- Deploying Deep Neural Networks with NVIDIA TensorRT: This article explains how to use NVIDIA TensorRT to efficiently deploy deep neural networks on GPU-based deployment platforms.
- End-to-End AI for NVIDIA-Based PCs: NVIDIA TensorRT Deployment: This blog post explains how to use NVIDIA TensorRT to optimize and deploy AI models on NVIDIA-based PCs.
- GitHub Repository for NVIDIA TensorRT: The official GitHub repository containing the source code and documentation for NVIDIA TensorRT.
Summary
In this guide, we focused on converting Ultralytics YOLO11 models to NVIDIA's TensorRT model format. This conversion step is crucial for improving the efficiency and speed of YOLO11 models, making them more effective and suitable for diverse deployment environments.
䜿ãæ¹ã®è©³çŽ°ã«ã€ããŠã¯ãTensorRT ã®å ¬åŒããã¥ã¡ã³ããã芧ãã ããã
If you're curious about additional Ultralytics YOLO11 integrations, our integration guide page provides an extensive selection of informative resources and insights.
FAQ
How do I convert YOLO11 models to TensorRT format?
To convert your Ultralytics YOLO11 models to TensorRT format for optimized NVIDIA GPU inference, follow these steps:
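- Install the required package:
pip install ultralytics
- Export your YOLO11 model with model.export(format="engine"), as shown in the Usage section above, and load the resulting .engine file with YOLO() for inference.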
For more details, visit the YOLO11 Installation guide and the export documentation.
What are the benefits of using TensorRT for YOLO11 models?
Using TensorRT to optimize YOLO11 models offers several benefits:
- Faster inference speed: TensorRT optimizes the model layers and uses precision calibration (INT8 and FP16) to speed up inference without significantly sacrificing accuracy.
- Memory efficiency: TensorRT manages tensor memory dynamically, reducing overhead and improving GPU memory utilization.
- Layer fusion: Combines multiple layers into single operations, reducing computational complexity.
- Kernel auto-tuning: Automatically selects an optimized GPU kernel for each model layer, getting the most performance out of the target GPU.
To learn more, explore TensorRT's detailed features and read our TensorRT overview section.
Can I use INT8 quantization with TensorRT for YOLO11 models?
Yes, you can export YOLO11 models using TensorRT with INT8 quantization. This process involves post-training quantization (PTQ) and calibration:
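- Export with INT8: call model.export() with format="engine", int8=True, and a calibration dataset passed through data (for example, data="coco.yaml").
- Run inference: load the exported .engine file with YOLO() and run predictions as with any other Ultralytics model.
For more details, refer to the Exporting TensorRT with INT8 Quantization section.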
How do I deploy YOLO11 TensorRT models on an NVIDIA Triton Inference Server?
Deploying YOLO11 TensorRT models on an NVIDIA Triton Inference Server can be done using the following resources:
- Deploy Ultralytics YOLOv8 with Triton Server: Step-by-step guidance on setting up and using Triton Inference Server.
- NVIDIA Triton Inference Server documentation: Official NVIDIA documentation with detailed deployment options and configurations.
These guides will help you integrate YOLOv8 models efficiently in various deployment environments.
What performance improvements are observed with YOLOv8 models exported to TensorRT?
Performance improvements with TensorRT can vary depending on the hardware used. Here are some typical benchmarks:
- NVIDIA A100:
  - FP32 inference: ~0.52 ms / image
  - FP16 inference: ~0.34 ms / image
  - INT8 inference: ~0.28 ms / image
  - mAP decreases slightly with INT8 precision, but speed improves significantly.
- Consumer GPUs (e.g., RTX 3080):
  - FP32 inference: ~1.06 ms / image
  - FP16 inference: ~0.62 ms / image
  - INT8 inference: ~0.52 ms / image
Detailed performance benchmarks for different hardware configurations can be found in the performance section.
For more comprehensive insights into TensorRT performance, refer to the Ultralytics documentation and our performance analysis reports.