TensorRT Export for YOLOv8 Models
Deploying computer vision models in high-performance environments can require a format that maximizes speed and efficiency. This is especially true when you deploy a model on NVIDIA GPUs.
By using the TensorRT export format, you can optimize your Ultralytics YOLOv8 models for fast, efficient inference on NVIDIA hardware. This guide walks through the conversion process step by step and helps you make the most of NVIDIA's technology in your deep learning projects.
TensorRT
TensorRT, developed by NVIDIA, is a software development kit (SDK) designed for high-speed deep learning inference. It is well suited to real-time applications such as object detection.
This toolkit optimizes deep learning models for NVIDIA GPUs, resulting in faster and more efficient operation. TensorRT optimization includes techniques such as layer fusion, precision calibration (INT8 and FP16), dynamic tensor memory management, and kernel auto-tuning. Converting deep learning models to the TensorRT format lets developers make fuller use of NVIDIA GPUs.
TensorRT is known for its compatibility with a variety of model formats, including TensorFlow, PyTorch, and ONNX, giving developers a flexible way to integrate and optimize models from different frameworks. This versatility supports efficient model deployment across a range of hardware and software environments.
Key Features of TensorRT Models
TensorRT models offer a range of key features that contribute to their efficiency and effectiveness in high-speed deep learning inference:
- Precision Calibration: TensorRT supports precision calibration, allowing models to be tuned for specific accuracy requirements. This includes support for reduced-precision formats such as INT8 and FP16, which can further increase inference speed while maintaining acceptable accuracy.
- Layer Fusion: The TensorRT optimization process includes layer fusion, where multiple layers of a neural network are combined into a single operation. This reduces computational overhead and improves inference speed by minimizing memory access and computation.
- Dynamic Tensor Memory Management: TensorRT efficiently manages tensor memory usage during inference, reducing memory overhead and optimizing memory allocation. This results in more efficient GPU memory utilization.
- Automatic Kernel Tuning: TensorRT applies automatic kernel tuning to select the most optimized GPU kernel for each layer of the model. This adaptive approach helps the model take full advantage of the GPU's computational power.
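To make the layer fusion idea concrete, here is a minimal scalar sketch of one common fusion: folding a batch-normalization step into the affine layer that precedes it. This only illustrates the general principle; TensorRT's own fusion passes are internal to the builder, and the function names and numeric values below are hypothetical.

```python
import math


def two_step(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Unfused path: an affine layer followed by batch normalization."""
    z = weight * x + bias
    return gamma * (z - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the batch-norm parameters into a single weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


# Illustrative (made-up) layer parameters
w, b = 0.8, 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.3, 0.04

fused_w, fused_b = fold_batchnorm(w, b, gamma, beta, mean, var)

x = 2.0
unfused = two_step(x, w, b, gamma, beta, mean, var)
fused = fused_w * x + fused_b  # one multiply-add instead of two stages
```

Both paths produce the same output, but the fused version needs a single pass over the data, which is where the reduction in memory access comes from.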
Deployment Options in TensorRT
Before looking at the code for exporting YOLOv8 models to the TensorRT format, it helps to understand where TensorRT models are normally used.
TensorRT offers several deployment options, and each option balances ease of integration, performance optimization, and flexibility differently:
- Deploying within TensorFlow: This method integrates TensorRT into TensorFlow, allowing optimized models to run in a familiar TensorFlow environment. It is useful for models with a mix of supported and unsupported layers, as TF-TRT can handle these efficiently.
- Standalone TensorRT Runtime API: Offers granular control and is ideal for performance-critical applications. It is more complex but allows custom implementation of unsupported operators.
- NVIDIA Triton Inference Server: An option that supports models from various frameworks. It is particularly suited to cloud or edge inference and provides features such as concurrent model execution and model analysis.
Exporting YOLOv8 Models to TensorRT
You can improve execution efficiency and optimize performance by converting YOLOv8 models to the TensorRT format.
Installation
To install the required package, run:
pip install ultralytics
For detailed instructions and best practices for the installation process, see the YOLOv8 Installation guide. If you run into problems while installing the required packages for YOLOv8, consult the Common Issues guide for solutions and tips.
Usage
Before diving into the usage instructions, check out the range of YOLOv8 models offered by Ultralytics. This will help you choose the most appropriate model for your project requirements.
from ultralytics import YOLO
# Load the YOLOv8 model
model = YOLO("yolov8n.pt")
# Export the model to TensorRT format
model.export(format="engine") # creates 'yolov8n.engine'
# Load the exported TensorRT model
tensorrt_model = YOLO("yolov8n.engine")
# Run inference
results = tensorrt_model("https://ultralytics.com/images/bus.jpg")
For more details about the export process, see the Ultralytics documentation page on exporting.
Exporting TensorRT with INT8 Quantization
Exporting Ultralytics YOLO models using TensorRT with INT8 precision executes post-training quantization (PTQ). TensorRT uses calibration for PTQ: it measures the distribution of activations within each activation tensor as the YOLO model runs inference on representative input data, then uses that distribution to estimate a scale value for each tensor. Each activation tensor that is a candidate for quantization has an associated scale that is deduced by the calibration process.
When processing implicitly quantized networks, TensorRT uses INT8 opportunistically to optimize layer execution time. If a layer runs faster in INT8 and has quantization scales assigned to its data inputs and outputs, a kernel with INT8 precision is assigned to that layer. Otherwise, TensorRT selects either FP32 or FP16 for the kernel, whichever results in faster execution time for that layer.
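The role of the per-tensor scale can be sketched in a few lines. The example below uses simple max calibration (the largest observed magnitude maps to the INT8 limit), which is a deliberately simpler stand-in for the entropy-based calibration that TensorRT performs; the function names and sample activations are hypothetical.

```python
def estimate_scale(activations, num_levels=127):
    """Max calibration: map the largest observed magnitude to the INT8 limit."""
    max_abs = max(abs(a) for a in activations)
    return max_abs / num_levels if max_abs > 0 else 1.0


def quantize(value, scale):
    """Round to the nearest INT8 step and clamp to the symmetric range."""
    q = round(value / scale)
    return max(-127, min(127, q))


def dequantize(q, scale):
    """Recover an approximate floating-point value from the INT8 code."""
    return q * scale


# Made-up activations standing in for values seen on representative inputs
calibration_activations = [-1.9, -0.4, 0.0, 0.7, 2.54]
scale = estimate_scale(calibration_activations)

q = quantize(0.7, scale)           # in-range value
clipped = quantize(5.0, scale)     # value outside the calibrated range
roundtrip_error = abs(dequantize(q, scale) - 0.7)
```

Values outside the calibrated range are clipped, which is why calibration data needs to be representative of what the model sees at inference time.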
Tip
Calibration results can vary across devices, so it is critical to export with INT8 precision on the same device that will use the TensorRT model weights for deployment.
Configuring INT8 Export
The arguments provided when using export for an Ultralytics YOLO model greatly influence the performance of the exported model. They also need to be selected based on the available device resources; however, the default arguments should work for most Ampere (or newer) NVIDIA discrete GPUs. The calibration algorithm used is "ENTROPY_CALIBRATION_2"; you can read more about the available options in the TensorRT Developer Guide. Ultralytics tests found that "ENTROPY_CALIBRATION_2" was the best choice, and exports are fixed to this algorithm.
- workspace: Controls the size (in GiB) of the device memory allocation while converting the model weights.
  - Adjust the workspace value according to your calibration needs and resource availability. A larger workspace may increase calibration time, but it lets TensorRT explore a wider range of optimization tactics, potentially improving model performance and accuracy. Conversely, a smaller workspace can reduce calibration time but may limit the optimization strategies, affecting the quality of the quantized model.
  - The default is workspace=4 (GiB). This value may need to be increased if calibration crashes (exits without warning).
  - TensorRT reports UNSUPPORTED_STATE during export if the value for workspace is larger than the memory available to the device, in which case the value for workspace should be lowered.
  - If workspace is set to the maximum value and calibration fails or crashes, consider reducing the values of imgsz and batch to lower the memory requirements.
  - Remember that INT8 calibration is specific to each device. Borrowing a "high-end" GPU for calibration might result in poor performance when inference is run on another device.
- batch: The maximum batch size that will be used for inference. Smaller batches can be used during inference, but inference will not accept batches larger than the specified value.
Note
During calibration, twice the provided batch size is used. Small batches can lead to inaccurate scaling during calibration because the process adjusts based on the data it sees; a small batch might not capture the full range of values, which degrades the final calibration. For this reason the batch size is doubled automatically. If no batch size is specified (batch=1), calibration runs at batch=1 * 2 to reduce calibration scaling errors.
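Experiments by NVIDIA led them to recommend using at least 500 calibration images that are representative of your model's data for INT8 quantization calibration. This is a guideline rather than a hard requirement, and you will need to experiment to find what works well for your dataset. Because calibration data is required for INT8 calibration with TensorRT, be sure to pass the data argument when int8=True, for example data="my_dataset.yaml", which uses the images from the validation split for calibration. If no value is passed for data when exporting to TensorRT with INT8 quantization, the export falls back to one of the "small" example datasets for the model task instead of raising an error.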
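As a rough planning aid, the batch-doubling rule and the 500-image guideline can be combined into a small calculation. This is a sketch only: `calibration_plan` is a hypothetical helper, and it assumes a final partial batch is still processed, which the text above does not specify.

```python
import math


def calibration_plan(batch=1, num_images=500):
    """Return (calibration batch size, number of calibration batches).

    Assumes calibration runs at twice the export batch size and that a
    trailing partial batch still counts as one batch.
    """
    calib_batch = 2 * batch
    num_batches = math.ceil(num_images / calib_batch)
    return calib_batch, num_batches


default_plan = calibration_plan()            # batch=1, 500 images
batch8_plan = calibration_plan(batch=8)      # batch=8, 500 images
coco_val_plan = calibration_plan(batch=8, num_images=5000)
```

With `batch=8`, calibration sees 16 images at a time, so 500 images amount to 32 calibration batches under these assumptions.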
Example
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
model.export(
    format="engine",
    dynamic=True,  # (1)!
    batch=8,  # (2)!
    workspace=4,  # (3)!
    int8=True,
    data="coco.yaml",  # (4)!
)
# Load the exported TensorRT INT8 model
model = YOLO("yolov8n.engine", task="detect")
# Run inference
result = model.predict("https://ultralytics.com/images/bus.jpg")
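- (1) Exports with dynamic axes. This is enabled by default when exporting with int8=True, even when not explicitly set. See export arguments for additional information.
- (2) Sets a maximum batch size of 8 for the exported model, which calibrates with batch = 2 * 8 to avoid scaling errors during calibration.
- (3) Allocates 4 GiB of memory for the conversion process instead of allocating the entire device.
- (4) Uses the COCO dataset for calibration, specifically the images used for validation (5,000 total).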
# Export a YOLOv8n PyTorch model to TensorRT format with INT8 quantization
yolo export model=yolov8n.pt format=engine batch=8 workspace=4 int8=True data=coco.yaml # creates 'yolov8n.engine'
# Run inference with the exported TensorRT quantized model
yolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'
Calibration Cache
TensorRT generates a calibration .cache file that can be reused to speed up future exports of model weights with the same data. Reusing it may result in poor calibration when the data is vastly different or when the batch value changes drastically. In these circumstances, rename the existing .cache and move it to a different directory, or delete it entirely.
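Moving a stale cache out of the way before re-exporting can be scripted. The sketch below uses only the standard library; `archive_calibration_cache` is a hypothetical helper, and the cache file name and location are assumptions, since the text above only says that a .cache file is generated.

```python
import shutil
import tempfile
from pathlib import Path


def archive_calibration_cache(cache_path, archive_dir):
    """Move an existing calibration cache aside; return the new path or None."""
    cache_path = Path(cache_path)
    if not cache_path.exists():
        return None
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Rename while moving, e.g. yolov8n.cache -> yolov8n.old.cache
    target = archive_dir / (cache_path.stem + ".old" + cache_path.suffix)
    shutil.move(str(cache_path), str(target))
    return target


# Demonstration in a temporary directory with a dummy cache file
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "yolov8n.cache"  # hypothetical file name
    cache.write_text("dummy calibration data")
    moved = archive_calibration_cache(cache, Path(tmp) / "old_caches")
    demo_moved_name = moved.name
    demo_cache_still_exists = cache.exists()
    demo_missing = archive_calibration_cache(Path(tmp) / "missing.cache", tmp)
```

Archiving rather than deleting keeps the old cache available in case you later return to the original dataset and batch size.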
Advantages of using YOLO with TensorRT INT8
- Reduced model size: Quantization from FP32 to INT8 can reduce the model size to about a quarter (on disk or in memory), leading to shorter download times.
- Lower power consumption: Reduced-precision operations in INT8-exported YOLO models can consume less power than FP32 models, which is especially useful for battery-powered devices.
- Improved inference speeds: TensorRT optimizes the model for the target hardware, potentially leading to faster inference on GPUs, embedded devices, and accelerators.
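The size reduction follows directly from the number of bytes stored per weight. The arithmetic below is a sketch under two assumptions: a parameter count of roughly 3.2 million (an approximate figure for YOLOv8n, used here for illustration) and a model in which every weight is stored at the named precision. A real engine file also holds metadata and may keep some layers at higher precision, so treat the 4x ratio as an upper bound.

```python
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1}


def weight_size_mb(num_params, precision):
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e6


num_params = 3_200_000  # assumed approximate parameter count
sizes = {p: weight_size_mb(num_params, p) for p in BYTES_PER_PARAM}
reduction = sizes["FP32"] / sizes["INT8"]
```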
Note on Inference Speeds
The first few inference calls with a model exported to TensorRT INT8 can be expected to have longer than usual preprocessing, inference, and/or postprocessing times. This may also occur when changing imgsz during inference, especially when imgsz differs from the value specified at export (the export imgsz is set as the TensorRT "optimal" profile).
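Because the first calls are slower, timing measurements are usually taken after a few warm-up runs. The helper below is a generic sketch, not part of Ultralytics or TensorRT: `time_calls` is a hypothetical name, and it works with any zero-argument callable.

```python
import time


def time_calls(fn, warmup=3, runs=10):
    """Run fn a few times untimed, then report mean/min/max latency in ms."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not measured
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean": sum(samples_ms) / len(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
    }


# Stand-in workload so the sketch runs without a GPU
call_log = []
stats = time_calls(lambda: call_log.append(1), warmup=3, runs=10)
```

With an exported engine loaded as `model`, the callable could be something like `lambda: model.predict(img, verbose=False)`, where `img` is an image you have already loaded.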
Drawbacks of using YOLO with TensorRT INT8
- Decreases in evaluation metrics: Using a lower precision means that mAP, Precision, Recall, or any other metric used to evaluate model performance is likely to be somewhat worse. See the Performance results section to compare the differences in mAP50 and mAP50-95 when exporting with INT8 on a small sample of various devices.
- Increased development times: Finding the "optimal" settings for INT8 calibration for a given dataset and device can take a significant amount of testing.
- Hardware dependency: Calibration and performance gains can be highly hardware dependent, and model weights are less transferable.
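To put the metric decrease in perspective, it can help to express it as a relative drop. The sketch below uses the COCO validation figures reported for yolov8n on the NVIDIA A100 in the performance section of this guide (mAP50 0.52 for FP32 versus 0.47 for INT8, and mAP50-95 0.37 versus 0.33); `relative_drop` is a hypothetical helper.

```python
def relative_drop(baseline, quantized):
    """Fraction of the baseline metric lost after quantization."""
    return (baseline - quantized) / baseline


# Values taken from the A100 detection table in this guide
map50_drop = relative_drop(0.52, 0.47)
map50_95_drop = relative_drop(0.37, 0.33)
```

Under these numbers the INT8 export gives up roughly a tenth of the FP32 score on both metrics; whether that is acceptable depends on the application.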
Ultralytics YOLO TensorRT Export Performance
NVIDIA A100
Performance
Tested with Ubuntu 22.04.3 LTS, python 3.10.12, ultralytics==8.2.4, tensorrt==8.6.1.post1
See Detection Docs for usage examples with these models trained on COCO, which include 80 pre-trained classes.
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n.engine
Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
---|---|---|---|---|---|---|---|
FP32 | Predict | 0.52 | 0.51 / 0.56 | | | 8 | 640 |
FP32 | COCOval | 0.52 | | 0.52 | 0.37 | 1 | 640 |
FP16 | Predict | 0.34 | 0.34 / 0.41 | | | 8 | 640 |
FP16 | COCOval | 0.33 | | 0.52 | 0.37 | 1 | 640 |
INT8 | Predict | 0.28 | 0.27 / 0.31 | | | 8 | 640 |
INT8 | COCOval | 0.29 | | 0.47 | 0.33 | 1 | 640 |
See Segmentation Docs for usage examples with these models trained on COCO.
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n-seg.engine
Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | mAPval 50(M) | mAPval 50-95(M) | batch | size (pixels) |
---|---|---|---|---|---|---|---|---|---|
FP32 | Predict | 0.62 | 0.61 / 0.68 | | | | | 8 | 640 |
FP32 | COCOval | 0.63 | | 0.52 | 0.36 | 0.49 | 0.31 | 1 | 640 |
FP16 | Predict | 0.40 | 0.39 / 0.44 | | | | | 8 | 640 |
FP16 | COCOval | 0.43 | | 0.52 | 0.36 | 0.49 | 0.30 | 1 | 640 |
INT8 | Predict | 0.34 | 0.33 / 0.37 | | | | | 8 | 640 |
INT8 | COCOval | 0.36 | | 0.46 | 0.32 | 0.43 | 0.27 | 1 | 640 |
See Classification Docs for usage examples with these models trained on ImageNet.
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n-cls.engine
Precision | Eval test | mean (ms) | min / max (ms) | top-1 | top-5 | batch | size (pixels) |
---|---|---|---|---|---|---|---|
FP32 | Predict | 0.26 | 0.25 / 0.28 | | | 8 | 640 |
FP32 | ImageNetval | 0.26 | | 0.35 | 0.61 | 1 | 640 |
FP16 | Predict | 0.18 | 0.17 / 0.19 | | | 8 | 640 |
FP16 | ImageNetval | 0.18 | | 0.35 | 0.61 | 1 | 640 |
INT8 | Predict | 0.16 | 0.15 / 0.57 | | | 8 | 640 |
INT8 | ImageNetval | 0.15 | | 0.32 | 0.59 | 1 | 640 |
See Pose Estimation Docs for usage examples with these models trained on COCO.
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n-pose.engine
Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | mAPval 50(P) | mAPval 50-95(P) | batch | size (pixels) |
---|---|---|---|---|---|---|---|---|---|
FP32 | Predict | 0.54 | 0.53 / 0.58 | | | | | 8 | 640 |
FP32 | COCOval | 0.55 | | 0.91 | 0.69 | 0.80 | 0.51 | 1 | 640 |
FP16 | Predict | 0.37 | 0.35 / 0.41 | | | | | 8 | 640 |
FP16 | COCOval | 0.36 | | 0.91 | 0.69 | 0.80 | 0.51 | 1 | 640 |
INT8 | Predict | 0.29 | 0.28 / 0.33 | | | | | 8 | 640 |
INT8 | COCOval | 0.30 | | 0.90 | 0.68 | 0.78 | 0.47 | 1 | 640 |
See Oriented Detection Docs for usage examples with these models trained on DOTAv1.
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n-obb.engine
Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
---|---|---|---|---|---|---|---|
FP32 | Predict | 0.52 | 0.51 / 0.59 | | | 8 | 640 |
FP32 | DOTAv1val | 0.76 | | 0.50 | 0.36 | 1 | 640 |
FP16 | Predict | 0.34 | 0.33 / 0.42 | | | 8 | 640 |
FP16 | DOTAv1val | 0.59 | | 0.50 | 0.36 | 1 | 640 |
INT8 | Predict | 0.29 | 0.28 / 0.33 | | | 8 | 640 |
INT8 | DOTAv1val | 0.32 | | 0.45 | 0.32 | 1 | 640 |
Consumer GPUs
Detection Performance (COCO)
Tested with Windows 10.0.19045, python 3.10.9, ultralytics==8.2.4, tensorrt==10.0.0b6
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n.engine
Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
---|---|---|---|---|---|---|---|
FP32 | Predict | 1.06 | 0.75 / 1.88 | | | 8 | 640 |
FP32 | COCOval | 1.37 | | 0.52 | 0.37 | 1 | 640 |
FP16 | Predict | 0.62 | 0.75 / 1.13 | | | 8 | 640 |
FP16 | COCOval | 0.85 | | 0.52 | 0.37 | 1 | 640 |
INT8 | Predict | 0.52 | 0.38 / 1.00 | | | 8 | 640 |
INT8 | COCOval | 0.74 | | 0.47 | 0.33 | 1 | 640 |
Tested with Windows 10.0.22631, python 3.11.9, ultralytics==8.2.4, tensorrt==10.0.1
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n.engine
Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
---|---|---|---|---|---|---|---|
FP32 | Predict | 1.76 | 1.69 / 1.87 | | | 8 | 640 |
FP32 | COCOval | 1.94 | | 0.52 | 0.37 | 1 | 640 |
FP16 | Predict | 0.86 | 0.75 / 1.00 | | | 8 | 640 |
FP16 | COCOval | 1.43 | | 0.52 | 0.37 | 1 | 640 |
INT8 | Predict | 0.80 | 0.75 / 1.00 | | | 8 | 640 |
INT8 | COCOval | 1.35 | | 0.47 | 0.33 | 1 | 640 |
Tested with Pop!_OS 22.04 LTS, python 3.10.12, ultralytics==8.2.4, tensorrt==8.6.1.post1
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n.engine
Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
---|---|---|---|---|---|---|---|
FP32 | Predict | 2.84 | 2.84 / 2.85 | | | 8 | 640 |
FP32 | COCOval | 2.94 | | 0.52 | 0.37 | 1 | 640 |
FP16 | Predict | 1.09 | 1.09 / 1.10 | | | 8 | 640 |
FP16 | COCOval | 1.20 | | 0.52 | 0.37 | 1 | 640 |
INT8 | Predict | 0.75 | 0.74 / 0.75 | | | 8 | 640 |
INT8 | COCOval | 0.76 | | 0.47 | 0.33 | 1 | 640 |
Embedded Devices
Detection Performance (COCO)
Tested with JetPack 6.0 (L4T 36.3) Ubuntu 22.04.4 LTS, python 3.10.12, ultralytics==8.2.16, tensorrt==10.0.1
Note
Inference times are shown as mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n.engine
Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
---|---|---|---|---|---|---|---|
FP32 | Predict | 6.11 | 6.10 / 6.29 | | | 8 | 640 |
FP32 | COCOval | 6.17 | | 0.52 | 0.37 | 1 | 640 |
FP16 | Predict | 3.18 | 3.18 / 3.20 | | | 8 | 640 |
FP16 | COCOval | 3.19 | | 0.52 | 0.37 | 1 | 640 |
INT8 | Predict | 2.30 | 2.29 / 2.35 | | | 8 | 640 |
INT8 | COCOval | 2.32 | | 0.46 | 0.32 | 1 | 640 |
Info
See the quickstart guide on NVIDIA Jetson with Ultralytics YOLO to learn more about setup and configuration.
Evaluation methods
See the sections below for information on how these models were exported and tested.
Export configurations
See export mode for details regarding export configuration arguments.
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
# TensorRT FP32
out = model.export(format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2)
# TensorRT FP16
out = model.export(format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2, half=True)
# TensorRT INT8 with calibration `data` (i.e. COCO, ImageNet, or DOTAv1 for appropriate model task)
out = model.export(
format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2, int8=True, data="coco8.yaml"
)
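The three export calls above differ only in their precision flags, so the shared arguments can be collected in one place. The sketch below builds the keyword arguments without calling the library; `export_kwargs` is a hypothetical helper, and its choice to raise an error when INT8 is requested without `data` is stricter than the library, which falls back to a small sample dataset.

```python
def export_kwargs(precision, data=None, imgsz=640, batch=8, workspace=2):
    """Build keyword arguments for model.export() for a given precision."""
    kwargs = dict(
        format="engine",
        imgsz=imgsz,
        dynamic=True,
        verbose=False,
        batch=batch,
        workspace=workspace,
    )
    if precision == "fp16":
        kwargs["half"] = True
    elif precision == "int8":
        if data is None:
            # Design choice of this sketch: require explicit calibration data
            raise ValueError("INT8 export needs a calibration dataset via data=")
        kwargs["int8"] = True
        kwargs["data"] = data
    elif precision != "fp32":
        raise ValueError(f"unknown precision: {precision!r}")
    return kwargs


fp32_args = export_kwargs("fp32")
fp16_args = export_kwargs("fp16")
int8_args = export_kwargs("int8", data="coco8.yaml")
```

A call such as `model.export(**export_kwargs("int8", data="coco8.yaml"))` would then reproduce the INT8 configuration shown above.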
Predict loop
See predict mode for additional information.
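One way to arrive at mean, min, and max figures like those in the tables is to aggregate the per-image timings that Ultralytics attaches to each result. The sketch below assumes each result exposes a `speed` dictionary with `preprocess`, `inference`, and `postprocess` entries in milliseconds; `summarize_speed` is a hypothetical helper, and the sample values are placeholders, not measurements.

```python
def summarize_speed(speeds, key="inference"):
    """Aggregate one timing stage across a list of per-result speed dicts."""
    values = [s[key] for s in speeds]
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


# Placeholder timings in the shape assumed for result.speed
example_speeds = [
    {"preprocess": 0.4, "inference": 0.50, "postprocess": 0.3},
    {"preprocess": 0.4, "inference": 0.52, "postprocess": 0.3},
    {"preprocess": 0.5, "inference": 0.56, "postprocess": 0.3},
]
summary = summarize_speed(example_speeds)
```

In a real loop the list could be collected as `[r.speed for r in results]` after warm-up calls have been discarded.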
Validation configuration
See val mode to learn more about validation configuration arguments.
Deploying Exported YOLOv8 TensorRT Models
Having successfully exported your Ultralytics YOLOv8 models to the TensorRT format, you are ready to deploy them. For in-depth instructions on deploying TensorRT models in various settings, see the following resources:
- Deploy Ultralytics with a Triton Server: A guide on how to use NVIDIA's Triton Inference (formerly TensorRT Inference) Server specifically with Ultralytics YOLO models.
- Deploying Deep Neural Networks with NVIDIA TensorRT: This article explains how to use NVIDIA TensorRT to deploy deep neural networks efficiently on GPU-based deployment platforms.
- End-to-End AI for NVIDIA-Based PCs: NVIDIA TensorRT Deployment: This blog post explains the use of NVIDIA TensorRT for optimizing and deploying AI models on NVIDIA-based PCs.
- GitHub Repository for NVIDIA TensorRT: The official GitHub repository containing the source code and documentation for NVIDIA TensorRT.
Summary
This guide focused on converting Ultralytics YOLOv8 models to NVIDIA's TensorRT model format. This conversion step is important for improving the efficiency and speed of YOLOv8 models, making them more effective and suitable for diverse deployment environments.
For more information on usage details, see the official TensorRT documentation.
If you are curious about additional Ultralytics YOLOv8 integrations, the integration guide page provides a broad selection of resources and insights.
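FAQ
How do I convert YOLOv8 models to TensorRT format?
To convert your Ultralytics YOLOv8 models to TensorRT format for optimized NVIDIA GPU inference, follow these steps:
- Install the required package: pip install ultralytics
- Export your YOLOv8 model: load the weights with YOLO("yolov8n.pt") and call model.export(format="engine"), which creates yolov8n.engine.
For more details, see the YOLOv8 Installation guide and the export documentation.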
What are the benefits of using TensorRT for YOLOv8 models?
Using TensorRT to optimize YOLOv8 models offers several benefits:
- Faster inference speed: TensorRT optimizes the model layers and uses precision calibration (INT8 and FP16) to speed up inference without significantly sacrificing accuracy.
- Memory efficiency: TensorRT manages tensor memory dynamically, reducing overhead and improving GPU memory utilization.
- Layer fusion: Combines multiple layers into a single operation, reducing computational complexity.
- Kernel auto-tuning: Automatically selects optimized GPU kernels for each model layer to maximize performance.
For more information, see the detailed features of TensorRT and the TensorRT overview section.
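Can I use INT8 quantization with TensorRT for YOLOv8 models?
Yes, you can export YOLOv8 models using TensorRT with INT8 quantization. The process involves post-training quantization (PTQ) and calibration:
- Export with INT8: call model.export(format="engine", int8=True, ...) and pass a calibration dataset through the data argument.
- Run inference: load the exported .engine file with YOLO(...) and call predict as usual.
For more details, see the section on exporting TensorRT with INT8 quantization.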
How do I deploy YOLOv8 TensorRT models on an NVIDIA Triton Inference Server?
To deploy YOLOv8 TensorRT models on an NVIDIA Triton Inference Server, use the following resources:
- Deploy Ultralytics YOLOv8 with Triton Server: Step-by-step guidance on setting up and using Triton Inference Server.
- NVIDIA Triton Inference Server Documentation: The official NVIDIA documentation, covering detailed deployment options and configurations.
These guides will help you integrate YOLOv8 models efficiently in a variety of deployment environments.
What performance improvements are observed with YOLOv8 models exported to TensorRT?
Performance improvements with TensorRT vary with the hardware used. Here are some typical benchmarks:
- NVIDIA A100:
  - FP32 inference: ~0.52 ms / image
  - FP16 inference: ~0.34 ms / image
  - INT8 inference: ~0.28 ms / image
  - INT8 precision slightly reduces mAP (mAP50 of 0.47 versus 0.52 on COCO val) in exchange for a significant speed improvement.
- Consumer GPUs (e.g., RTX 3080):
  - FP32 inference: ~1.06 ms / image
  - FP16 inference: ~0.62 ms / image
  - INT8 inference: ~0.52 ms / image
Detailed performance benchmarks for different hardware configurations can be found in the performance section.
For more comprehensive insights into TensorRT performance, refer to the Ultralytics documentation and the performance analysis reports.
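The latencies quoted in the answer above can be turned into speedup factors and throughput with a few lines of arithmetic. The figures are copied from the benchmarks listed in this FAQ; `speedups` and `images_per_second` are hypothetical helper names.

```python
latency_ms = {
    "NVIDIA A100": {"FP32": 0.52, "FP16": 0.34, "INT8": 0.28},
    "Consumer GPU (e.g., RTX 3080)": {"FP32": 1.06, "FP16": 0.62, "INT8": 0.52},
}


def speedups(latencies, baseline="FP32"):
    """Speedup of each precision relative to the baseline precision."""
    base = latencies[baseline]
    return {precision: base / ms for precision, ms in latencies.items()}


def images_per_second(ms_per_image):
    """Convert per-image latency in milliseconds to throughput."""
    return 1000.0 / ms_per_image


a100 = speedups(latency_ms["NVIDIA A100"])
consumer = speedups(latency_ms["Consumer GPU (e.g., RTX 3080)"])
a100_int8_throughput = images_per_second(latency_ms["NVIDIA A100"]["INT8"])
```

On these numbers INT8 is a little under twice as fast as FP32 on the A100 and about twice as fast on the consumer GPU.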