
TensorRT Export for YOLOv8 Models

Deploying computer vision models in high-performance environments can require a format that maximizes speed and efficiency. This is especially true when you are deploying your model on NVIDIA GPUs.

By using the TensorRT export format, you can enhance your Ultralytics YOLOv8 models for swift and efficient inference on NVIDIA hardware. This guide will give you easy-to-follow steps for the conversion process and help you make the most of NVIDIA's advanced technology in your deep learning projects.

TensorRT

TensorRT Overview

TensorRT, developed by NVIDIA, is an advanced software development kit (SDK) designed for high-speed deep learning inference. It's well-suited for real-time applications like object detection.

This toolkit optimizes deep learning models for NVIDIA GPUs, resulting in faster and more efficient operation. TensorRT models undergo TensorRT optimization, which includes techniques such as layer fusion, precision calibration (INT8 and FP16), dynamic tensor memory management, and kernel auto-tuning. Converting deep learning models into the TensorRT format lets developers realize the full potential of NVIDIA GPUs.

TensorRT is known for its compatibility with various model formats, including TensorFlow, PyTorch, and ONNX, providing developers with a flexible solution for integrating and optimizing models from different frameworks. This versatility enables efficient model deployment across diverse hardware and software environments.

Key Features of TensorRT Models

TensorRT models offer a range of key features that contribute to their efficiency and effectiveness in high-speed deep learning inference:

  • Precision Calibration: TensorRT supports precision calibration, allowing models to be fine-tuned for specific accuracy requirements. This includes support for reduced-precision formats such as INT8 and FP16, which can further boost inference speed while maintaining acceptable accuracy levels.

  • Layer Fusion: The TensorRT optimization process includes layer fusion, where multiple layers of a neural network are combined into a single operation. This reduces computational overhead and improves inference speed by minimizing memory access and computation.


  • Dynamic Tensor Memory Management: TensorRT efficiently manages tensor memory usage during inference, reducing memory overhead and optimizing memory allocation. This results in more efficient GPU memory utilization.

  • Automatic Kernel Tuning: TensorRT applies automatic kernel tuning to select the most optimized GPU kernel for each layer of the model. This adaptive approach ensures that the model takes full advantage of the GPU's computational power.
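To make the layer fusion idea concrete, here is a minimal sketch of one common fusion: folding a batch-normalization step into the linear operation that precedes it, so that two operations become one. The scalar setting, the function names, and the numeric values are illustrative assumptions, and this is not a description of how TensorRT implements fusion internally.

```python
import math


def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch-norm step into the preceding linear op (scalar case).

    Returns (w_fused, b_fused) such that
    w_fused * x + b_fused == gamma * ((w * x + b) - mean) / sqrt(var + eps) + beta
    """
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta


def unfused(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Reference path: run the linear op and the batch-norm step separately."""
    y = w * x + b
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


# Hypothetical layer parameters, chosen only for illustration.
params = dict(w=0.8, b=0.1, gamma=1.5, beta=-0.2, mean=0.3, var=4.0)
w_fused, b_fused = fold_batchnorm(**params)

x = 2.0
fused_output = w_fused * x + b_fused        # one multiply-add
reference_output = unfused(x, **params)     # two separate steps
```

Both paths produce the same value up to floating-point rounding, but the fused path needs a single multiply-add and no intermediate result, which is the kind of saving in computation and memory traffic that fusion targets.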

Deployment Options in TensorRT

Before looking at the code for exporting YOLOv8 models to the TensorRT format, it helps to understand where TensorRT models are typically used.

TensorRT offers several deployment options, each balancing ease of integration, performance optimization, and flexibility differently:

  • Deploying within TensorFlow: This method integrates TensorRT into TensorFlow, allowing optimized models to run in a familiar TensorFlow environment. It's useful for models with a mix of supported and unsupported layers, as TF-TRT can handle these efficiently.


  • Standalone TensorRT Runtime API: Offers granular control, making it ideal for performance-critical applications. It is more complex, but it allows custom implementation of unsupported operators.

  • NVIDIA Triton Inference Server: An option that supports models from various frameworks. It is particularly suited to cloud or edge inference and provides features such as concurrent model execution and model analysis.

Exporting YOLOv8 Models to TensorRT

Converting YOLOv8 models to the TensorRT format can improve execution efficiency and optimize performance.

Installation

To install the required package, run:

Installation

# Install the required package for YOLOv8
pip install ultralytics

For detailed instructions and best practices related to the installation process, see the YOLOv8 Installation guide. If you run into problems while installing the packages required for YOLOv8, consult the Common Issues guide for solutions and tips.

Usage

Before diving into the usage instructions, check out the range of YOLOv8 models offered by Ultralytics. This will help you choose the most appropriate model for your project requirements.

Usage

from ultralytics import YOLO

# Load the YOLOv8 model
model = YOLO("yolov8n.pt")

# Export the model to TensorRT format
model.export(format="engine")  # creates 'yolov8n.engine'

# Load the exported TensorRT model
tensorrt_model = YOLO("yolov8n.engine")

# Run inference
results = tensorrt_model("https://ultralytics.com/images/bus.jpg")

# Export a YOLOv8n PyTorch model to TensorRT format (CLI)
yolo export model=yolov8n.pt format=engine  # creates 'yolov8n.engine'

# Run inference with the exported model
yolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'

For more details about the export process, visit the Ultralytics documentation page on exporting.

Exporting TensorRT with INT8 Quantization

Exporting Ultralytics YOLO models using TensorRT with INT8 precision executes post-training quantization (PTQ). TensorRT uses calibration for PTQ, which measures the distribution of activations within each activation tensor as the YOLO model processes inference on representative input data, and then uses that distribution to estimate scale values for each tensor. Each activation tensor that is a candidate for quantization has an associated scale that is deduced by a calibration process.
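The idea of deriving a per-tensor scale from observed activations can be sketched in a few lines. The example below uses simple max calibration (scale = largest absolute activation / 127), which is an assumption made for brevity and is simpler than the entropy-based calibrator used by the export; the function names and sample values are hypothetical.

```python
def calibrate_scale(activations, qmax=127):
    """Derive a symmetric INT8 scale from observed activations (max calibration)."""
    max_abs = max(abs(a) for a in activations)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(value, scale, qmax=127):
    """Map a float to the INT8 range [-qmax, qmax], clamping outliers."""
    q = round(value / scale)
    return max(-qmax, min(qmax, q))


def dequantize(q, scale):
    """Map an INT8 value back to an approximate float."""
    return q * scale


# Hypothetical activations observed for one tensor on representative inputs.
calibration_activations = [-1.2, 0.05, 0.4, 2.54, -0.7]
scale = calibrate_scale(calibration_activations)

q = quantize(0.4, scale)
restored = dequantize(q, scale)
```

Values inside the calibrated range are recovered to within half a quantization step, while values outside it are clamped. That is why calibration data should be representative of what the model will see in deployment.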

When processing implicitly quantized networks, TensorRT uses INT8 opportunistically to optimize layer execution time. If a layer runs faster in INT8 and has quantization scales assigned to its data inputs and outputs, a kernel with INT8 precision is assigned to that layer. Otherwise, TensorRT selects either FP32 or FP16 precision for the kernel, whichever gives the faster execution time for that layer.
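The per-layer decision described above can be written as a small rule. This is a sketch of the stated logic only; the function name, the timing dictionary, and the numbers are hypothetical and do not correspond to any TensorRT API.

```python
def choose_precision(timings_ms, has_int8_scales):
    """Pick a kernel precision for one layer.

    timings_ms maps "FP32", "FP16", and optionally "INT8" to a measured
    execution time for the layer. INT8 is chosen only when the layer has
    quantization scales on its inputs and outputs AND INT8 is the fastest.
    """
    float_timings = {p: t for p, t in timings_ms.items() if p in ("FP32", "FP16")}
    fastest_float = min(float_timings, key=float_timings.get)

    int8_time = timings_ms.get("INT8")
    if has_int8_scales and int8_time is not None and int8_time < float_timings[fastest_float]:
        return "INT8"
    return fastest_float


# Hypothetical timings for three layers.
conv_layer = choose_precision({"FP32": 1.0, "FP16": 0.6, "INT8": 0.4}, has_int8_scales=True)
no_scales = choose_precision({"FP32": 1.0, "FP16": 0.6, "INT8": 0.4}, has_int8_scales=False)
slow_int8 = choose_precision({"FP32": 1.0, "FP16": 0.6, "INT8": 0.9}, has_int8_scales=True)
```

An engine exported with INT8 enabled can therefore still contain FP16 or FP32 layers, which is one reason measured speedups and file sizes differ from the theoretical best case.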

Tip

Calibration results can vary across devices, so it is critical to export with INT8 precision on the same device that will use the TensorRT model weights for deployment.

Configuring INT8 Export

The arguments provided when exporting an Ultralytics YOLO model greatly influence the performance of the exported model. They also need to be chosen based on the available device resources; however, the default arguments should work for most Ampere (or newer) NVIDIA discrete GPUs. The calibration algorithm used is "ENTROPY_CALIBRATION_2", and more details about the available options can be found in the TensorRT Developer Guide. Ultralytics tests found that "ENTROPY_CALIBRATION_2" was the best choice, and exports are fixed to use this algorithm.

  • workspace : Controls the size (in GiB) of the device memory allocation while converting the model weights.

    • Adjust the workspace value according to your calibration needs and resource availability. While a larger workspace may increase calibration time, it allows TensorRT to explore a wider range of optimization tactics, potentially enhancing model performance and accuracy. Conversely, a smaller workspace can reduce calibration time but may limit the optimization strategies, affecting the quality of the quantized model.

    • The default is workspace=4 (GiB). This value may need to be increased if calibration crashes (exits without warning).

    • TensorRT reports UNSUPPORTED_STATE during export if the value for workspace is larger than the memory available to the device, in which case the value for workspace should be lowered.

    • If workspace is set to the maximum value and calibration fails or crashes, consider reducing the values for imgsz and batch to lower the memory requirements.

    • Remember that INT8 calibration is specific to each device. Borrowing a "high-end" GPU for calibration may result in poor performance when inference is run on another device.

  • batch : The maximum batch size that will be used for inference. Smaller batches can be used during inference, but inference will not accept batches larger than the value specified.
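One way to act on the advice about reducing batch and imgsz when calibration runs out of memory is to prepare an ordered list of progressively smaller settings to retry. The sketch below only generates that list; the halving strategy, the lower bounds, and the rounding of imgsz to a multiple of 32 are assumptions made for illustration, not behavior of the exporter.

```python
def fallback_configs(batch, imgsz, min_batch=1, min_imgsz=320, stride=32):
    """Yield progressively smaller (batch, imgsz) pairs to retry an export with.

    The batch size is halved first; once it reaches min_batch, the image size
    is halved and rounded down to a multiple of `stride`.
    """
    yield batch, imgsz
    while batch > min_batch:
        batch = max(min_batch, batch // 2)
        yield batch, imgsz
    while imgsz > min_imgsz:
        imgsz = max(min_imgsz, (imgsz // 2 // stride) * stride)
        yield batch, imgsz


plan = list(fallback_configs(batch=8, imgsz=640))
# plan == [(8, 640), (4, 640), (2, 640), (1, 640), (1, 320)]
```

Each pair could be passed to the export call in turn until one succeeds. Keep in mind that a smaller batch also lowers the maximum batch size the exported engine accepts at inference time.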

Note

During calibration, twice the provided batch size is used. Small batches can lead to inaccurate scaling during calibration, because the process adjusts based on the data it sees; a small batch might not capture the full range of values, which causes problems with the final calibration. For this reason the batch size is doubled automatically. If no batch size is specified (batch=1), calibration runs at batch=1 * 2 to reduce calibration scaling errors.
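The doubling rule in the note above amounts to the following arithmetic. The helper name is hypothetical and exists only to make the rule explicit; it is not a function of the Ultralytics package.

```python
def calibration_batch(batch=None):
    """Return the batch size used during INT8 calibration: twice the export batch.

    When no batch size is given, the export batch is treated as 1.
    """
    export_batch = 1 if batch is None else batch
    return 2 * export_batch


default_calibration = calibration_batch()      # export batch 1 -> calibrate at 2
batch8_calibration = calibration_batch(8)      # export batch 8 -> calibrate at 16
```

This matters for memory planning: an export with batch=8 has to fit 16 images at the chosen imgsz on the device while calibrating.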

Based on its experiments, NVIDIA recommends using at least 500 calibration images that are representative of your model's data for INT8 quantization calibration. This is a guideline rather than a hard requirement, and you will need to experiment to find what performs well for your dataset. Because INT8 calibration with TensorRT requires calibration data, make sure to pass the data argument when int8=True. With data="my_dataset.yaml", the validation images of that dataset are used for calibration. If no value is passed for data when exporting to TensorRT with INT8 quantization, the export falls back to one of the "small" example datasets based on the model task instead of raising an error.

Example

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(
    format="engine",
    dynamic=True,  # (1)!
    batch=8,  # (2)!
    workspace=4,  # (3)!
    int8=True,
    data="coco.yaml",  # (4)!
)

# Load the exported TensorRT INT8 model
model = YOLO("yolov8n.engine", task="detect")

# Run inference
result = model.predict("https://ultralytics.com/images/bus.jpg")

  1. Exports with dynamic axes. This is enabled by default when exporting with int8=True, even when not explicitly set. See export arguments for additional information.
  2. Sets a maximum batch size of 8 for the exported model, which calibrates with batch = 2 * 8 to avoid scaling errors during calibration.
  3. Allocates 4 GiB of memory for the conversion process instead of allocating the entire device.
  4. Uses the COCO dataset for calibration, specifically the images used for validation (5,000 total).

# Export a YOLOv8n PyTorch model to TensorRT format with INT8 quantization (CLI)
yolo export model=yolov8n.pt format=engine batch=8 workspace=4 int8=True data=coco.yaml  # creates 'yolov8n.engine'

# Run inference with the exported TensorRT quantized model
yolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'

Calibration Cache

TensorRT generates a calibration .cache file that can be reused to speed up future exports of model weights with the same data. Reusing it can, however, result in poor calibration when the data is vastly different or when the batch value is changed drastically. In those situations, the existing .cache should be renamed and moved to a different directory, or deleted entirely.
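Moving a stale cache out of the way can be scripted. The sketch below assumes that the calibration cache is a file with a .cache suffix sitting in the directory you export from; the directory layout and file name in the demo are hypothetical, so check where the cache actually lands in your setup before using something like this.

```python
import shutil
import tempfile
from pathlib import Path


def stash_calibration_caches(export_dir, stash_dir):
    """Move every *.cache file out of export_dir so the next export recalibrates."""
    export_dir, stash_dir = Path(export_dir), Path(stash_dir)
    stash_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for cache in sorted(export_dir.glob("*.cache")):
        target = stash_dir / cache.name
        shutil.move(str(cache), str(target))
        moved.append(target)
    return moved


# Demo in a temporary directory with a dummy cache file.
with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp) / "weights"
    export_dir.mkdir()
    (export_dir / "yolov8n.cache").write_bytes(b"dummy calibration table")

    moved = stash_calibration_caches(export_dir, Path(tmp) / "old_caches")
    moved_names = [p.name for p in moved]
    remaining = list(export_dir.glob("*.cache"))
```

Moving rather than deleting keeps the old cache available in case you return to the original dataset and batch size.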

Advantages of using YOLO with TensorRT INT8

  • Reduced model size: Quantizing from FP32 to INT8 can reduce the model size by 4x (on disk or in memory), leading to faster download times, lower storage requirements, and a smaller memory footprint when deploying the model.

  • Lower power consumption: The reduced-precision operations of INT8-exported YOLO models can consume less power than FP32 models, which matters especially for battery-powered devices.

  • Improved inference speeds: TensorRT optimizes the model for the target hardware, potentially leading to faster inference speeds on GPUs, embedded devices, and accelerators.

Note on Inference Speeds

The first few inference calls with a model exported to TensorRT INT8 can be expected to show longer than usual preprocessing, inference, and/or postprocessing times. This can also happen when imgsz is changed during inference, especially when imgsz differs from the value specified at export (the export imgsz is set as the TensorRT "optimal" profile).
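Because the first calls are slower, timing measurements are more meaningful when a few warm-up calls are discarded first. The following is a minimal sketch of such a measurement; the helper name, the warm-up count, and the stand-in workload are assumptions, and the stand-in should be replaced by a call to the loaded engine.

```python
import time


def benchmark(fn, warmup=3, runs=20):
    """Time fn() after discarding warm-up calls; returns mean/min/max in ms."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean": sum(timings_ms) / len(timings_ms),
        "min": min(timings_ms),
        "max": max(timings_ms),
        "runs": len(timings_ms),
    }


def fake_inference():
    """Stand-in workload; replace with a call to the exported model."""
    return sum(i * i for i in range(1000))


stats = benchmark(fake_inference)
```

With an exported engine, the callable could be something like `lambda: model.predict(img, verbose=False)`. Keeping imgsz equal to the export value avoids the extra slowdown mentioned above.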

Drawbacks of using YOLO with TensorRT INT8

  • Decreases in evaluation metrics: Using a lower precision means that mAP, Precision, Recall, or any other metric used to evaluate model performance is likely to be somewhat worse. See the Performance results section to compare the differences in mAP50 and mAP50-95 when exporting with INT8 on a small sample of devices.

  • Increased development times: Finding the "optimal" settings for INT8 calibration for a given dataset and device can take a significant amount of testing.

  • Hardware dependency: Calibration and performance gains can be highly hardware dependent, and model weights are less transferable.

Ultralytics YOLO TensorRT Export Performance

NVIDIA A100

Performance

Tested with Ubuntu 22.04.3 LTS, python 3.10.12, ultralytics==8.2.4, tensorrt==8.6.1.post1

See the Detection Docs for usage examples with these models trained on COCO, which include 80 pre-trained classes.

Note

Inference times are shown for mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n.engine

Precision  Eval test  mean (ms)  min | max (ms)  mAPval 50(B)  mAPval 50-95(B)  batch  size (pixels)
FP32       Predict    0.52       0.51 | 0.56     -             -                8      640
FP32       COCOval    0.52       -               0.52          0.37             1      640
FP16       Predict    0.34       0.34 | 0.41     -             -                8      640
FP16       COCOval    0.33       -               0.52          0.37             1      640
INT8       Predict    0.28       0.27 | 0.31     -             -                8      640
INT8       COCOval    0.29       -               0.47          0.33             1      640
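The trade-off in the detection results above can be summarized as a speedup relative to FP32 alongside the change in mAP50-95. The sketch below only re-computes ratios from the figures reported in the table (Predict rows for mean latency, COCOval rows for mAP); it adds no new measurements.

```python
# Mean latency in ms (Predict rows) and mAPval 50-95 (COCOval rows) from the table.
mean_ms = {"FP32": 0.52, "FP16": 0.34, "INT8": 0.28}
map50_95 = {"FP32": 0.37, "FP16": 0.37, "INT8": 0.33}

# Speedup relative to FP32, and absolute mAP50-95 drop relative to FP32.
speedup = {p: round(mean_ms["FP32"] / t, 2) for p, t in mean_ms.items()}
map_drop = {p: round(map50_95["FP32"] - m, 2) for p, m in map50_95.items()}

for precision in mean_ms:
    print(f"{precision}: {speedup[precision]}x speedup, mAP50-95 drop {map_drop[precision]}")
```

On this hardware FP16 is about 1.53x faster than FP32 with no measured mAP50-95 loss, while INT8 is about 1.86x faster at a cost of 0.04 mAP50-95.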

See the Segmentation Docs for usage examples with these models trained on COCO, which include 80 pre-trained classes.

Note

Inference times are shown for mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n-seg.engine

Precision  Eval test  mean (ms)  min | max (ms)  mAPval 50(B)  mAPval 50-95(B)  mAPval 50(M)  mAPval 50-95(M)  batch  size (pixels)
FP32       Predict    0.62       0.61 | 0.68     -             -                -             -                8      640
FP32       COCOval    0.63       -               0.52          0.36             0.49          0.31             1      640
FP16       Predict    0.40       0.39 | 0.44     -             -                -             -                8      640
FP16       COCOval    0.43       -               0.52          0.36             0.49          0.30             1      640
INT8       Predict    0.34       0.33 | 0.37     -             -                -             -                8      640
INT8       COCOval    0.36       -               0.46          0.32             0.43          0.27             1      640

See the Classification Docs for usage examples with these models trained on ImageNet, which include 1000 pre-trained classes.

Note

Inference times are shown for mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n-cls.engine

Precision  Eval test    mean (ms)  min | max (ms)  top-1  top-5  batch  size (pixels)
FP32       Predict      0.26       0.25 | 0.28     -      -      8      640
FP32       ImageNetval  0.26       -               0.35   0.61   1      640
FP16       Predict      0.18       0.17 | 0.19     -      -      8      640
FP16       ImageNetval  0.18       -               0.35   0.61   1      640
INT8       Predict      0.16       0.15 | 0.57     -      -      8      640
INT8       ImageNetval  0.15       -               0.32   0.59   1      640

See the Pose Estimation Docs for usage examples with these models trained on COCO, which include 1 pre-trained class, "person".

Note

Inference times are shown for mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n-pose.engine

Precision  Eval test  mean (ms)  min | max (ms)  mAPval 50(B)  mAPval 50-95(B)  mAPval 50(P)  mAPval 50-95(P)  batch  size (pixels)
FP32       Predict    0.54       0.53 | 0.58     -             -                -             -                8      640
FP32       COCOval    0.55       -               0.91          0.69             0.80          0.51             1      640
FP16       Predict    0.37       0.35 | 0.41     -             -                -             -                8      640
FP16       COCOval    0.36       -               0.91          0.69             0.80          0.51             1      640
INT8       Predict    0.29       0.28 | 0.33     -             -                -             -                8      640
INT8       COCOval    0.30       -               0.90          0.68             0.78          0.47             1      640

See the Oriented Detection Docs for usage examples with these models trained on DOTAv1, which include 15 pre-trained classes.

Note

Inference times are shown for mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n-obb.engine

Precision  Eval test  mean (ms)  min | max (ms)  mAPval 50(B)  mAPval 50-95(B)  batch  size (pixels)
FP32       Predict    0.52       0.51 | 0.59     -             -                8      640
FP32       DOTAv1val  0.76       -               0.50          0.36             1      640
FP16       Predict    0.34       0.33 | 0.42     -             -                8      640
FP16       DOTAv1val  0.59       -               0.50          0.36             1      640
INT8       Predict    0.29       0.28 | 0.33     -             -                8      640
INT8       DOTAv1val  0.32       -               0.45          0.32             1      640

Consumer GPUs

Detection Performance (COCO)

Tested with Windows 10.0.19045, python 3.10.9, ultralytics==8.2.4, tensorrt==10.0.0b6

Note

Inference times are shown for mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n.engine

Precision  Eval test  mean (ms)  min | max (ms)  mAPval 50(B)  mAPval 50-95(B)  batch  size (pixels)
FP32       Predict    1.06       0.75 | 1.88     -             -                8      640
FP32       COCOval    1.37       -               0.52          0.37             1      640
FP16       Predict    0.62       0.75 | 1.13     -             -                8      640
FP16       COCOval    0.85       -               0.52          0.37             1      640
INT8       Predict    0.52       0.38 | 1.00     -             -                8      640
INT8       COCOval    0.74       -               0.47          0.33             1      640

Tested with Windows 10.0.22631, python 3.11.9, ultralytics==8.2.4, tensorrt==10.0.1

Note

Inference times are shown for mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n.engine

Precision  Eval test  mean (ms)  min | max (ms)  mAPval 50(B)  mAPval 50-95(B)  batch  size (pixels)
FP32       Predict    1.76       1.69 | 1.87     -             -                8      640
FP32       COCOval    1.94       -               0.52          0.37             1      640
FP16       Predict    0.86       0.75 | 1.00     -             -                8      640
FP16       COCOval    1.43       -               0.52          0.37             1      640
INT8       Predict    0.80       0.75 | 1.00     -             -                8      640
INT8       COCOval    1.35       -               0.47          0.33             1      640

Tested with Pop!_OS 22.04 LTS, python 3.10.12, ultralytics==8.2.4, tensorrt==8.6.1.post1

Note

Inference times are shown for mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n.engine

Precision  Eval test  mean (ms)  min | max (ms)  mAPval 50(B)  mAPval 50-95(B)  batch  size (pixels)
FP32       Predict    2.84       2.84 | 2.85     -             -                8      640
FP32       COCOval    2.94       -               0.52          0.37             1      640
FP16       Predict    1.09       1.09 | 1.10     -             -                8      640
FP16       COCOval    1.20       -               0.52          0.37             1      640
INT8       Predict    0.75       0.74 | 0.75     -             -                8      640
INT8       COCOval    0.76       -               0.47          0.33             1      640

Embedded Devices

Detection Performance (COCO)

Tested with JetPack 6.0 (L4T 36.3) Ubuntu 22.04.4 LTS, python 3.10.12, ultralytics==8.2.16, tensorrt==10.0.1

Note

Inference times are shown for mean, min (fastest), and max (slowest) for each test using pre-trained weights yolov8n.engine

Precision  Eval test  mean (ms)  min | max (ms)  mAPval 50(B)  mAPval 50-95(B)  batch  size (pixels)
FP32       Predict    6.11       6.10 | 6.29     -             -                8      640
FP32       COCOval    6.17       -               0.52          0.37             1      640
FP16       Predict    3.18       3.18 | 3.20     -             -                8      640
FP16       COCOval    3.19       -               0.52          0.37             1      640
INT8       Predict    2.30       2.29 | 2.35     -             -                8      640
INT8       COCOval    2.32       -               0.46          0.32             1      640

Info

See the quickstart guide on NVIDIA Jetson with Ultralytics YOLO to learn more about setup and configuration.

Evaluation methods

The sections below describe how these models were exported and tested.

Export Configurations

See export mode for details regarding export configuration arguments.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# TensorRT FP32
out = model.export(format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2)

# TensorRT FP16
out = model.export(format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2, half=True)

# TensorRT INT8 with calibration `data` (i.e. COCO, ImageNet, or DOTAv1 for appropriate model task)
out = model.export(
    format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2, int8=True, data="coco8.yaml"
)

Predict Loop

See predict mode for additional information.

import cv2

from ultralytics import YOLO

model = YOLO("yolov8n.engine")
img = cv2.imread("path/to/image.jpg")

for _ in range(100):
    result = model.predict(
        [img] * 8,  # batch=8 of the same image
        verbose=False,
        device="cuda",
    )

Validation Configuration

See val mode to learn more about validation configuration arguments.

from ultralytics import YOLO

model = YOLO("yolov8n.engine")
results = model.val(
    data="data.yaml",  # COCO, ImageNet, or DOTAv1 for appropriate model task
    batch=1,
    imgsz=640,
    verbose=False,
    device="cuda",
)

Deploying Exported YOLOv8 TensorRT Models

Having successfully exported your Ultralytics YOLOv8 models to TensorRT format, you are now ready to deploy them. For in-depth instructions on deploying your TensorRT models in various settings, take a look at the following resources:

Summary

This guide focused on converting Ultralytics YOLOv8 models to NVIDIA's TensorRT model format. This conversion step is crucial for improving the efficiency and speed of YOLOv8 models, making them more effective and suitable for diverse deployment environments.

For more information on usage details, see the official TensorRT documentation.

If you are curious about additional Ultralytics YOLOv8 integrations, the integration guide page provides an extensive selection of resources and insights.

FAQ

How do I convert YOLOv8 models to TensorRT format?

To convert your Ultralytics YOLOv8 models to TensorRT format for optimized NVIDIA GPU inference, follow these steps:

  1. Install the required package:

    pip install ultralytics
    
  2. Export your YOLOv8 model:

    from ultralytics import YOLO
    
    model = YOLO("yolov8n.pt")
    model.export(format="engine")  # creates 'yolov8n.engine'
    
    # Run inference
    model = YOLO("yolov8n.engine")
    results = model("https://ultralytics.com/images/bus.jpg")
    

For more details, refer to the YOLOv8 Installation guide and the export documentation.

What are the benefits of using TensorRT for YOLOv8 models?

Using TensorRT to optimize YOLOv8 models offers several benefits:

  • Faster Inference Speed: TensorRT optimizes the model layers and uses precision calibration (INT8 and FP16) to speed up inference without significantly sacrificing accuracy.
  • Memory Efficiency: TensorRT manages tensor memory dynamically, reducing overhead and improving GPU memory utilization.
  • Layer Fusion: Combines multiple layers into single operations, reducing computational complexity.
  • Kernel Auto-Tuning: Automatically selects optimized GPU kernels for each model layer to maximize performance.

For more information, explore the detailed features of TensorRT and read the TensorRT overview section.

Can I use INT8 quantization with TensorRT for YOLOv8 models?

Yes, you can export YOLOv8 models using TensorRT with INT8 quantization. This process involves post-training quantization (PTQ) and calibration:

  1. Export with INT8:

    from ultralytics import YOLO
    
    model = YOLO("yolov8n.pt")
    model.export(format="engine", batch=8, workspace=4, int8=True, data="coco.yaml")
    
  2. Run inference:

    from ultralytics import YOLO
    
    model = YOLO("yolov8n.engine", task="detect")
    result = model.predict("https://ultralytics.com/images/bus.jpg")
    

For more details, refer to the Exporting TensorRT with INT8 Quantization section.

How do I deploy YOLOv8 TensorRT models on an NVIDIA Triton Inference Server?

Deploying YOLOv8 TensorRT models on an NVIDIA Triton Inference Server can be done using the following resources:

These guides will help you integrate YOLOv8 models efficiently in various deployment environments.

What are the performance improvements observed with YOLOv8 models exported to TensorRT?

Performance improvements with TensorRT vary with the hardware used. Here are some typical benchmarks:

  • NVIDIA A100:

    • FP32 inference: ~0.52 ms / image
    • FP16 inference: ~0.34 ms / image
    • INT8 inference: ~0.28 ms / image
    • INT8 precision gives a slight reduction in mAP but a significant improvement in speed.
  • Consumer GPUs (e.g., RTX 3080):

    • FP32 inference: ~1.06 ms / image
    • FP16 inference: ~0.62 ms / image
    • INT8 inference: ~0.52 ms / image

Detailed performance benchmarks for different hardware configurations can be found in the performance section.

For more comprehensive insights into TensorRT performance, refer to the Ultralytics documentation and the performance analysis reports.


Created 8 months ago. Updated 10 days ago.
