
TensorRT Export for YOLO11 Models

Deploying computer vision models in high-performance environments can require a format that maximizes speed and efficiency. This is especially true when you deploy models on NVIDIA GPUs.

By using the TensorRT export format, you can enhance your Ultralytics YOLO11 models for swift and efficient inference on NVIDIA hardware. This guide will give you easy-to-follow steps for the conversion process and help you make the most of NVIDIA's advanced technology in your deep learning projects.

TensorRT

TensorRT, developed by NVIDIA, is an advanced software development kit (SDK) designed for high-speed deep learning inference. It is well suited to real-time applications such as object detection.

This toolkit optimizes deep learning models for NVIDIA GPUs, resulting in faster and more efficient operation. TensorRT models undergo TensorRT optimization, which includes techniques such as layer fusion, precision calibration (INT8 and FP16), dynamic tensor memory management, and kernel auto-tuning. Converting deep learning models to the TensorRT format lets developers make full use of the potential of NVIDIA GPUs.

TensorRT is known for its compatibility with various model formats, including TensorFlow, PyTorch, and ONNX, giving developers a flexible way to integrate and optimize models from different frameworks. This versatility enables efficient model deployment across diverse hardware and software environments.

Key Features of TensorRT Models

TensorRT models offer several key features that contribute to their efficiency and effectiveness in high-speed deep learning inference:

  • Precision Calibration: TensorRT supports precision calibration, allowing models to be tuned for specific accuracy requirements. This includes support for reduced-precision formats such as INT8 and FP16, which can further increase inference speed while maintaining acceptable accuracy.

  • Layer Fusion: The TensorRT optimization process includes layer fusion, in which multiple layers of a neural network are combined into a single operation. This reduces computational overhead and improves inference speed by minimizing memory access and computation.

  • Dynamic Tensor Memory Management: TensorRT manages tensor memory usage efficiently during inference, reducing memory overhead and optimizing memory allocation. This results in more efficient GPU memory utilization.

  • Automatic Kernel Tuning: TensorRT applies automatic kernel tuning to select the most optimized GPU kernel for each layer of the model. This adaptive approach ensures that the model takes full advantage of the GPU's computational power.
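
To make the idea of layer fusion more concrete, here is a minimal pure-Python sketch. It is not TensorRT's implementation: it only shows how a per-channel affine layer (such as an inference-time normalization) can be folded into the preceding linear layer so that two operations become one. The helper names `linear` and `fuse_linear_affine` and the small example values are assumptions chosen for illustration.

```python
def linear(weight, bias, x):
    """Compute W @ x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse_linear_affine(weight, bias, scale, shift):
    """Fold y = scale * (W @ x + b) + shift into a single linear layer.

    Illustrative helper only; it is not part of any TensorRT API.
    """
    fused_weight = [[s * w for w in row] for row, s in zip(weight, scale)]
    fused_bias = [s * b + t for b, s, t in zip(bias, scale, shift)]
    return fused_weight, fused_bias


W = [[1.0, 2.0], [3.0, 4.0]]
b = [0.5, -0.5]
scale, shift = [2.0, 0.5], [1.0, -1.0]
x = [1.0, -1.0]

# Two separate steps: a linear layer followed by a per-channel affine layer.
unfused = [s * y + t for y, s, t in zip(linear(W, b, x), scale, shift)]

# One fused linear layer that produces the same result in a single pass.
fused_W, fused_b = fuse_linear_affine(W, b, scale, shift)
fused = linear(fused_W, fused_b, x)

print(unfused, fused)
```

Both paths give the same output, but the fused version reads the input once and writes the output once, which is where the saving in memory traffic comes from.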

Deployment Options in TensorRT

Before we look at the code for exporting YOLO11 models to the TensorRT format, let's understand where TensorRT models are normally used.

TensorRT offers several deployment options, each balancing ease of integration, performance optimization, and flexibility differently:

  • Deploying within TensorFlow: This method integrates TensorRT into TensorFlow, so optimized models run in a familiar TensorFlow environment. It is useful for models with a mix of supported and unsupported layers, because TF-TRT can handle these efficiently.

  • Standalone TensorRT Runtime API: Offers fine-grained control, which makes it ideal for performance-critical applications. It is more complex, but it allows custom implementations of unsupported operators.

  • NVIDIA Triton Inference Server: An option that supports models from various frameworks. It is particularly suited to cloud or edge inference and provides features such as concurrent model execution and model analysis.

Exporting YOLO11 Models to TensorRT

You can improve execution efficiency and optimize performance by converting YOLO11 models to TensorRT format.

Installation

To install the required package, run:

# Install the required package for YOLO11
pip install ultralytics

For detailed instructions and best practices related to the installation process, see the YOLO11 Installation guide. If you run into problems while installing the required packages for YOLO11, consult the Common Issues guide for solutions and tips.

Usage

Before diving into the usage instructions, check out the range of YOLO11 models offered by Ultralytics. This will help you choose the most appropriate model for your project requirements.

from ultralytics import YOLO

# Load the YOLO11 model
model = YOLO("yolo11n.pt")

# Export the model to TensorRT format
model.export(format="engine")  # creates 'yolo11n.engine'

# Load the exported TensorRT model
tensorrt_model = YOLO("yolo11n.engine")

# Run inference
results = tensorrt_model("https://ultralytics.com/images/bus.jpg")

# Export a YOLO11n PyTorch model to TensorRT format
yolo export model=yolo11n.pt format=engine  # creates 'yolo11n.engine'

# Run inference with the exported model
yolo predict model=yolo11n.engine source='https://ultralytics.com/images/bus.jpg'

For more details about the export process, see the Ultralytics documentation page on exporting.

Exporting TensorRT with INT8 Quantization

Exporting Ultralytics YOLO models using TensorRT with INT8 precision performs post-training quantization (PTQ). TensorRT uses calibration for PTQ: it measures the distribution of activations within each activation tensor as the YOLO model runs inference on representative input data, and then uses that distribution to estimate a scale value for each tensor. Each activation tensor that is a candidate for quantization has an associated scale that is deduced by the calibration process.

When processing implicitly quantized networks, TensorRT uses INT8 opportunistically to optimize layer execution time. If a layer runs faster in INT8 and has quantization scales assigned on its data inputs and outputs, a kernel with INT8 precision is assigned to that layer. Otherwise, TensorRT selects either FP32 or FP16 precision for the kernel, based on whichever results in the faster execution time for that layer.
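
The role of the per-tensor scale can be illustrated with a short sketch. Note the substitution: this example uses simple max-absolute-value calibration rather than the entropy-based calibration that TensorRT applies, and every name in it (`calibrate_scale`, `quantize`, `dequantize`, the sample values) is an assumption made for illustration.

```python
def calibrate_scale(activations):
    """Estimate a symmetric INT8 scale from observed activation values.

    Uses max-abs calibration for simplicity; this is not the entropy
    calibration performed by TensorRT.
    """
    max_abs = max(abs(v) for v in activations)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Map floats to INT8 codes, clamping to the range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map INT8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Stand-in for activations observed while running representative inputs.
calibration_activations = [-2.54, -0.5, 0.0, 0.75, 1.3]
scale = calibrate_scale(calibration_activations)

# 5.0 lies outside the calibrated range, so it saturates at the largest code.
codes = quantize([1.0, -2.0, 5.0], scale)
restored = dequantize(codes, scale)
print(codes, restored)
```

Values that fall outside the range seen during calibration are clipped, which is one reason the calibration data should be representative of the data the model will see in deployment.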

Tip

Calibration results can vary across devices, so it is critical that the same device that will use the TensorRT model weights for deployment is also used to export with INT8 precision.

Configuring INT8 Export

The arguments provided when using export for an Ultralytics YOLO model greatly influence the performance of the exported model. They also need to be selected based on the available device resources, although the default arguments should work for most Ampere (or newer) NVIDIA discrete GPUs. The calibration algorithm used is "ENTROPY_CALIBRATION_2"; more details about the available options can be found in the TensorRT Developer Guide. Ultralytics tests found that "ENTROPY_CALIBRATION_2" was the best choice, and exports are fixed to use this algorithm.

  • workspace : Controls the size (in GiB) of the device memory allocation while converting the model weights.

    • Adjust the workspace value according to your calibration needs and resource availability. A larger workspace may increase calibration time, but it allows TensorRT to explore a wider range of optimization tactics, potentially improving model performance and accuracy. Conversely, a smaller workspace can reduce calibration time but may limit the optimization strategies, affecting the quality of the quantized model.

    • The default is workspace=None, which lets TensorRT allocate memory automatically. When configuring it manually, this value may need to be increased if calibration crashes (exits without warning).

    • TensorRT reports UNSUPPORTED_STATE during export if the value for workspace is larger than the memory available to the device. In that case, lower the value for workspace or set it to None.

    • If workspace is set to its maximum value and calibration fails or crashes, consider using None for auto-allocation, or reduce the values for imgsz and batch to lower the memory requirements.

    • Remember that calibration for INT8 is specific to each device. Borrowing a "high-end" GPU for calibration might result in poor performance when inference is run on another device.

  • batch : The maximum batch size that will be used for inference. Smaller batches can be used during inference, but inference will not accept batches larger than the specified value.

Note

During calibration, twice the provided batch size is used. Small batches can lead to inaccurate scaling during calibration, because the process adjusts based on the data it sees: small batches might not capture the full range of values, which leads to problems with the final calibration. For this reason the batch size is doubled automatically. If no batch size is specified (batch=1), calibration runs at batch=1 * 2 to reduce calibration scaling errors.
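
The two batch rules described above can be sketched as plain Python. This is an illustration of the arithmetic only, not code taken from the Ultralytics exporter; the helper names `calibration_batch` and `chunk` and the file names are hypothetical.

```python
def calibration_batch(batch=None):
    """Return the batch size used for INT8 calibration (twice the export batch)."""
    export_batch = batch if batch is not None else 1  # batch defaults to 1
    return 2 * export_batch


def chunk(items, max_batch):
    """Split inputs into batches no larger than the exported maximum batch size."""
    return [items[i : i + max_batch] for i in range(0, len(items), max_batch)]


print(calibration_batch(8))  # 16
print(calibration_batch())  # 2

frames = [f"frame_{i}.jpg" for i in range(19)]  # hypothetical file names
batch_sizes = [len(b) for b in chunk(frames, max_batch=8)]
print(batch_sizes)  # [8, 8, 3]
```

Splitting inputs this way keeps every call within the maximum batch size fixed at export time, while the final, smaller batch is still accepted.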

Based on its experiments, NVIDIA recommends using at least 500 calibration images that are representative of your model's data for INT8 quantization calibration. This is a guideline and not a hard requirement, so you will need to experiment with what is required to perform well on your dataset. Because INT8 calibration with TensorRT requires calibration data, make sure to use the data argument when int8=True for TensorRT, for example data="my_dataset.yaml", which calibrates with the images from the validation split. If no value is passed for data when exporting to TensorRT with INT8 quantization, the default is to use one of the "small" example datasets based on the model task instead of throwing an error.
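
Before exporting, it can be useful to check how many validation images are actually available for calibration. The sketch below counts image files in a folder and compares the count with the 500-image guideline. The helper names, the set of file suffixes, and the temporary folder used for the demonstration are all assumptions; in practice you would point it at the validation image folder of your own dataset.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image types


def count_images(folder):
    """Count image files (recursively) under a folder."""
    return sum(1 for p in Path(folder).rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES)


def enough_for_calibration(folder, minimum=500):
    """Compare the image count with the ~500-image guideline (not a hard limit)."""
    return count_images(folder) >= minimum


# Demonstration on a throwaway folder holding three images and one label file.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"img_{i}.jpg").touch()
    (Path(tmp) / "labels.txt").touch()
    n_images = count_images(tmp)
    is_enough = enough_for_calibration(tmp)

print(n_images, is_enough)  # 3 False
```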

Example

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(
    format="engine",
    dynamic=True,  # (1)!
    batch=8,  # (2)!
    workspace=4,  # (3)!
    int8=True,
    data="coco.yaml",  # (4)!
)

# Load the exported TensorRT INT8 model
model = YOLO("yolov8n.engine", task="detect")

# Run inference
result = model.predict("https://ultralytics.com/images/bus.jpg")

  1. Exports with dynamic axes. This is enabled by default when exporting with int8=True, even when it is not set explicitly. See the export arguments for more information.
  2. Sets a maximum batch size of 8 for the exported model, which calibrates with batch = 2 * 8 to avoid scaling errors during calibration.
  3. Allocates 4 GiB of memory for the conversion process instead of allocating the entire device.
  4. Uses the COCO dataset for calibration, specifically the images used for validation (5,000 in total).

# Export a YOLO11n PyTorch model to TensorRT format with INT8 quantization
yolo export model=yolo11n.pt format=engine batch=8 workspace=4 int8=True data=coco.yaml  # creates 'yolo11n.engine'

# Run inference with the exported TensorRT quantized model
yolo predict model=yolo11n.engine source='https://ultralytics.com/images/bus.jpg'

Calibration Cache

TensorRT generates a calibration .cache file that can be reused to speed up future exports of model weights that use the same data. Reusing it may result in poor calibration, however, when the data is vastly different or when the batch value changes drastically. In these circumstances, the existing .cache should be renamed and moved to a different directory, or deleted entirely.
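
Moving a stale cache out of the way can be scripted. The sketch below assumes that the cache file sits next to the weights file and shares its name with a .cache suffix; that location, the helper name `set_aside_cache`, and the folder names are assumptions for illustration, so check where the cache is written in your own setup.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_cache(weights_path, backup_dir):
    """Move an existing calibration cache into backup_dir before re-exporting.

    Assumes the cache sits beside the weights with a `.cache` suffix.
    Returns the new location, or None if no cache was found.
    """
    cache = Path(weights_path).with_suffix(".cache")
    if not cache.exists():
        return None
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / cache.name
    shutil.move(str(cache), str(target))
    return target


# Demonstration in a throwaway folder with a pretend leftover cache.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "yolo11n.pt"
    weights.with_suffix(".cache").touch()
    moved = set_aside_cache(weights, Path(tmp) / "old_caches")
    moved_ok = moved is not None and moved.exists()
    cache_gone = not weights.with_suffix(".cache").exists()

print(moved_ok, cache_gone)
```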

Advantages of using YOLO with TensorRT INT8

  • Reduced model size: Quantization from FP32 to INT8 can reduce the model size by 4x (on disk or in memory), leading to faster download times, lower storage requirements, and a smaller memory footprint when deploying a model.

  • Lower power consumption: The reduced-precision operations of INT8-exported YOLO models can consume less power than FP32 models, which matters especially for battery-powered devices.

  • Improved inference speeds: TensorRT optimizes the model for the target hardware, potentially leading to faster inference on GPUs, embedded devices, and accelerators.
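
The 4x figure follows from the storage width of each weight: 4 bytes in FP32 versus 1 byte in INT8. A quick back-of-the-envelope calculation, using a hypothetical parameter count rather than the size of any particular model:

```python
def weight_bytes(num_params, bytes_per_param):
    """Approximate storage needed for the weights alone."""
    return num_params * bytes_per_param


num_params = 3_000_000  # hypothetical parameter count, for illustration only

fp32_mb = weight_bytes(num_params, 4) / 1e6  # FP32 stores 4 bytes per weight
int8_mb = weight_bytes(num_params, 1) / 1e6  # INT8 stores 1 byte per weight

print(f"FP32: {fp32_mb:.1f} MB, INT8: {int8_mb:.1f} MB, ratio: {fp32_mb / int8_mb:.0f}x")
```

This counts weights only. A real engine file also holds other data, and layers that TensorRT keeps in FP16 or FP32 are not reduced by the same factor, so the ratio seen in practice can be smaller.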

Note on Inference Speeds

The first few inference calls with a model exported to TensorRT INT8 can be expected to have longer than usual preprocessing, inference, and/or postprocessing times. This may also occur when changing imgsz during inference, especially when imgsz is not the same as the value specified during export (the export imgsz is set as the TensorRT "optimal" profile).
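
Because the first calls are slower, timing measurements are more meaningful when a few warmup calls are discarded first. The following is a minimal, generic sketch of that approach. The helper name `time_calls` and its default values are assumptions, and the workload is a stand-in; with a real exported model you would pass a callable that runs your own prediction call.

```python
import time


def time_calls(fn, warmup=3, runs=20):
    """Time `fn` after discarding warmup calls; return (mean, min, max) in ms."""
    for _ in range(warmup):
        fn()  # early calls may include one-off setup cost, so they are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sum(samples) / len(samples), min(samples), max(samples)


# Stand-in workload used only to exercise the timing helper.
mean_ms, min_ms, max_ms = time_calls(lambda: sum(range(10_000)))
print(f"mean {mean_ms:.3f} ms | min {min_ms:.3f} ms | max {max_ms:.3f} ms")
```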

Drawbacks of using YOLO with TensorRT INT8

  • Decreases in evaluation metrics: Using a lower precision means that mAP, Precision, Recall, or any other metric used to evaluate model performance is likely to be somewhat worse. See the Performance results section to compare the differences in mAP50 and mAP50-95 when exporting with INT8 on a small sample of various devices.

  • Increased development times: Finding the "optimal" settings for INT8 calibration for a given dataset and device can take a significant amount of testing.

  • Hardware dependency: Calibration and performance gains can be highly hardware dependent, and model weights are less transferable.

Ultralytics YOLO TensorRT Export Performance

NVIDIA A100

Performance

Tested with Ubuntu 22.04.3 LTS, python 3.10.12, ultralytics==8.2.4, tensorrt==8.6.1.post1

See the Detection docs for usage examples with these models trained on COCO, which include 80 pre-trained classes.

Note

Inference times are shown as mean, min (fastest), and max (slowest) for each test using the pre-trained weights yolov8n.engine

| Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
|---|---|---|---|---|---|---|---|
| FP32 | Predict | 0.52 | 0.51 / 0.56 | | | 8 | 640 |
| FP32 | COCOval | 0.52 | | 0.52 | 0.37 | 1 | 640 |
| FP16 | Predict | 0.34 | 0.34 / 0.41 | | | 8 | 640 |
| FP16 | COCOval | 0.33 | | 0.52 | 0.37 | 1 | 640 |
| INT8 | Predict | 0.28 | 0.27 / 0.31 | | | 8 | 640 |
| INT8 | COCOval | 0.29 | | 0.47 | 0.33 | 1 | 640 |
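
To read the trade-off off the detection table above, the relative speedup and the relative change in mAP50-95 can be computed directly. The numbers below are copied from the Predict rows (mean latency) and the COCOval rows (mAP50-95) of that table; the variable names are arbitrary.

```python
# Values copied from the A100 detection table (Predict mean ms, COCOval mAP50-95).
mean_ms = {"FP32": 0.52, "FP16": 0.34, "INT8": 0.28}
map50_95 = {"FP32": 0.37, "FP16": 0.37, "INT8": 0.33}

summary = {}
for precision in ("FP16", "INT8"):
    speedup = mean_ms["FP32"] / mean_ms[precision]
    map_drop = (map50_95["FP32"] - map50_95[precision]) / map50_95["FP32"]
    summary[precision] = (speedup, map_drop)
    print(f"{precision}: {speedup:.2f}x faster, mAP50-95 drop {map_drop:.1%}")

# FP16: 1.53x faster, mAP50-95 drop 0.0%
# INT8: 1.86x faster, mAP50-95 drop 10.8%
```

On this hardware and model, FP16 gives most of the speedup with no measured loss in mAP50-95, while INT8 is faster still at the cost of roughly a tenth of the mAP50-95 score.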

See the Segmentation docs for usage examples with these models trained on COCO, which include 80 pre-trained classes.

Note

Inference times are shown as mean, min (fastest), and max (slowest) for each test using the pre-trained weights yolov8n-seg.engine

| Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | mAPval 50(M) | mAPval 50-95(M) | batch | size (pixels) |
|---|---|---|---|---|---|---|---|---|---|
| FP32 | Predict | 0.62 | 0.61 / 0.68 | | | | | 8 | 640 |
| FP32 | COCOval | 0.63 | | 0.52 | 0.36 | 0.49 | 0.31 | 1 | 640 |
| FP16 | Predict | 0.40 | 0.39 / 0.44 | | | | | 8 | 640 |
| FP16 | COCOval | 0.43 | | 0.52 | 0.36 | 0.49 | 0.30 | 1 | 640 |
| INT8 | Predict | 0.34 | 0.33 / 0.37 | | | | | 8 | 640 |
| INT8 | COCOval | 0.36 | | 0.46 | 0.32 | 0.43 | 0.27 | 1 | 640 |

See the Classification docs for usage examples with these models trained on ImageNet, which include 1000 pre-trained classes.

Note

Inference times are shown as mean, min (fastest), and max (slowest) for each test using the pre-trained weights yolov8n-cls.engine

| Precision | Eval test | mean (ms) | min / max (ms) | top-1 | top-5 | batch | size (pixels) |
|---|---|---|---|---|---|---|---|
| FP32 | Predict | 0.26 | 0.25 / 0.28 | | | 8 | 640 |
| FP32 | ImageNetval | 0.26 | | 0.35 | 0.61 | 1 | 640 |
| FP16 | Predict | 0.18 | 0.17 / 0.19 | | | 8 | 640 |
| FP16 | ImageNetval | 0.18 | | 0.35 | 0.61 | 1 | 640 |
| INT8 | Predict | 0.16 | 0.15 / 0.57 | | | 8 | 640 |
| INT8 | ImageNetval | 0.15 | | 0.32 | 0.59 | 1 | 640 |

See the Pose Estimation docs for usage examples with these models trained on COCO, which include 1 pre-trained class, "person".

Note

Inference times are shown as mean, min (fastest), and max (slowest) for each test using the pre-trained weights yolov8n-pose.engine

| Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | mAPval 50(P) | mAPval 50-95(P) | batch | size (pixels) |
|---|---|---|---|---|---|---|---|---|---|
| FP32 | Predict | 0.54 | 0.53 / 0.58 | | | | | 8 | 640 |
| FP32 | COCOval | 0.55 | | 0.91 | 0.69 | 0.80 | 0.51 | 1 | 640 |
| FP16 | Predict | 0.37 | 0.35 / 0.41 | | | | | 8 | 640 |
| FP16 | COCOval | 0.36 | | 0.91 | 0.69 | 0.80 | 0.51 | 1 | 640 |
| INT8 | Predict | 0.29 | 0.28 / 0.33 | | | | | 8 | 640 |
| INT8 | COCOval | 0.30 | | 0.90 | 0.68 | 0.78 | 0.47 | 1 | 640 |

See the Oriented Detection docs for usage examples with these models trained on DOTAv1, which include 15 pre-trained classes.

Note

Inference times are shown as mean, min (fastest), and max (slowest) for each test using the pre-trained weights yolov8n-obb.engine

| Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
|---|---|---|---|---|---|---|---|
| FP32 | Predict | 0.52 | 0.51 / 0.59 | | | 8 | 640 |
| FP32 | DOTAv1val | 0.76 | | 0.50 | 0.36 | 1 | 640 |
| FP16 | Predict | 0.34 | 0.33 / 0.42 | | | 8 | 640 |
| FP16 | DOTAv1val | 0.59 | | 0.50 | 0.36 | 1 | 640 |
| INT8 | Predict | 0.29 | 0.28 / 0.33 | | | 8 | 640 |
| INT8 | DOTAv1val | 0.32 | | 0.45 | 0.32 | 1 | 640 |

Consumer GPUs

Detection Performance (COCO)

Tested with Windows 10.0.19045, python 3.10.9, ultralytics==8.2.4, tensorrt==10.0.0b6

Note

Inference times are shown as mean, min (fastest), and max (slowest) for each test using the pre-trained weights yolov8n.engine

| Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
|---|---|---|---|---|---|---|---|
| FP32 | Predict | 1.06 | 0.75 / 1.88 | | | 8 | 640 |
| FP32 | COCOval | 1.37 | | 0.52 | 0.37 | 1 | 640 |
| FP16 | Predict | 0.62 | 0.75 / 1.13 | | | 8 | 640 |
| FP16 | COCOval | 0.85 | | 0.52 | 0.37 | 1 | 640 |
| INT8 | Predict | 0.52 | 0.38 / 1.00 | | | 8 | 640 |
| INT8 | COCOval | 0.74 | | 0.47 | 0.33 | 1 | 640 |

Tested with Windows 10.0.22631, python 3.11.9, ultralytics==8.2.4, tensorrt==10.0.1

Note

Inference times are shown as mean, min (fastest), and max (slowest) for each test using the pre-trained weights yolov8n.engine

| Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
|---|---|---|---|---|---|---|---|
| FP32 | Predict | 1.76 | 1.69 / 1.87 | | | 8 | 640 |
| FP32 | COCOval | 1.94 | | 0.52 | 0.37 | 1 | 640 |
| FP16 | Predict | 0.86 | 0.75 / 1.00 | | | 8 | 640 |
| FP16 | COCOval | 1.43 | | 0.52 | 0.37 | 1 | 640 |
| INT8 | Predict | 0.80 | 0.75 / 1.00 | | | 8 | 640 |
| INT8 | COCOval | 1.35 | | 0.47 | 0.33 | 1 | 640 |

Tested with Pop!_OS 22.04 LTS, python 3.10.12, ultralytics==8.2.4, tensorrt==8.6.1.post1

Note

Inference times are shown as mean, min (fastest), and max (slowest) for each test using the pre-trained weights yolov8n.engine

| Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
|---|---|---|---|---|---|---|---|
| FP32 | Predict | 2.84 | 2.84 / 2.85 | | | 8 | 640 |
| FP32 | COCOval | 2.94 | | 0.52 | 0.37 | 1 | 640 |
| FP16 | Predict | 1.09 | 1.09 / 1.10 | | | 8 | 640 |
| FP16 | COCOval | 1.20 | | 0.52 | 0.37 | 1 | 640 |
| INT8 | Predict | 0.75 | 0.74 / 0.75 | | | 8 | 640 |
| INT8 | COCOval | 0.76 | | 0.47 | 0.33 | 1 | 640 |

Embedded Devices

Detection Performance (COCO)

Tested with JetPack 6.0 (L4T 36.3) Ubuntu 22.04.4 LTS, python 3.10.12, ultralytics==8.2.16, tensorrt==10.0.1

Note

Inference times are shown as mean, min (fastest), and max (slowest) for each test using the pre-trained weights yolov8n.engine

| Precision | Eval test | mean (ms) | min / max (ms) | mAPval 50(B) | mAPval 50-95(B) | batch | size (pixels) |
|---|---|---|---|---|---|---|---|
| FP32 | Predict | 6.11 | 6.10 / 6.29 | | | 8 | 640 |
| FP32 | COCOval | 6.17 | | 0.52 | 0.37 | 1 | 640 |
| FP16 | Predict | 3.18 | 3.18 / 3.20 | | | 8 | 640 |
| FP16 | COCOval | 3.19 | | 0.52 | 0.37 | 1 | 640 |
| INT8 | Predict | 2.30 | 2.29 / 2.35 | | | 8 | 640 |
| INT8 | COCOval | 2.32 | | 0.46 | 0.32 | 1 | 640 |

Info

See the quickstart guide on NVIDIA Jetson with Ultralytics YOLO to learn more about setup and configuration.

Evaluation methods

Expand the sections below for information on how these models were exported and tested.

Export Configuration

See export mode for details about the export configuration arguments.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# TensorRT FP32
out = model.export(format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2)

# TensorRT FP16
out = model.export(format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2, half=True)

# TensorRT INT8 with calibration `data` (i.e. COCO, ImageNet, or DOTAv1 for appropriate model task)
out = model.export(
    format="engine", imgsz=640, dynamic=True, verbose=False, batch=8, workspace=2, int8=True, data="coco8.yaml"
)

Predict Loop

See predict mode for additional information.

import cv2

from ultralytics import YOLO

model = YOLO("yolov8n.engine")
img = cv2.imread("path/to/image.jpg")

for _ in range(100):
    result = model.predict(
        [img] * 8,  # batch=8 of the same image
        verbose=False,
        device="cuda",
    )

Validation Configuration

See val mode to learn more about the validation configuration arguments.

from ultralytics import YOLO

model = YOLO("yolov8n.engine")
results = model.val(
    data="data.yaml",  # COCO, ImageNet, or DOTAv1 for appropriate model task
    batch=1,
    imgsz=640,
    verbose=False,
    device="cuda",
)

Deploying Exported YOLO11 TensorRT Models

Having successfully exported your Ultralytics YOLO11 models to TensorRT format, you're now ready to deploy them. For in-depth instructions on deploying your TensorRT models in various settings, take a look at the following resources:

Summary

In this guide, we focused on converting Ultralytics YOLO11 models to NVIDIA's TensorRT model format. This conversion step is crucial for improving the efficiency and speed of YOLO11 models, making them more effective and suitable for diverse deployment environments.

For more details on usage, see the official TensorRT documentation.

If you're curious about additional Ultralytics YOLO11 integrations, our integration guide page provides an extensive selection of informative resources and insights.

FAQ

How do I convert YOLO11 models to TensorRT format?

To convert your Ultralytics YOLO11 models to TensorRT format for optimized NVIDIA GPU inference, follow these steps:

  1. Install the required package:

    pip install ultralytics
    
  2. Export your YOLO11 model:

    from ultralytics import YOLO
    
    model = YOLO("yolo11n.pt")
    model.export(format="engine")  # creates 'yolo11n.engine'
    
    # Run inference
    model = YOLO("yolo11n.engine")
    results = model("https://ultralytics.com/images/bus.jpg")
    

For more details, visit the YOLO11 Installation guide and the export documentation.

What are the benefits of using TensorRT for YOLO11 models?

Using TensorRT to optimize YOLO11 models offers several benefits:

  • Faster Inference: TensorRT optimizes model layers and uses precision calibration (INT8 and FP16) to speed up inference without significantly sacrificing accuracy.
  • Memory Efficiency: TensorRT manages tensor memory dynamically, reducing overhead and improving GPU memory utilization.
  • Layer Fusion: Combines multiple layers into single operations, reducing computational complexity.
  • Kernel Auto-Tuning: Automatically selects optimized GPU kernels for each model layer, ensuring maximum performance.

For more information, explore the detailed features of TensorRT and see the TensorRT overview section.

Can I use INT8 quantization with TensorRT for YOLO11 models?

Yes, you can export YOLO11 models using TensorRT with INT8 quantization. This process involves post-training quantization (PTQ) and calibration:

  1. Export with INT8:

    from ultralytics import YOLO
    
    model = YOLO("yolov8n.pt")
    model.export(format="engine", batch=8, workspace=4, int8=True, data="coco.yaml")
    
  2. Run inference:

    from ultralytics import YOLO
    
    model = YOLO("yolov8n.engine", task="detect")
    result = model.predict("https://ultralytics.com/images/bus.jpg")
    

For more details, see the Exporting TensorRT with INT8 Quantization section.

How do I deploy YOLO11 TensorRT models on an NVIDIA Triton Inference Server?

Deploying YOLO11 TensorRT models on an NVIDIA Triton Inference Server can be done using the following resources:

These guides will help you integrate YOLO11 models efficiently in various deployment environments.

What performance improvements are observed with YOLOv8 models exported to TensorRT?

Performance improvements with TensorRT can vary based on the hardware used. Here are some typical benchmarks:

  • NVIDIA A100:

    • FP32 inference: ~0.52 ms / image
    • FP16 inference: ~0.34 ms / image
    • INT8 inference: ~0.28 ms / image
    • A slight reduction in mAP with INT8 precision, but a significant improvement in speed.
  • Consumer GPUs (e.g., RTX 3080):

    • FP32 inference: ~1.06 ms / image
    • FP16 inference: ~0.62 ms / image
    • INT8 inference: ~0.52 ms / image

Detailed performance benchmarks for different hardware configurations can be found in the performance section.

For more comprehensive insights into TensorRT performance, see the Ultralytics documentation and the performance analysis reports.

Created 11 months ago, updated 10 days ago
