์ฝ˜ํ…์ธ ๋กœ ๊ฑด๋„ˆ๋›ฐ๊ธฐ

Optimizing YOLOv8 Inferences with Neural Magic's DeepSparse Engine

When deploying object detection models like Ultralytics YOLOv8 on various hardware, you can bump into unique issues like optimization. This is where YOLOv8's integration with Neural Magic's DeepSparse Engine steps in. It transforms the way YOLOv8 models are executed and enables GPU-level performance directly on CPUs.

์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” Neural Magic ์˜ DeepSparse๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ YOLOv8 ๋ฅผ ๋ฐฐํฌํ•˜๋Š” ๋ฐฉ๋ฒ•, ์ถ”๋ก ์„ ์‹คํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•, ์„ฑ๋Šฅ์„ ๋ฒค์น˜๋งˆํ‚นํ•˜์—ฌ ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

Neural Magic์˜ ๋”ฅ์ŠคํŽ˜์ด์Šค

Neural Magic's DeepSparse Overview

Neural Magic's DeepSparse is an inference run-time designed to optimize the execution of neural networks on CPUs. It applies advanced techniques like sparsity, pruning, and quantization to dramatically reduce computational demands while maintaining accuracy. DeepSparse offers an agile solution for efficient and scalable neural network execution across various devices.

Benefits of Integrating Neural Magic's DeepSparse with YOLOv8

Before diving into how to deploy YOLOV8 using DeepSparse, let's understand the benefits of using DeepSparse. Some key advantages include:

  • ์ถ”๋ก  ์†๋„ ํ–ฅ์ƒ: ์ตœ๋Œ€ 525 FPS( YOLOv8n)๋ฅผ ๋‹ฌ์„ฑํ•˜์—ฌ YOLOv8 ์˜ ์ถ”๋ก  ๊ธฐ๋Šฅ์ด ๊ธฐ์กด ๋ฐฉ์‹์— ๋น„ํ•ด ํฌ๊ฒŒ ๋นจ๋ผ์กŒ์Šต๋‹ˆ๋‹ค.

ํ–ฅ์ƒ๋œ ์ถ”๋ก  ์†๋„

  • ์ตœ์ ํ™”๋œ ๋ชจ๋ธ ํšจ์œจ์„ฑ: ๊ฐ€์ง€ ์น˜๊ธฐ ๋ฐ ์ •๋Ÿ‰ํ™”๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ YOLOv8 ์˜ ํšจ์œจ์„ฑ์„ ํ–ฅ์ƒ์‹œ์ผœ ๋ชจ๋ธ ํฌ๊ธฐ์™€ ๊ณ„์‚ฐ ์š”๊ตฌ ์‚ฌํ•ญ์„ ์ค„์ด๋ฉด์„œ ์ •ํ™•๋„๋ฅผ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค.

์ตœ์ ํ™”๋œ ๋ชจ๋ธ ํšจ์œจ์„ฑ

  • ํ‘œ์ค€ CPU์—์„œ ๊ณ ์„ฑ๋Šฅ: CPU์—์„œ GPU์™€ ๊ฐ™์€ ์„ฑ๋Šฅ์„ ์ œ๊ณตํ•˜์—ฌ ๋‹ค์–‘ํ•œ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์— ๋ณด๋‹ค ์ ‘๊ทผํ•˜๊ธฐ ์‰ฝ๊ณ  ๋น„์šฉ ํšจ์œจ์ ์ธ ์˜ต์…˜์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

  • ๊ฐ„์†Œํ™”๋œ ํ†ตํ•ฉ ๋ฐ ๋ฐฐํฌ: ์ด๋ฏธ์ง€ ๋ฐ ๋™์˜์ƒ ์ฃผ์„ ๊ธฐ๋Šฅ์„ ํฌํ•จํ•˜์—ฌ YOLOv8 ์„ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์— ์‰ฝ๊ฒŒ ํ†ตํ•ฉํ•  ์ˆ˜ ์žˆ๋Š” ์‚ฌ์šฉ์ž ์นœํ™”์ ์ธ ๋„๊ตฌ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

  • ๋‹ค์–‘ํ•œ ๋ชจ๋ธ ์œ ํ˜• ์ง€์›: ํ‘œ์ค€ ๋ฐ ํฌ์†Œ์„ฑ์— ์ตœ์ ํ™”๋œ YOLOv8 ๋ชจ๋ธ๊ณผ ๋ชจ๋‘ ํ˜ธํ™˜๋˜๋ฏ€๋กœ ๋ฐฐํฌ ์œ ์—ฐ์„ฑ์ด ํ–ฅ์ƒ๋ฉ๋‹ˆ๋‹ค.

  • ๋น„์šฉ ํšจ์œจ์ ์ด๊ณ  ํ™•์žฅ ๊ฐ€๋Šฅํ•œ ์†”๋ฃจ์…˜: ์šด์˜ ๋น„์šฉ์„ ์ ˆ๊ฐํ•˜๊ณ  ๊ณ ๊ธ‰ ๊ฐ์ฒด ํƒ์ง€ ๋ชจ๋ธ์˜ ํ™•์žฅ ๊ฐ€๋Šฅํ•œ ๋ฐฐํฌ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

Neural Magic ์˜ ๋”ฅ์ŠคํŽ˜์ด์Šค ๊ธฐ์ˆ ์€ ์–ด๋–ป๊ฒŒ ์ž‘๋™ํ•˜๋‚˜์š”?

Neural Magic's Deep Sparse technology is inspired by the human brain's efficiency in neural network computation. It adopts two key principles from the brain as follows:

  • ํฌ์†Œ์„ฑ: ์ŠคํŒŒ์Šคํ™” ๊ณผ์ •์€ ๋”ฅ๋Ÿฌ๋‹ ๋„คํŠธ์›Œํฌ์—์„œ ์ค‘๋ณต ์ •๋ณด๋ฅผ ์ž˜๋ผ๋‚ด์–ด ์ •ํ™•๋„ ์ €ํ•˜ ์—†์ด ๋” ์ž‘๊ณ  ๋น ๋ฅธ ๋ชจ๋ธ์„ ๋งŒ๋“œ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด ๊ธฐ์ˆ ์€ ๋„คํŠธ์›Œํฌ์˜ ํฌ๊ธฐ์™€ ๊ณ„์‚ฐ ์š”๊ตฌ ์‚ฌํ•ญ์„ ํฌ๊ฒŒ ์ค„์—ฌ์ค๋‹ˆ๋‹ค.

  • ์ฐธ์กฐ ์œ„์น˜: DeepSparse๋Š” ๋„คํŠธ์›Œํฌ๋ฅผ Tensor ์—ด๋กœ ๋ถ„ํ• ํ•˜๋Š” ๊ณ ์œ ํ•œ ์‹คํ–‰ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์—ด์€ CPU์˜ ์บ์‹œ ๋‚ด์—์„œ ์ „์ ์œผ๋กœ ๊นŠ์ด ๋‹จ์œ„๋กœ ์‹คํ–‰๋ฉ๋‹ˆ๋‹ค. ์ด ์ ‘๊ทผ ๋ฐฉ์‹์€ ๋‘๋‡Œ์˜ ํšจ์œจ์„ฑ์„ ๋ชจ๋ฐฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์ด๋™์„ ์ตœ์†Œํ™”ํ•˜๊ณ  CPU์˜ ์บ์‹œ ์‚ฌ์šฉ์„ ์ตœ๋Œ€ํ™”ํ•ฉ๋‹ˆ๋‹ค.

Neural Magic ์˜ ๋”ฅ์ŠคํŽ˜์–ด์Šค ๊ธฐ์ˆ  ์ž‘๋™ ๋ฐฉ์‹

Neural Magic ์˜ ๋”ฅ์ŠคํŽ˜์ด์Šค ๊ธฐ์ˆ  ์ž‘๋™ ๋ฐฉ์‹์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ํ•ด๋‹น ๋ธ”๋กœ๊ทธ ๊ฒŒ์‹œ๋ฌผ์„ ์ฐธ์กฐํ•˜์„ธ์š”.

์‚ฌ์šฉ์ž ์ง€์ • ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์—์„œ ํ•™์Šต๋œ YOLOv8 ์˜ ์ŠคํŒŒ์Šค ๋ฒ„์ „ ๋งŒ๋“ค๊ธฐ

Neural Magic ์˜ ์˜คํ”ˆ ์†Œ์Šค ๋ชจ๋ธ ๋ฆฌํฌ์ง€ํ† ๋ฆฌ์ธ SparseZoo๋Š” ๋ฏธ๋ฆฌ ์ŠคํŒŒ์Šคํ™”๋œ YOLOv8 ๋ชจ๋ธ ์ฒดํฌํฌ์ธํŠธ ๋ชจ์Œ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. Ultralytics ์™€ ์›ํ™œํ•˜๊ฒŒ ํ†ตํ•ฉ๋œ SparseML์„ ์‚ฌ์šฉํ•˜๋ฉด ๊ฐ„๋‹จํ•œ ๋ช…๋ น์ค„ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŠน์ • ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ ์ด๋Ÿฌํ•œ ์ŠคํŒŒ์Šค ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์†์‰ฝ๊ฒŒ ๋ฏธ์„ธ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ž์„ธํ•œ ๋‚ด์šฉ์€ Neural Magic ์˜ SparseML YOLOv8 ๋ฌธ์„œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

์‚ฌ์šฉ๋ฒ•: DeepSparse๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ YOLOV8 ๋ฐฐํฌ

Neural Magic ์˜ DeepSparse๋กœ YOLOv8 ๋ฅผ ๋ฐฐํฌํ•˜๋ ค๋ฉด ๋ช‡ ๊ฐ€์ง€ ๊ฐ„๋‹จํ•œ ๋‹จ๊ณ„๋ฅผ ๊ฑฐ์ณ์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์‚ฌ์šฉ ์ง€์นจ์„ ์‚ดํŽด๋ณด๊ธฐ ์ „์— Ultralytics ์—์„œ ์ œ๊ณตํ•˜๋Š” ๋‹ค์–‘ํ•œ YOLOv8 ๋ชจ๋ธ์„ ํ™•์ธํ•˜์„ธ์š”. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ํ”„๋กœ์ ํŠธ ์š”๊ตฌ ์‚ฌํ•ญ์— ๊ฐ€์žฅ ์ ํ•ฉํ•œ ๋ชจ๋ธ์„ ์„ ํƒํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋ฉ๋‹ˆ๋‹ค. ์‹œ์ž‘ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

1๋‹จ๊ณ„: ์„ค์น˜

ํ•„์š”ํ•œ ํŒจํ‚ค์ง€๋ฅผ ์„ค์น˜ํ•˜๋ ค๋ฉด ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค:

์„ค์น˜

# Install the required packages
pip install deepsparse[yolov8]

2๋‹จ๊ณ„: ONNX ํ˜•์‹์œผ๋กœ YOLOv8 ๋‚ด๋ณด๋‚ด๊ธฐ

DeepSparse ์—”์ง„์—๋Š” ONNX ํ˜•์‹์˜ YOLOv8 ๋ชจ๋ธ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ์„ ์ด ํ˜•์‹์œผ๋กœ ๋‚ด๋ณด๋‚ด๋Š” ๊ฒƒ์€ DeepSparse์™€์˜ ํ˜ธํ™˜์„ฑ์„ ์œ„ํ•ด ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค. ๋‹ค์Œ ๋ช…๋ น์„ ์‚ฌ์šฉํ•˜์—ฌ YOLOv8 ๋ชจ๋ธ์„ ๋‚ด๋ณด๋ƒ…๋‹ˆ๋‹ค:

๋ชจ๋ธ ๋‚ด๋ณด๋‚ด๊ธฐ

# Export YOLOv8 model to ONNX format
yolo task=detect mode=export model=yolov8n.pt format=onnx opset=13

์ด ๋ช…๋ น์€ yolov8n.onnx ๋ชจ๋ธ์„ ๋””์Šคํฌ์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.

3๋‹จ๊ณ„: ์ถ”๋ก  ๋ฐฐํฌ ๋ฐ ์‹คํ–‰

ONNX ํ˜•์‹์˜ YOLOv8 ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋ฉด DeepSparse๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ถ”๋ก ์„ ๋ฐฐํฌํ•˜๊ณ  ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ์ž‘์—…์€ ์ง๊ด€์ ์ธ Python API๋กœ ์‰ฝ๊ฒŒ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

์ถ”๋ก  ๋ฐฐํฌ ๋ฐ ์‹คํ–‰

from deepsparse import Pipeline

# Specify the path to your YOLOv8 ONNX model
model_path = "path/to/yolov8n.onnx"

# Set up the DeepSparse Pipeline
yolo_pipeline = Pipeline.create(task="yolov8", model_path=model_path)

# Run the model on your images
images = ["path/to/image.jpg"]
pipeline_outputs = yolo_pipeline(images=images)

4๋‹จ๊ณ„: ์„ฑ๋Šฅ ๋ฒค์น˜๋งˆํ‚น

YOLOv8 ๋ชจ๋ธ์ด DeepSparse์—์„œ ์ตœ์ ์˜ ์„ฑ๋Šฅ์„ ๋ฐœํœ˜ํ•˜๊ณ  ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ๋ฒค์น˜๋งˆํ‚นํ•˜์—ฌ ์ฒ˜๋ฆฌ๋Ÿ‰๊ณผ ์ง€์—ฐ ์‹œ๊ฐ„์„ ๋ถ„์„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

๋ฒค์น˜๋งˆํ‚น

# Benchmark performance
deepsparse.benchmark model_path="path/to/yolov8n.onnx" --scenario=sync --input_shapes="[1,3,640,640]"

5๋‹จ๊ณ„: ์ถ”๊ฐ€ ๊ธฐ๋Šฅ

DeepSparse๋Š” ์ด๋ฏธ์ง€ ์ฃผ์„ ๋ฐ ๋ฐ์ดํ„ฐ ์„ธํŠธ ํ‰๊ฐ€์™€ ๊ฐ™์€ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์—์„œ YOLOv8 ์˜ ์‹ค์งˆ์ ์ธ ํ†ตํ•ฉ์„ ์œ„ํ•œ ์ถ”๊ฐ€ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

์ถ”๊ฐ€ ๊ธฐ๋Šฅ

# For image annotation
deepsparse.yolov8.annotate --source "path/to/image.jpg" --model_filepath "path/to/yolov8n.onnx"

# For evaluating model performance on a dataset
deepsparse.yolov8.eval --model_path "path/to/yolov8n.onnx"

์ฃผ์„ ๋‹ฌ๊ธฐ ๋ช…๋ น์„ ์‹คํ–‰ํ•˜๋ฉด ์ง€์ •๋œ ์ด๋ฏธ์ง€๊ฐ€ ์ฒ˜๋ฆฌ๋˜๊ณ , ๊ฐ์ฒด๋ฅผ ๊ฐ์ง€ํ•˜๊ณ , ์ฃผ์„์ด ๋‹ฌ๋ฆฐ ์ด๋ฏธ์ง€๊ฐ€ ๊ฒฝ๊ณ„ ์ƒ์ž ๋ฐ ๋ถ„๋ฅ˜์™€ ํ•จ๊ป˜ ์ €์žฅ๋ฉ๋‹ˆ๋‹ค. ์ฃผ์„์ด ๋‹ฌ๋ฆฐ ์ด๋ฏธ์ง€๋Š” ์ฃผ์„ ๊ฒฐ๊ณผ ํด๋”์— ์ €์žฅ๋ฉ๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ๋ชจ๋ธ์˜ ๊ฐ์ง€ ๊ธฐ๋Šฅ์„ ์‹œ๊ฐ์ ์œผ๋กœ ํ‘œํ˜„ํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋ฉ๋‹ˆ๋‹ค.

์ด๋ฏธ์ง€ ์ฃผ์„ ๊ธฐ๋Šฅ

ํ‰๊ฐ€ ๋ช…๋ น์„ ์‹คํ–‰ํ•˜๋ฉด ์ •ํ™•๋„, ํšŒ์ˆ˜์œจ, ํ‰๊ท  ์ •๋ฐ€๋„(mAP)์™€ ๊ฐ™์€ ์ž์„ธํ•œ ์ถœ๋ ฅ ๋ฉ”ํŠธ๋ฆญ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋Œ€ํ•œ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ์ข…ํ•ฉ์ ์œผ๋กœ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ธฐ๋Šฅ์€ ํŠน์ • ์‚ฌ์šฉ ์‚ฌ๋ก€์— ๋งž๊ฒŒ YOLOv8 ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•˜๊ณ  ์ตœ์ ํ™”ํ•˜์—ฌ ๋†’์€ ์ •ํ™•๋„์™€ ํšจ์œจ์„ฑ์„ ๋ณด์žฅํ•˜๋Š” ๋ฐ ํŠนํžˆ ์œ ์šฉํ•ฉ๋‹ˆ๋‹ค.

์š”์•ฝ

This guide explored integrating Ultralytics' YOLOv8 with Neural Magic's DeepSparse Engine. It highlighted how this integration enhances YOLOv8's performance on CPU platforms, offering GPU-level efficiency and advanced neural network sparsity techniques.

For more detailed information and advanced usage, visit Neural Magic's DeepSparse documentation. Also, check out Neural Magic's documentation on the integration with YOLOv8 here and watch a great session on it here.

๋˜ํ•œ ๋‹ค์–‘ํ•œ YOLOv8 ํ†ตํ•ฉ์— ๋Œ€ํ•œ ๋ณด๋‹ค ํญ๋„“์€ ์ดํ•ด๋ฅผ ์œ„ํ•ด Ultralytics ํ†ตํ•ฉ ๊ฐ€์ด๋“œ ํŽ˜์ด์ง€๋ฅผ ๋ฐฉ๋ฌธํ•˜๋ฉด ๋‹ค์–‘ํ•˜๊ณ  ํฅ๋ฏธ๋กœ์šด ํ†ตํ•ฉ ๊ฐ€๋Šฅ์„ฑ์„ ๋ฐœ๊ฒฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.



Created 2023-12-30, Updated 2024-06-02
Authors: glenn-jocher (6), abirami-vina (1)

๋Œ“๊ธ€