
Triton Inference Server with Ultralytics YOLOv8

The Triton Inference Server (formerly known as TensorRT Inference Server) is an open-source software solution developed by NVIDIA. Triton simplifies the deployment of AI models at scale in production. Integrating Ultralytics YOLOv8 with Triton Inference Server lets you deploy scalable, high-performance deep learning inference workloads. This guide provides the steps to set up and test the integration.



Watch: Getting Started with NVIDIA Triton Inference Server.

What is Triton Inference Server?

Triton Inference Server is designed to deploy a variety of AI models in production. It supports a wide range of deep learning and machine learning frameworks, including TensorFlow, PyTorch, and ONNX Runtime. Its primary use cases are:

  • ๋‹จ์ผ ์„œ๋ฒ„ ์ธ์Šคํ„ด์Šค์—์„œ ์—ฌ๋Ÿฌ ๋ชจ๋ธ์„ ์„œ๋น„์Šคํ•ฉ๋‹ˆ๋‹ค.
  • ์„œ๋ฒ„ ์žฌ์‹œ์ž‘ ์—†์ด ๋™์  ๋ชจ๋ธ ๋กœ๋”ฉ ๋ฐ ์–ธ๋กœ๋”ฉ.
  • ์•™์ƒ๋ธ” ์ถ”๋ก ์„ ํ†ตํ•ด ์—ฌ๋Ÿฌ ๋ชจ๋ธ์„ ํ•จ๊ป˜ ์‚ฌ์šฉํ•˜์—ฌ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • A/B ํ…Œ์ŠคํŠธ ๋ฐ ๋กค๋ง ์—…๋ฐ์ดํŠธ๋ฅผ ์œ„ํ•œ ๋ชจ๋ธ ๋ฒ„์ „ ๊ด€๋ฆฌ.

Prerequisites

Make sure the following prerequisites are met before proceeding:

  • ๋จธ์‹ ์— Docker๊ฐ€ ์„ค์น˜๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์„ค์น˜ tritonclient:
    pip install tritonclient[all]
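
Before moving on, it can help to confirm both prerequisites from Python. The following is a minimal sketch, not part of the official guide: the helper name `check_prerequisites` is hypothetical, and it only checks that a `docker` executable is on the PATH and that the `tritonclient` package is importable. It does not verify that the Docker daemon is actually running.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict:
    """Report whether the Docker CLI and the tritonclient package can be found."""
    return {
        # shutil.which returns None when no `docker` executable is on the PATH
        "docker": shutil.which("docker") is not None,
        # find_spec returns None when the top-level package is not installed
        "tritonclient": importlib.util.find_spec("tritonclient") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```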
    

Exporting YOLOv8 to ONNX Format

Before deploying the model on Triton, it must be exported to the ONNX format. ONNX (Open Neural Network Exchange) is a format that allows models to be transferred between different deep learning frameworks. Use the export function of the YOLO class:

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n.pt')  # load an official model

# Export the model
onnx_file = model.export(format='onnx', dynamic=True)

Setting Up the Triton Model Repository

The Triton Model Repository is a storage location from which Triton can access and load models.

  1. Create the necessary directory structure:

    from pathlib import Path
    
    # Define paths
    triton_repo_path = Path('tmp') / 'triton_repo'
    triton_model_path = triton_repo_path / 'yolo'
    
    # Create directories
    (triton_model_path / '1').mkdir(parents=True, exist_ok=True)
    
  2. Move the exported ONNX model to the Triton repository:

    from pathlib import Path
    
    # Move ONNX model to Triton Model path
    Path(onnx_file).rename(triton_model_path / '1' / 'model.onnx')
    
    # Create config file
    (triton_model_path / 'config.pbtxt').touch()
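
The two steps above produce the layout `<repository>/<model name>/<version>/model.onnx` plus a `config.pbtxt` in the model directory. As a sanity check before starting the server, the layout can be verified with a small helper. This is a sketch under that assumption: `check_triton_repo` is a hypothetical name, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path


def check_triton_repo(repo: Path, model_name: str, version: str = "1") -> list:
    """Return a list of problems with the expected layout (empty if none)."""
    model_dir = repo / model_name
    problems = []
    # The ONNX file lives inside a numeric version directory
    if not (model_dir / version / "model.onnx").is_file():
        problems.append(f"missing {model_dir / version / 'model.onnx'}")
    # The config file sits next to the version directories
    if not (model_dir / "config.pbtxt").is_file():
        problems.append(f"missing {model_dir / 'config.pbtxt'}")
    return problems
```

With the paths used in this guide, `check_triton_repo(Path('tmp') / 'triton_repo', 'yolo')` should return an empty list once both steps have run.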
    

Running Triton Inference Server

Run the Triton Inference Server using Docker:

import contextlib
import subprocess
import time

from tritonclient.http import InferenceServerClient

# Define image https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver
tag = 'nvcr.io/nvidia/tritonserver:23.09-py3'  # 6.4 GB

# Pull the image
subprocess.call(f'docker pull {tag}', shell=True)

# Run the Triton server and capture the container ID
# Docker bind mounts need an absolute host path, so resolve the repository path first
container_id = subprocess.check_output(
    f'docker run -d --rm -v {triton_repo_path.resolve()}:/models -p 8000:8000 {tag} tritonserver --model-repository=/models',
    shell=True).decode('utf-8').strip()

# Create a client for the Triton HTTP endpoint
triton_client = InferenceServerClient(url='localhost:8000', verbose=False, ssl=False)

# Wait until the model is ready (up to about 10 seconds)
model_name = 'yolo'  # matches the model directory name in the repository
for _ in range(10):
    with contextlib.suppress(Exception):
        assert triton_client.is_model_ready(model_name)
        break
    time.sleep(1)
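
The readiness loop above retries a fixed number of times and gives no signal if the model never becomes ready. One possible alternative is a small polling helper with an explicit timeout and a boolean result. This is a sketch only: `wait_until` is a hypothetical helper, not part of tritonclient or Ultralytics, and the timeout values are arbitrary.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 10.0, interval: float = 1.0) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if predicate():
                return True
        except Exception:
            # The server may refuse connections while it is still starting
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Possible usage with the client created above (requires a running server):
# if not wait_until(lambda: triton_client.is_model_ready('yolo'), timeout=30):
#     raise RuntimeError('Triton model did not become ready in time')
```

Returning a boolean lets the caller decide whether to raise, retry, or inspect the container logs when startup takes longer than expected.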

Then run inference using the Triton Server model:

from ultralytics import YOLO

# Load the Triton Server model
model = YOLO('http://localhost:8000/yolo', task='detect')

# Run inference on the server
results = model('path/to/image.jpg')

Clean up the container:

# Kill and remove the container at the end of the test
subprocess.call(f'docker kill {container_id}', shell=True)
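
If inference raises an exception, the cleanup line above is never reached and the container keeps running. One way to guard against that is to wrap the container lifetime in a context manager. The following is a sketch under stated assumptions: `docker_container` is a hypothetical helper, and its `run` and `stop` parameters exist only so the Docker calls can be swapped out (for example in tests); by default they call `subprocess` as the guide does.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def docker_container(run_args: list, run=subprocess.check_output, stop=subprocess.call):
    """Start a detached container and kill it on exit, even if the body raises."""
    # `docker run -d` prints the new container ID on stdout
    container_id = run(["docker", "run", "-d", "--rm", *run_args]).decode("utf-8").strip()
    try:
        yield container_id
    finally:
        # Runs on normal exit and on exceptions alike
        stop(["docker", "kill", container_id])


# Possible usage with the names from this guide (requires Docker):
# args = ['-v', f'{triton_repo_path.resolve()}:/models', '-p', '8000:8000',
#         tag, 'tritonserver', '--model-repository=/models']
# with docker_container(args) as container_id:
#     results = model('path/to/image.jpg')
```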

By following the steps above, you can deploy and run Ultralytics YOLOv8 models efficiently on Triton Inference Server, providing a scalable, high-performance solution for deep learning inference tasks. If you run into issues or have further questions, refer to the official Triton documentation or reach out to the Ultralytics community for support.



Created 2023-11-12, Updated 2024-02-03
Authors: glenn-jocher (5)
