Deploying YOLOv5 with Neural Magic's DeepSparse

Welcome to software-delivered AI.

This guide explains how to deploy YOLOv5 with Neural Magic's DeepSparse.

DeepSparse is an inference runtime with exceptional performance on CPUs. For instance, compared to the ONNX Runtime baseline, DeepSparse offers a 5.8x speedup for YOLOv5s running on the same machine (measured at batch size 32 with the pruned-quantized model; see the benchmarks below).

YOLOv5 DeepSparse vs ONNX Runtime speed comparison chart

For the first time, your deep learning workloads can meet the performance demands of production without the complexity and cost of hardware accelerators. Put simply, DeepSparse gives you the performance of GPUs and the simplicity of software:

Flexible Deployments: Run consistently across cloud, data center, and edge with any hardware provider, from Intel to AMD to ARM.
Infinite Scalability: Scale vertically to hundreds of cores, scale out with standard Kubernetes, or go fully abstracted with Serverless.
Easy Integration: Clean APIs for integrating your model into an application and monitoring it in production.

How Does DeepSparse Achieve GPU-Class Performance?

DeepSparse takes advantage of model sparsity to gain its performance speedup.

Sparsification through pruning and quantization is a broadly studied technique that significantly reduces the size and compute needed to execute a network while maintaining high accuracy. DeepSparse is sparsity-aware, meaning it skips the zeroed-out parameters, shrinking the amount of compute in a forward pass. Since the sparse computation is now memory bound, DeepSparse executes the network depth-wise, breaking the problem into Tensor Columns, vertical stripes of computation that fit in cache.
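To make the idea of skipping zeroed-out parameters concrete, here is a toy sketch in plain Python. It is only an illustration of why sparsity reduces the number of multiply-accumulate operations; it is not how DeepSparse is implemented, and the function names and the example weights are made up for this sketch.

```python
def dense_dot(weights, activations):
    """Multiply-accumulate over every weight, whether it is zero or not."""
    total, macs = 0.0, 0
    for w, a in zip(weights, activations):
        total += w * a
        macs += 1
    return total, macs


def sparse_dot(weights, activations):
    """Skip zero weights: same result, fewer multiply-accumulates."""
    total, macs = 0.0, 0
    for w, a in zip(weights, activations):
        if w == 0.0:
            continue  # a pruned weight contributes nothing to the sum
        total += w * a
        macs += 1
    return total, macs


# Hypothetical pruned weight row: 5 of the 8 entries are zero.
weights = [0.5, 0.0, 0.0, -1.0, 0.0, 0.0, 2.0, 0.0]
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

dense_result, dense_macs = dense_dot(weights, activations)
sparse_result, sparse_macs = sparse_dot(weights, activations)
print(dense_result, dense_macs)
print(sparse_result, sparse_macs)
```

Both functions return the same dot product, but the sparse version performs only as many multiply-accumulates as there are non-zero weights.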

DeepSparse tensor columns for sparse neural network inference

Sparse networks with compressed computation, executed depth-wise in cache, allow DeepSparse to deliver GPU-class performance on CPUs!

How Do I Create a Sparse Version of YOLOv5 Trained on My Data?

Neural Magic's open-source model repository, SparseZoo, contains pre-sparsified checkpoints of each YOLOv5 model. Using SparseML, which is integrated with Ultralytics, you can fine-tune a sparse checkpoint on your data with a single CLI command.

See Neural Magic's YOLOv5 documentation for more details.

DeepSparse Usage

We will walk through an example of benchmarking and deploying a sparse version of YOLOv5s with DeepSparse.

Install DeepSparse

Run the following to install DeepSparse. We recommend using a Python virtual environment.

pip install "deepsparse[server,yolo,onnxruntime]"

Collect an ONNX File

DeepSparse accepts a model in the ONNX format, passed either as:

A SparseZoo stub that identifies an ONNX file in the SparseZoo
A local path to an ONNX model in a filesystem

The examples below use the standard dense and pruned-quantized YOLOv5s checkpoints, identified by the following SparseZoo stubs:

zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none
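Since a model can be given either as a SparseZoo stub or as a local ONNX path, an application that accepts the value from user configuration may want to check which form it received. The following is a minimal sketch of such a check. The helper `describe_model_source` and its return strings are hypothetical (they are not part of DeepSparse), and the check relies only on the `zoo:` prefix seen in the stubs above and on the `.onnx` file extension; it does not verify that the file or stub actually exists.

```python
from pathlib import Path

ZOO_PREFIX = "zoo:"  # prefix used by the SparseZoo stubs shown above


def describe_model_source(model_path: str) -> str:
    """Classify a model reference as a SparseZoo stub or a local ONNX path.

    Hypothetical helper for illustration only; not a DeepSparse API.
    """
    if model_path.startswith(ZOO_PREFIX):
        return "sparsezoo-stub"
    if Path(model_path).suffix.lower() == ".onnx":
        return "local-onnx"
    raise ValueError(f"Unrecognized model reference: {model_path!r}")


print(describe_model_source(
    "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none"
))
print(describe_model_source("models/yolov5s.onnx"))  # hypothetical local path
```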

Deploy a Model

DeepSparse offers convenient APIs for integrating your model into an application.

To try the deployment examples below, download a sample image and save it as basilica.jpg with the following command:

wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg

Python API

Pipelines wrap pre-processing and output post-processing around the runtime, providing a clean interface for adding DeepSparse to an application. The DeepSparse-Ultralytics integration includes an out-of-the-box Pipeline that accepts raw images and outputs the bounding boxes.

Create a Pipeline and run inference:

from deepsparse import Pipeline

# list of images in local filesystem
images = ["basilica.jpg"]

# create Pipeline
model_stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none"
yolo_pipeline = Pipeline.create(
    task="yolo",
    model_path=model_stub,
)

# run inference on images, receive bounding boxes + classes
pipeline_outputs = yolo_pipeline(images=images, iou_thres=0.6, conf_thres=0.001)
print(pipeline_outputs)
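The example above passes conf_thres=0.001, which keeps almost every candidate box. An application will usually want to keep only confident detections. The sketch below shows one way to do that on plain Python lists. It assumes the detections for one image are available as three parallel lists (boxes, scores, labels) and that boxes are [x1, y1, x2, y2]; both are assumptions to check against the output of your installed DeepSparse version. The function name and the sample values are hypothetical.

```python
def filter_detections(boxes, scores, labels, min_score=0.5):
    """Keep the detections for one image whose score is at least min_score."""
    return [
        (label, score, box)
        for box, score, label in zip(boxes, scores, labels)
        if score >= min_score
    ]


# Hypothetical detections for a single image; boxes assumed as [x1, y1, x2, y2].
boxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0], [0.0, 0.0, 8.0, 8.0]]
scores = [0.92, 0.41, 0.003]
labels = ["person", "bicycle", "person"]

kept = filter_detections(boxes, scores, labels, min_score=0.4)
for label, score, box in kept:
    print(f"{label}: {score:.2f} at {box}")
```

Alternatively, a higher conf_thres can be passed to the pipeline call itself so that low-confidence boxes are never returned.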

If you are running in the cloud, you may get an error that OpenCV cannot find libGL.so.1. You can install the missing library:

apt-get install libgl1

Or use the Ultralytics headless package, which avoids GUI dependencies entirely:

pip install ultralytics-opencv-headless

HTTP Server

DeepSparse Server runs on top of the popular FastAPI web framework and Uvicorn web server. With a single CLI command, you can set up a model service endpoint with DeepSparse. The Server supports any Pipeline from DeepSparse, including object detection with YOLOv5, enabling you to send raw images to the endpoint and receive the bounding boxes.

Spin up the Server with the pruned-quantized YOLOv5s:

deepsparse.server \
  --task yolo \
  --model_path zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none

An example request, using Python's requests package:

import json
from contextlib import ExitStack

import requests

# list of images for inference (local files on client side)
path = ["basilica.jpg"]

# send request over HTTP to /predict/from_files endpoint
url = "http://0.0.0.0:5543/predict/from_files"
with ExitStack() as stack:
    files = [("request", stack.enter_context(open(img, "rb"))) for img in path]
    resp = requests.post(url=url, files=files)

# response is returned in JSON
annotations = json.loads(resp.text)  # dictionary of annotation results
bounding_boxes = annotations["boxes"]
labels = annotations["labels"]
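Once the response is parsed, the boxes and labels usually need to be matched up per image. The following sketch pairs them, assuming the JSON body contains "boxes" and "labels" keys (as used above) and that each holds one list per submitted image. The nesting, the label values, and the sample payload are assumptions made for illustration; inspect a real response from your server before relying on them.

```python
import json


def pair_annotations(resp_text):
    """Return, for each image, a list of (label, box) tuples from the JSON body."""
    annotations = json.loads(resp_text)
    return [
        list(zip(image_labels, image_boxes))
        for image_labels, image_boxes in zip(
            annotations["labels"], annotations["boxes"]
        )
    ]


# Hypothetical response body: one image with two detections.
sample_text = json.dumps(
    {
        "boxes": [[[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0]]],
        "labels": [["0", "1"]],
    }
)

paired = pair_annotations(sample_text)
for image_index, detections in enumerate(paired):
    for label, box in detections:
        print(image_index, label, box)
```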

Annotate CLI

You can also use the annotate command to have the engine save an annotated image to disk. Try --source 0 to annotate your live webcam feed!

deepsparse.object_detection.annotate --model_filepath zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none --source basilica.jpg

Running the above command creates an annotation-results folder and saves the annotated image inside.

YOLOv5 detection results with bounding boxes

Benchmarking Performance

We will compare DeepSparse's throughput to ONNX Runtime's throughput on YOLOv5s, using DeepSparse's benchmarking script.

The benchmarks were run on an AWS c6i.8xlarge instance (16 cores).

Batch 32 Performance Comparison

ONNX Runtime Baseline

At batch 32, ONNX Runtime achieves 42 images/sec with the standard dense YOLOv5s:

deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none -s sync -b 32 -nstreams 1 -e onnxruntime

# Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
# Batch Size: 32
# Scenario: sync
# Throughput (items/sec): 41.9025

DeepSparse Dense Performance

While DeepSparse offers its best performance with optimized sparse models, it also performs well with the standard dense YOLOv5s.

At batch 32, DeepSparse achieves 70 images/sec with the standard dense YOLOv5s, a 1.7x performance improvement over ONNX Runtime!

deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none -s sync -b 32 -nstreams 1

# Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
# Batch Size: 32
# Scenario: sync
# Throughput (items/sec): 69.5546

DeepSparse Sparse Performance

When sparsity is applied to the model, DeepSparse's performance gains over ONNX Runtime are even stronger.

At batch 32, DeepSparse achieves 241 images/sec with the pruned-quantized YOLOv5s, a 5.8x performance improvement over ONNX Runtime!

deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none -s sync -b 32 -nstreams 1

# Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none
# Batch Size: 32
# Scenario: sync
# Throughput (items/sec): 241.2452
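The speedup figures quoted for batch 32 follow directly from the reported throughput values. As a quick check, the sketch below divides each throughput by the ONNX Runtime baseline. The numbers are copied from the benchmark outputs above; the dictionary keys and the function name are labels chosen for this sketch.

```python
# Throughput figures (items/sec) copied from the batch 32 runs above.
THROUGHPUT_B32 = {
    "onnxruntime_dense": 41.9025,
    "deepsparse_dense": 69.5546,
    "deepsparse_pruned_quant": 241.2452,
}


def speedup_over(baseline_key, results):
    """Return each result's throughput as a multiple of the baseline's."""
    baseline = results[baseline_key]
    return {name: value / baseline for name, value in results.items()}


speedups_b32 = speedup_over("onnxruntime_dense", THROUGHPUT_B32)
for name, ratio in speedups_b32.items():
    print(f"{name}: {ratio:.1f}x")
```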

Batch 1 Performance Comparison

DeepSparse is also able to gain a speedup over ONNX Runtime for the latency-sensitive, batch 1 scenario.

ONNX Runtime Baseline

At batch 1, ONNX Runtime achieves 48 images/sec with the standard dense YOLOv5s.

deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none -s sync -b 1 -nstreams 1 -e onnxruntime

# Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
# Batch Size: 1
# Scenario: sync
# Throughput (items/sec): 48.0921

DeepSparse Sparse Performance

At batch 1, DeepSparse achieves 135 images/sec with the pruned-quantized YOLOv5s, a 2.8x performance gain over ONNX Runtime!

deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none -s sync -b 1 -nstreams 1

# Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none
# Batch Size: 1
# Scenario: sync
# Throughput (items/sec): 134.9468

Since c6i.8xlarge instances have VNNI instructions, DeepSparse's throughput can be pushed further if the weights are pruned in blocks of 4.

At batch 1, DeepSparse achieves 180 images/sec with a 4-block pruned-quantized YOLOv5s, a 3.7x performance gain over ONNX Runtime!

deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned35_quant-none-vnni -s sync -b 1 -nstreams 1

# Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned35_quant-none-vnni
# Batch Size: 1
# Scenario: sync
# Throughput (items/sec): 179.7375
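Because the batch 1 scenario is latency-sensitive, it can help to view these throughput figures as per-image latency. In a synchronous, single-stream run at batch 1, images are processed one after another, so the mean latency is roughly the reciprocal of the throughput. The sketch below applies that approximation to the numbers reported above; it is an estimate derived from throughput, not a latency measurement, and the dictionary keys and function name are labels chosen for this sketch.

```python
# Throughput figures (items/sec) copied from the batch 1 runs above.
THROUGHPUT_B1 = {
    "onnxruntime_dense": 48.0921,
    "deepsparse_pruned_quant": 134.9468,
    "deepsparse_pruned_quant_vnni": 179.7375,
}


def approx_latency_ms(items_per_sec):
    """Approximate per-item latency in milliseconds for sequential batch 1 runs."""
    return 1000.0 / items_per_sec


latency_b1 = {
    name: approx_latency_ms(value) for name, value in THROUGHPUT_B1.items()
}
for name, latency in latency_b1.items():
    print(f"{name}: ~{latency:.2f} ms per image")
```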

Get Started With DeepSparse

Research or Testing? DeepSparse Community is free for research and testing. Get started with their Documentation.

For more information on deploying YOLOv5 with DeepSparse, see Neural Magic's DeepSparse documentation and the Ultralytics blog post on the DeepSparse integration.

Contributors: glenn-jocher

