
Inference

Ultralytics Platform provides an inference API for testing trained models. Use the browser-based Predict tab for quick validation or the REST API for programmatic access.

Ultralytics Platform Model Predict Tab With Detections Overlay

Predict Tab

Every model includes a Predict tab for browser-based inference:

  1. Navigate to your model
  2. Click the Predict tab
  3. Upload an image, use an example, or open your webcam
  4. View predictions instantly with bounding box overlays

Ultralytics Platform Predict Tab Image Upload Dropzone

Input Methods

The predict panel supports multiple input methods:

| Method         | Description                                          |
| -------------- | ---------------------------------------------------- |
| Image upload   | Drag and drop or click to upload an image            |
| Example images | Click built-in examples (dataset images or defaults) |
| Webcam capture | Live camera feed with single-frame capture           |
graph LR
    A[Upload Image] --> D[Auto-Inference]
    B[Example Image] --> D
    C[Webcam Capture] --> D
    D --> E[Results + Overlays]

    style D fill:#2196F3,color:#fff
    style E fill:#4CAF50,color:#fff

Upload Image

Drag and drop or click to upload:

  • Supported formats: JPEG, PNG, WebP, AVIF, HEIC, JP2, TIFF, BMP, DNG, MPO
  • Max size: 10MB
  • Auto-inference: Results appear automatically after upload

Auto-Inference

The predict panel runs inference automatically when you upload an image, select an example, or capture a webcam frame. No button click is needed.

Example Images

The predict panel shows example images from your model's linked dataset. If no dataset is linked, default examples are used:

| Image      | Content                    |
| ---------- | -------------------------- |
| bus.jpg    | Street scene with vehicles |
| zidane.jpg | Sports scene with people   |

For OBB models, aerial images of boats and airports are shown instead.

Preloaded Images

Example images are preloaded when the page loads, so clicking an example triggers near-instant inference with no download wait.

Webcam

Click the webcam card to start a live camera feed:

  1. Grant camera permission when prompted
  2. Click the video preview to capture a frame
  3. Inference runs automatically on the captured frame
  4. Click again to restart the webcam

View Results

Inference results display:

  • Bounding boxes with class labels as SVG overlays
  • Confidence scores for each detection
  • Class colors from your dataset's color palette (or the Ultralytics default palette)
  • Speed breakdown: Preprocess, inference, postprocess, and network time

Ultralytics Platform Predict Tab Results With Detections And Speed Stats

The results panel shows:

| Field           | Description                                     |
| --------------- | ----------------------------------------------- |
| Detections list | Each detection with class name and confidence   |
| Speed stats     | Preprocess, inference, postprocess, network (ms) |
| JSON response   | Raw API response in a code block                |

Inference Parameters

Adjust detection behavior with parameters in the collapsible Parameters section:

Ultralytics Platform Predict Tab Parameters Sliders

| Parameter  | Range          | Default | Description                            |
| ---------- | -------------- | ------- | -------------------------------------- |
| Confidence | 0.01-1.0       | 0.25    | Minimum confidence threshold           |
| IoU        | 0.0-0.95       | 0.70    | NMS IoU threshold                      |
| Image Size | 320, 640, 1280 | 640     | Input resize dimension (button toggle) |

Auto-Rerun

Changing any parameter automatically re-runs inference on the current image with a 500ms debounce. No need to re-upload.

Confidence Threshold

Filter predictions by confidence:

  • Higher (0.5+): Fewer, more certain predictions
  • Lower (0.1-0.25): More predictions, some noise
  • Default (0.25): Balanced for most use cases

IoU Threshold

Control Non-Maximum Suppression:

  • Higher (0.7+): Keeps more overlapping boxes
  • Lower (0.3-0.5): Suppresses overlapping detections more aggressively
  • Default (0.70): Balanced NMS behavior for most use cases

Deployment Predict

Each running dedicated endpoint includes a Predict tab directly on its deployment card. This uses the deployment's own inference service rather than the shared predict service, letting you test your deployed endpoint from the browser.

REST API

Access inference programmatically:

Authentication

Include your API key in requests:

Authorization: Bearer YOUR_API_KEY

API Key Required

To run inference from your own scripts, notebooks, or apps, include an API key. Generate one in Settings (API Keys section on the Profile tab).

Endpoint

POST https://platform.ultralytics.com/api/models/{modelId}/predict

Request

Python

import requests

url = "https://platform.ultralytics.com/api/models/MODEL_ID/predict"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
data = {"conf": 0.25, "iou": 0.7, "imgsz": 640}

with open("image.jpg", "rb") as f:
    response = requests.post(url, headers=headers, files={"file": f}, data=data)
print(response.json())

cURL

curl -X POST \
  "https://platform.ultralytics.com/api/models/MODEL_ID/predict" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@image.jpg" \
  -F "conf=0.25" \
  -F "iou=0.7" \
  -F "imgsz=640"

JavaScript

const formData = new FormData();
formData.append("file", fileInput.files[0]);
formData.append("conf", "0.25");
formData.append("iou", "0.7");
formData.append("imgsz", "640");

const response = await fetch(
  "https://platform.ultralytics.com/api/models/MODEL_ID/predict",
  {
    method: "POST",
    headers: { Authorization: "Bearer YOUR_API_KEY" },
    body: formData,
  }
);

const result = await response.json();
console.log(result);

Ultralytics Platform Predict Tab Code Examples Python Tab

Response

{
    "images": [
        {
            "shape": [1080, 1920],
            "results": [
                {
                    "class": 0,
                    "name": "person",
                    "confidence": 0.92,
                    "box": { "x1": 100, "y1": 50, "x2": 300, "y2": 400 }
                },
                {
                    "class": 2,
                    "name": "car",
                    "confidence": 0.87,
                    "box": { "x1": 400, "y1": 200, "x2": 600, "y2": 350 }
                }
            ],
            "speed": {
                "preprocess": 1.2,
                "inference": 12.5,
                "postprocess": 2.3
            }
        }
    ],
    "metadata": {
        "imageCount": 1,
        "functionTimeCall": 0.018,
        "model": "model.pt",
        "version": {
            "ultralytics": "8.4.14",
            "torch": "2.6.0",
            "torchvision": "0.21.0",
            "python": "3.13.0"
        }
    }
}

Ultralytics Platform Predict Tab Json Response View

Response Fields

| Field                         | Type   | Description                       |
| ----------------------------- | ------ | --------------------------------- |
| images                        | array  | List of processed images          |
| images[].shape                | array  | Image dimensions [height, width]  |
| images[].results              | array  | List of detections                |
| images[].results[].name       | string | Class name                        |
| images[].results[].confidence | float  | Detection confidence (0-1)        |
| images[].results[].box        | object | Bounding box coordinates          |
| images[].speed                | object | Processing times in milliseconds  |
| metadata                      | object | Request metadata and version info |
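These fields can be walked directly in client code. A minimal sketch, assuming the JSON shape shown in the example response above (`summarize` is an illustrative helper, not part of any Platform SDK):

```python
def summarize(response):
    """Flatten the nested response into (name, confidence, width, height) tuples."""
    rows = []
    for image in response["images"]:
        for det in image["results"]:
            box = det["box"]
            rows.append(
                (det["name"], det["confidence"],
                 box["x2"] - box["x1"], box["y2"] - box["y1"])
            )
    return rows


# Sample payload mirroring the example response above
sample = {
    "images": [
        {
            "shape": [1080, 1920],
            "results": [
                {"class": 0, "name": "person", "confidence": 0.92,
                 "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400}}
            ],
            "speed": {"preprocess": 1.2, "inference": 12.5, "postprocess": 2.3},
        }
    ]
}

for name, conf, w, h in summarize(sample):
    print(f"{name}: {conf:.2f} ({w}x{h} px)")
```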

Task-Specific Responses

Response format varies by task:

Detect

{
  "class": 0,
  "name": "person",
  "confidence": 0.92,
  "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400}
}

Segment

{
  "class": 0,
  "name": "person",
  "confidence": 0.92,
  "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400},
  "segments": [[100, 50], [150, 60], ...]
}

Pose

{
  "class": 0,
  "name": "person",
  "confidence": 0.92,
  "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400},
  "keypoints": [
    {"x": 200, "y": 75, "conf": 0.95},
    ...
  ]
}

Classify

{
  "results": [
    {"class": 0, "name": "cat", "confidence": 0.95},
    {"class": 1, "name": "dog", "confidence": 0.03}
  ]
}

OBB

{
  "class": 0,
  "name": "ship",
  "confidence": 0.89,
  "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400},
  "obb": {"x1": 105, "y1": 48, "x2": 295, "y2": 55, "x3": 290, "y3": 395, "x4": 110, "y4": 402}
}
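One way for a client to handle these variants is to branch on the optional fields each task adds. A hedged sketch; `detection_kind` is an illustrative helper, not part of the Platform API:

```python
def detection_kind(det):
    """Infer the task type of a single result object from its optional fields."""
    if "obb" in det:
        return "obb"          # oriented bounding box corners present
    if "keypoints" in det:
        return "pose"         # per-point coordinates with confidences
    if "segments" in det:
        return "segment"      # polygon outline points
    if "box" in det:
        return "detect"       # axis-aligned box only
    return "classify"         # classification results carry only class/name/confidence
```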

Rate Limits

Shared inference is rate-limited to 20 requests/min per API key. When throttled, the API returns 429 with a Retry-After header. See the full rate limit reference for all endpoint categories.
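A client can honor the Retry-After header when throttled. A minimal sketch using requests; `predict_with_retry` and its backoff policy are illustrative, not part of any Platform SDK:

```python
import time


def backoff_seconds(headers, attempt):
    """Honor a Retry-After header when present; otherwise exponential backoff."""
    try:
        return int(headers.get("Retry-After"))
    except (TypeError, ValueError):
        return 2 ** attempt


def predict_with_retry(url, api_key, image_path, retries=3):
    """POST an image, sleeping between attempts when the API returns 429."""
    import requests  # deferred so the helper above has no third-party dependency

    headers = {"Authorization": f"Bearer {api_key}"}
    for attempt in range(retries + 1):
        with open(image_path, "rb") as f:
            resp = requests.post(url, headers=headers, files={"file": f})
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(backoff_seconds(resp.headers, attempt))
    raise RuntimeError("still rate limited after retries")
```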

Need More Throughput?

Deploy a dedicated endpoint for unlimited inference with no rate limits, predictable throughput, and consistent low-latency responses. For local inference, see the Predict mode guide.

Error Handling

Common error responses:

| Code | Message         | Solution                                                             |
| ---- | --------------- | -------------------------------------------------------------------- |
| 400  | Invalid image   | Check file format                                                    |
| 401  | Unauthorized    | Verify API key                                                       |
| 404  | Model not found | Check model ID                                                       |
| 429  | Rate limited    | Wait and retry, or use a dedicated endpoint for unlimited throughput |
| 500  | Server error    | Retry request                                                        |

FAQ

Can I run inference on video?

The API accepts individual frames. For video:

  1. Extract frames locally
  2. Send each frame to the API
  3. Aggregate results

For real-time video, consider deploying a dedicated endpoint.
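The three steps above can be sketched with OpenCV. A hedged example; `frame_indices` and `predict_video` are illustrative helper names, and the one-frame-per-second sampling rate is an assumption:

```python
def frame_indices(total_frames, video_fps, sample_fps=1.0):
    """Pick which frame numbers to send, e.g. one frame per second by default."""
    step = max(1, round(video_fps / sample_fps))
    return list(range(0, total_frames, step))


def predict_video(video_path, url, api_key, sample_fps=1.0):
    """Extract frames with OpenCV, POST each one, and collect per-frame JSON."""
    import cv2       # deferred: opencv-python, only needed for actual video decode
    import requests

    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    results = []
    for idx in frame_indices(total, fps, sample_fps):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode(".jpg", frame)  # re-encode the frame as JPEG bytes
        resp = requests.post(
            url,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": ("frame.jpg", buf.tobytes(), "image/jpeg")},
        )
        results.append(resp.json())  # step 3: aggregate per-frame results
    cap.release()
    return results
```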

How do I get the annotated image?

The API returns JSON predictions. To visualize:

  1. Use predictions to draw boxes locally
  2. Use Ultralytics plot() method:
from ultralytics import YOLO

model = YOLO("yolo26n.pt")
results = model("image.jpg")
results[0].save("annotated.jpg")

See the Predict mode documentation for the full results API and visualization options.
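For option 1, the boxes in the API JSON can be drawn locally with Pillow. A minimal sketch; `draw_boxes` is an illustrative helper, not a Platform API:

```python
from PIL import Image, ImageDraw


def draw_boxes(image_path, results, out_path):
    """Overlay API box predictions on the original image and save it."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for det in results:
        b = det["box"]
        label = f'{det["name"]} {det["confidence"]:.2f}'
        draw.rectangle((b["x1"], b["y1"], b["x2"], b["y2"]), outline="red", width=3)
        draw.text((b["x1"], max(0, b["y1"] - 12)), label, fill="red")
    img.save(out_path)
    return out_path
```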

What's the maximum image size?

  • Upload limit: 10MB
  • Recommended: <5MB for fast inference
  • Auto-resize: Images are resized to the selected Image Size parameter

Large images are automatically resized while preserving aspect ratio.
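To stay well under the upload limit, you can also downscale client-side before posting. A sketch with Pillow; `fit_within` and `shrink_for_upload` are illustrative names, and 1280 is chosen to match the largest Image Size option:

```python
def fit_within(width, height, max_side=1280):
    """Downscale dimensions to fit max_side while preserving aspect ratio."""
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale)


def shrink_for_upload(path, out_path, max_side=1280, quality=85):
    """Resize and re-encode an image as JPEG so the upload stays small."""
    from PIL import Image  # deferred: Pillow, only needed for actual image I/O

    img = Image.open(path)
    img = img.resize(fit_within(*img.size, max_side=max_side))
    img.convert("RGB").save(out_path, "JPEG", quality=quality)
    return out_path
```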

Can I run batch inference?

The current API processes one image per request. For batch:

  1. Send concurrent requests
  2. Use a dedicated endpoint for higher throughput
  3. Consider local inference for large batches

Batch Inference with Python

import concurrent.futures

import requests

url = "https://predict-abc123.run.app/predict"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
images = ["img1.jpg", "img2.jpg", "img3.jpg"]


def predict(image_path):
    with open(image_path, "rb") as f:
        return requests.post(url, headers=headers, files={"file": f}).json()


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(predict, images))


📅 Created 1 month ago ✏️ Updated 6 days ago
Authors: glenn-jocher, sergiuwaxmann, Laughing-q