
Inference

Ultralytics Platform provides an inference API for testing trained models. Use the browser-based Predict tab for quick validation or the REST API for programmatic access.

[Image: Ultralytics Platform model Predict tab with detections overlay]

Predict Tab

Every model includes a Predict tab for browser-based inference:

  1. Navigate to your model
  2. Click the Predict tab
  3. Upload an image, use an example, or open your webcam
  4. View predictions instantly with bounding box overlays

[Image: Ultralytics Platform Predict tab image upload dropzone]

Input Methods

The predict panel supports multiple input methods:

| Method | Description |
|---|---|
| Image upload | Drag and drop or click to upload an image |
| Example images | Click built-in examples (dataset images or defaults) |
| Webcam capture | Live camera feed with single-frame capture |

graph LR
    A[Upload Image] --> D[Auto-Inference]
    B[Example Image] --> D
    C[Webcam Capture] --> D
    D --> E[Results + Overlays]

    style D fill:#2196F3,color:#fff
    style E fill:#4CAF50,color:#fff

Upload Image

Drag and drop or click to upload:

  • Supported formats: JPEG, PNG, WebP, AVIF, HEIC, JP2, TIFF, BMP, DNG, MPO
  • Max size: 10 MB
  • Auto-inference: Results appear automatically after upload

Auto-Inference

The predict panel runs inference automatically when you upload an image, select an example, or capture a webcam frame. No button click is needed.
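
The upload constraints above can be checked locally before sending a file. The following is a minimal sketch, not the Platform's own validation: the function name `check_upload` is hypothetical, the mapping from format names to file extensions is an assumption, and "10 MB" is assumed to mean 10 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Formats and size limit as listed in this section. The extension mapping is
# an assumption (for example, both ".jpg" and ".jpeg" for JPEG).
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".webp", ".avif", ".heic",
    ".jp2", ".tif", ".tiff", ".bmp", ".dng", ".mpo",
}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumes "10 MB" means 10 MiB


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


print(check_upload("bus.jpg", 250_000))   # []
print(check_upload("clip.gif", 250_000))  # ['unsupported format: .gif']
```

A check like this only inspects the file name and size; the service may still reject a file whose contents do not match its extension.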

Example Images

The predict panel shows example images from your model's linked dataset. If no dataset is linked, default examples are used:

| Image | Content |
|---|---|
| bus.jpg | Street scene with vehicles |
| zidane.jpg | Sports scene with people |

For oriented bounding box (OBB) models, aerial images of boats and airports are shown instead.

Preloaded Images

Example images are preloaded when the page loads, so clicking an example triggers near-instant inference with no download wait.

Webcam

Click the webcam card to start a live camera feed:

  1. Grant camera permission when prompted
  2. Click the video preview to capture a frame
  3. Inference runs automatically on the captured frame
  4. Click again to restart the webcam

View Results

Inference results display:

  • Bounding boxes with class labels as SVG overlays
  • Confidence scores for each detection
  • Class colors from your dataset's color palette (or the Ultralytics default palette)
  • Speed breakdown: Preprocess, inference, postprocess, and network time

[Image: Ultralytics Platform Predict tab results with detections and speed stats]

The results panel shows:

| Field | Description |
|---|---|
| Detections list | Each detection with class name and confidence |
| Speed stats | Preprocess, inference, postprocess, network (ms) |
| JSON response | Raw API response in a code block |

Inference Parameters

Adjust detection behavior with parameters in the collapsible Parameters section:

[Image: Ultralytics Platform Predict tab parameter sliders]

| Parameter | Range | Default | Description |
|---|---|---|---|
| Confidence | 0.01 – 1.0 | 0.25 | Minimum confidence threshold |
| IoU | 0.0 – 0.95 | 0.7 | NMS IoU threshold |
| Image Size | 320, 640, 1280 (UI toggle) | 640 | Input resize dimension (API accepts any value 32 – 1280) |

Auto-Rerun

Changing any parameter automatically re-runs inference on the current image with a 500ms debounce. No need to re-upload.

Confidence Threshold

Filter predictions by confidence:

  • Higher (0.5+): Fewer, more certain predictions
  • Lower (0.1-0.25): More predictions, some noise
  • Default (0.25): Balanced for most use cases
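
To see the effect of the threshold, the same filtering can be reproduced on a list of detections. This is an illustrative sketch: `filter_by_confidence` is a hypothetical helper, the "dog" detection is made-up sample data, and treating the threshold as inclusive (`>=`) is an assumption about the service.

```python
def filter_by_confidence(results, conf=0.25):
    """Keep detections whose confidence is at least `conf` (inclusive by assumption)."""
    return [det for det in results if det["confidence"] >= conf]


# Illustrative detections; only "name" and "confidence" matter here.
detections = [
    {"name": "person", "confidence": 0.92},
    {"name": "car", "confidence": 0.87},
    {"name": "dog", "confidence": 0.18},
]

kept_default = filter_by_confidence(detections)           # person, car
kept_strict = filter_by_confidence(detections, conf=0.9)  # person only
print([d["name"] for d in kept_default])
print([d["name"] for d in kept_strict])
```

Requesting a low `conf` from the API and filtering client-side like this lets you compare several thresholds from a single response.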

IoU Threshold

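Control how Non-Maximum Suppression (NMS) handles overlapping boxes. When two boxes overlap by more than the IoU threshold, only the higher-confidence box is kept:

  • Higher (0.7+): Allow more overlapping boxes
  • Lower (0.3-0.5): Suppress overlapping detections more aggressively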
  • Default (0.7): Balanced NMS behavior for most use cases
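
The role of the IoU threshold is easier to see in code. Below is a minimal sketch of IoU and the standard greedy NMS algorithm. It illustrates the general technique, not the Platform's implementation, and the helper names `iou` and `nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box above the threshold
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Boxes 0 and 1 overlap heavily (IoU of about 0.86); box 2 is separate.
boxes = [(100, 50, 300, 400), (110, 60, 310, 410), (400, 200, 600, 350)]
scores = [0.92, 0.80, 0.87]

print(nms(boxes, scores, iou_threshold=0.7))  # [0, 2]
print(nms(boxes, scores, iou_threshold=0.9))  # [0, 2, 1]
```

With the default of 0.7 the lower-scoring duplicate (index 1) is suppressed; raising the threshold to 0.9 lets it through.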

Deployment Predict

Each running dedicated endpoint includes a Predict tab directly on its deployment card. This uses the deployment's own inference service rather than the shared predict service, letting you test your deployed endpoint from the browser.

REST API

Access inference programmatically through the REST API.

Authentication

Include your API key in requests:

Authorization: Bearer YOUR_API_KEY

API Key Required

To run inference from your own scripts, notebooks, or apps, include an API key. Generate one in Settings > API Keys.
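
To keep the key out of source code, the header can be built from an environment variable. This is a small sketch; the variable name `ULTRALYTICS_API_KEY` and the helper `auth_headers` are assumptions made for illustration, not names defined by the Platform.

```python
import os


def auth_headers(env_var="ULTRALYTICS_API_KEY"):
    """Build the Authorization header from an environment variable.

    The variable name is a convention assumed for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} to an API key from Settings > API Keys")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed as the `headers` argument of any HTTP client call.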

Endpoint

POST https://platform.ultralytics.com/api/models/{modelId}/predict

Request

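import requests

url = "https://platform.ultralytics.com/api/models/MODEL_ID/predict"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
data = {"conf": 0.25, "iou": 0.7, "imgsz": 640}

# Open the image in a context manager so the file handle is always closed
with open("image.jpg", "rb") as f:
    response = requests.post(url, headers=headers, files={"file": f}, data=data, timeout=60)

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(response.json())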

[Image: Ultralytics Platform Predict tab code examples, Python tab]

Request Parameters

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| file | file | - | - | Image or video file (required unless source is set) |
| conf | float | 0.25 | 0.01 – 1.0 | Minimum confidence threshold |
| iou | float | 0.7 | 0.0 – 0.95 | NMS IoU threshold |
| imgsz | int | 640 | 32 – 1280 | Input image size in pixels |
| normalize | bool | false | - | Return bounding box coordinates normalized to the 0 – 1 range |
| decimals | int | 5 | 0 – 10 | Decimal precision for coordinate values |
| source | string | - | - | Image URL or base64 string (alternative to file) |
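
Checking parameters against the documented ranges before sending a request avoids a round trip for obviously invalid values. The sketch below uses a hypothetical helper, `build_predict_data`. Sending booleans as lowercase strings in form data is an assumption, as is the idea that the server rejects out-of-range values rather than clamping them.

```python
def build_predict_data(conf=0.25, iou=0.7, imgsz=640, normalize=False, decimals=5):
    """Validate parameters against the documented ranges and return form data."""
    if not 0.01 <= conf <= 1.0:
        raise ValueError("conf must be in [0.01, 1.0]")
    if not 0.0 <= iou <= 0.95:
        raise ValueError("iou must be in [0.0, 0.95]")
    if not 32 <= imgsz <= 1280:
        raise ValueError("imgsz must be in [32, 1280]")
    if not 0 <= decimals <= 10:
        raise ValueError("decimals must be in [0, 10]")
    return {
        "conf": conf,
        "iou": iou,
        "imgsz": int(imgsz),
        # Assumption: booleans are sent as lowercase strings in form data
        "normalize": "true" if normalize else "false",
        "decimals": int(decimals),
    }


print(build_predict_data(conf=0.5, imgsz=1280))
```

The result can be passed as the `data` argument alongside the uploaded file.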

Response

{
    "images": [
        {
            "shape": [1080, 1920],
            "results": [
                {
                    "class": 0,
                    "name": "person",
                    "confidence": 0.92,
                    "box": { "x1": 100, "y1": 50, "x2": 300, "y2": 400 }
                },
                {
                    "class": 2,
                    "name": "car",
                    "confidence": 0.87,
                    "box": { "x1": 400, "y1": 200, "x2": 600, "y2": 350 }
                }
            ],
            "speed": {
                "preprocess": 1.2,
                "inference": 12.5,
                "postprocess": 2.3
            }
        }
    ],
    "metadata": {
        "imageCount": 1,
        "functionTimeCall": 0.018,
        "model": "model.pt",
        "version": {
            "ultralytics": "8.x.x",
            "torch": "2.6.0",
            "torchvision": "0.21.0",
            "python": "3.13.0"
        }
    }
}

[Image: Ultralytics Platform Predict tab JSON response view]

Response Fields

| Field | Type | Description |
|---|---|---|
| images | array | List of processed images |
| images[].shape | array | Image dimensions [height, width] |
| images[].results | array | List of detections |
| images[].results[].class | int | Class index (integer ID) |
| images[].results[].name | string | Class name |
| images[].results[].confidence | float | Detection confidence (0-1) |
| images[].results[].box | object | Bounding box coordinates |
| images[].speed | object | Processing times in milliseconds |
| metadata | object | Request metadata and version info |
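
The fields above can be flattened into a structure that is easier to work with. This sketch assumes a detection-style response shaped like the example in this section. The helper `summarize` and the trimmed `sample` payload are illustrative only.

```python
def summarize(response):
    """Flatten a predict response into per-image detections and total time."""
    summary = []
    for image in response["images"]:
        height, width = image["shape"]  # shape is [height, width]
        detections = [
            (
                det["name"],
                det["confidence"],
                (det["box"]["x1"], det["box"]["y1"], det["box"]["x2"], det["box"]["y2"]),
            )
            for det in image["results"]
        ]
        total_ms = round(sum(image["speed"].values()), 3)
        summary.append({"size": (width, height), "detections": detections, "total_ms": total_ms})
    return summary


# Trimmed copy of the example response shown in this section
sample = {
    "images": [
        {
            "shape": [1080, 1920],
            "results": [
                {"class": 0, "name": "person", "confidence": 0.92,
                 "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400}},
                {"class": 2, "name": "car", "confidence": 0.87,
                 "box": {"x1": 400, "y1": 200, "x2": 600, "y2": 350}},
            ],
            "speed": {"preprocess": 1.2, "inference": 12.5, "postprocess": 2.3},
        }
    ]
}

for item in summarize(sample):
    print(item["size"], item["total_ms"], item["detections"])
```

The `total_ms` value covers only the server-side stages reported in `speed`; network time is not part of the JSON shown here.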

Task-Specific Responses

Response format varies by task. A detection result, for example, has this shape:

{
  "class": 0,
  "name": "person",
  "confidence": 0.92,
  "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400}
}

Billing

Shared inference (the Predict tab and /api/models/{id}/predict endpoint) is included at no additional cost on all plans. There are no per-request charges for shared inference.

For production workloads requiring higher throughput, deploy a dedicated endpoint.

Rate Limits

Shared inference is rate-limited to 20 requests/min per API key. When throttled, the API returns 429 with a Retry-After header. See the full rate limit reference for all endpoint categories.
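
A client can respect the limit by waiting for the time given in `Retry-After` before trying again. The following is a minimal sketch: `call_with_retry` is a hypothetical helper, `send` is any zero-argument callable returning an object with `status_code` and `headers` (such as a `requests` response), and `Retry-After` is assumed to contain a number of seconds. The 3-second fallback is simply 60 s divided by 20 requests.

```python
import time


def call_with_retry(send, max_attempts=3, default_wait=3.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code != 429 or attempt == max_attempts:
            return response
        # Assumption: Retry-After holds seconds, not an HTTP date
        try:
            wait = float(response.headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(wait)
```

When wrapping an upload, open the file inside `send` so that each attempt reads it from the start.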

Need More Throughput?

Deploy a dedicated endpoint for unlimited inference with no rate limits, predictable throughput, and consistent low-latency responses. For local inference, see the Predict mode guide.

Error Handling

Common error responses:

| Code | Message | Solution |
|---|---|---|
| 400 | Invalid image | Check file format |
| 401 | Unauthorized | Verify API key |
| 404 | Model not found | Check model ID |
| 429 | Rate limited | Wait and retry, or use a dedicated endpoint for unlimited throughput |
| 500 | Server error | Retry request |
| 503 | Service unavailable | Predict service starting up or unreachable; wait briefly and retry |
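
In client code, the table above reduces to one decision: retry or fix the request. This sketch encodes it with a hypothetical helper, `describe_error`. Treating 429, 500, and 503 as retryable follows the solutions listed in the table; the message for unlisted codes is an assumption.

```python
# Status codes whose documented solution is to retry
RETRYABLE = {429, 500, 503}

ACTIONS = {
    400: "Check the file format",
    401: "Verify the API key",
    404: "Check the model ID",
    429: "Wait and retry, or use a dedicated endpoint",
    500: "Retry the request",
    503: "Wait briefly and retry",
}


def describe_error(status_code):
    """Return (retryable, suggested action) for an HTTP status code."""
    action = ACTIONS.get(status_code, "Unexpected status; inspect the response body")
    return status_code in RETRYABLE, action


print(describe_error(401))  # (False, 'Verify the API key')
print(describe_error(503))  # (True, 'Wait briefly and retry')
```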

FAQ

Can I run inference on video?

Yes, through the API. Both dedicated endpoints and the shared REST endpoint accept video files:

  • Dedicated endpoints accept video files directly. Supported formats (up to 100 MB): ASF, AVI, GIF, M4V, MKV, MOV, MP4, MPEG, MPG, TS, WEBM, WMV. Each frame is processed individually and results are returned per frame. See dedicated endpoints for details.
  • Shared inference (/api/models/{id}/predict) uses the same predict service and accepts the same video formats. However, the browser Predict tab only uploads images, so use the REST API directly or a dedicated endpoint for video workflows. The shared endpoint is also rate-limited to 20 requests/min, so dedicated endpoints are the better choice for heavy video workloads.

How do I get the annotated image?

The API returns JSON predictions. To visualize:

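  1. Use predictions to draw boxes locally
  2. Run the model locally with the ultralytics package and write the annotated image with save(), which uses plot() internally:

from ultralytics import YOLO

model = YOLO("yolo26n.pt")  # or the path to your own trained weights
results = model("image.jpg")
results[0].save("annotated.jpg")  # writes the image with boxes and labels drawn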

See the Predict mode documentation for the full results API and visualization options.
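
For the first option, one dependency-free way to draw boxes is to generate an SVG overlay from the JSON predictions, similar in spirit to the overlays in the Predict tab. This is a sketch using only the standard library. The function `boxes_to_svg`, the red stroke, and the label placement are illustrative choices, not the Platform's rendering.

```python
from xml.sax.saxutils import escape


def boxes_to_svg(width, height, results):
    """Render detections as an SVG overlay with one rect and one label per box."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">']
    for det in results:
        box = det["box"]
        w, h = box["x2"] - box["x1"], box["y2"] - box["y1"]
        label = escape(f'{det["name"]} {det["confidence"]:.2f}')
        parts.append(
            f'<rect x="{box["x1"]}" y="{box["y1"]}" width="{w}" height="{h}" '
            'fill="none" stroke="red" stroke-width="2"/>'
        )
        # Place the label just above the box, clamped so it stays inside the image
        parts.append(f'<text x="{box["x1"]}" y="{max(box["y1"] - 4, 12)}" fill="red">{label}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


results = [
    {"name": "person", "confidence": 0.92, "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400}},
]
svg = boxes_to_svg(1920, 1080, results)
print(svg)
```

The `viewBox` matches the image's pixel size, so the SVG can be layered over the original image at any display scale. The coordinates are assumed to be pixel values, that is, the request did not set `normalize`.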

What's the maximum image size?

  • Upload limit: 10 MB
  • Recommended: under 5 MB for fast inference
  • Auto-resize: Images are resized to the selected Image Size parameter

Large images are automatically resized while preserving aspect ratio.
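
As a worked example of aspect-preserving resizing, the sketch below scales an image so that its longer side equals the Image Size parameter. That rule is an assumption made for illustration; the service may additionally pad or round dimensions, and `resized_shape` is a hypothetical helper.

```python
def resized_shape(height, width, imgsz=640):
    """Scale so the longer side equals imgsz while keeping the aspect ratio.

    Assumes "longer side = imgsz"; padding or stride rounding is not modeled.
    """
    scale = imgsz / max(height, width)
    return round(height * scale), round(width * scale)


print(resized_shape(1080, 1920))        # (360, 640)
print(resized_shape(1080, 1920, 1280))  # (720, 1280)
```

A 1920×1080 image at the default size of 640 is therefore processed at roughly one third of its original resolution, which is why small objects may need a larger Image Size.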

Can I run batch inference?

The current API processes one image per request. For batch:

  1. Send concurrent requests
  2. Use a dedicated endpoint for higher throughput
  3. Consider local inference for large batches

Batch Inference with Python
import concurrent.futures

import requests

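url = "https://predict-abc123.run.app/predict"  # example URL; replace with your endpoint
headers = {"Authorization": "Bearer YOUR_API_KEY"}
images = ["img1.jpg", "img2.jpg", "img3.jpg"]


def predict(image_path):
    """Send one image and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        response = requests.post(url, headers=headers, files={"file": f}, timeout=60)
    response.raise_for_status()
    return response.json()


# One request per image, up to 4 in flight; map() returns results in input order
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(predict, images))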
