
Deployment

Ultralytics Platform provides comprehensive deployment options for putting your YOLO models into production. Test models with browser-based inference, deploy to dedicated endpoints across 43 global regions, and monitor performance in real time.



Watch: Get Started with Ultralytics Platform - Deploy

Overview

The Deployment section helps you:

  • Test models directly in the browser with the Predict tab
  • Deploy to dedicated endpoints in 43 global regions
  • Monitor request metrics, logs, and health checks
  • Scale to zero when idle (deployments currently run a single active instance)

Ultralytics Platform Deploy Page World Map With Overview Cards

Deployment Options

Ultralytics Platform offers multiple deployment paths:

| Option | Description | Best For |
| --- | --- | --- |
| Predict Tab | Browser-based inference with image, webcam, and examples | Development, validation |
| Shared Inference | Multi-tenant service across 3 regions | Light usage, testing |
| Dedicated Endpoints | Single-tenant services across 43 regions | Production, low latency |

Workflow

graph LR
    A[✅ Test] --> B[⚙️ Configure]
    B --> C[🌐 Deploy]
    C --> D[📊 Monitor]

    style A fill:#4CAF50,color:#fff
    style B fill:#2196F3,color:#fff
    style C fill:#FF9800,color:#fff
    style D fill:#9C27B0,color:#fff

| Stage | Description |
| --- | --- |
| Test | Validate model with the Predict tab |
| Configure | Select region and deployment name (deployments use fixed default resources) |
| Deploy | Create a dedicated endpoint from the Deploy tab |
| Monitor | Track requests, latency, errors, and logs in Monitoring |

Architecture

Shared Inference

The shared inference service runs in 3 key regions, automatically routing requests based on your data region:

graph TB
    User[User Request] --> API[Platform API]
    API --> Router{Region Router}
    Router -->|US users| US["US Predict Service<br/>Iowa"]
    Router -->|EU users| EU["EU Predict Service<br/>Belgium"]
    Router -->|AP users| AP["AP Predict Service<br/>Taiwan"]

    style User fill:#f5f5f5,color:#333
    style API fill:#2196F3,color:#fff
    style Router fill:#FF9800,color:#fff
    style US fill:#4CAF50,color:#fff
    style EU fill:#4CAF50,color:#fff
    style AP fill:#4CAF50,color:#fff

| Region | Location |
| --- | --- |
| US | Iowa, USA |
| EU | Belgium, Europe |
| AP | Taiwan, Asia-Pacific |
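
The region table above can be read as a simple lookup from data region to serving location. The following is a minimal sketch of that idea, not the Platform's actual router; the function name `shared_region_location` and the use of the two-letter codes as input are assumptions made for illustration.

```python
# Illustrative only: mirrors the region table above, not the Platform's real router.
SHARED_REGIONS = {
    "US": "Iowa, USA",
    "EU": "Belgium, Europe",
    "AP": "Taiwan, Asia-Pacific",
}


def shared_region_location(data_region: str) -> str:
    """Return the location serving shared inference for a data region code."""
    key = data_region.strip().upper()
    if key not in SHARED_REGIONS:
        raise ValueError(f"Unknown data region: {data_region!r}")
    return SHARED_REGIONS[key]
```

Raising on unknown codes, rather than silently falling back to a default region, keeps a misconfigured data region visible.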

Dedicated Endpoints

Deploy to 43 regions worldwide on Ultralytics Cloud:

  • Americas: 14 regions
  • Europe: 13 regions
  • Asia-Pacific: 12 regions
  • Middle East & Africa: 4 regions

Each endpoint is a single-tenant service with:

  • Default resources of 1 CPU, 2 GiB memory, minInstances=0, maxInstances=1
  • Scale-to-zero when idle
  • Unique endpoint URL
  • Independent monitoring, logs, and health checks

Deployments Page

Access the global deployments page from the sidebar under Deploy. This page shows:

  • World map with deployed region pins (interactive map)
  • Overview cards: Total Requests (24h), Active Deployments, Error Rate (24h), P95 Latency (24h)
  • Deployments list with three view modes: cards, compact, and table
  • New Deployment button to create endpoints from any completed model

Ultralytics Platform Deploy Page Overview Cards And Deployments List

Automatic Polling

The page normally polls every 15 seconds. While any deployment is in a transitional state (creating, deploying, or stopping), the polling interval shortens to 3 seconds for faster feedback.
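
The polling rule can be sketched as a small function. This is an illustration of the behavior described above, not the Platform's front-end code; the function name and the representation of states as lowercase strings are assumptions, and `"running"` in the usage note is a hypothetical non-transitional state name.

```python
# Assumed representation: each deployment state is a lowercase string.
TRANSITIONAL_STATES = {"creating", "deploying", "stopping"}
NORMAL_INTERVAL_S = 15
FAST_INTERVAL_S = 3


def poll_interval_seconds(deployment_states):
    """Return the polling interval in seconds for the given deployment states."""
    if any(state in TRANSITIONAL_STATES for state in deployment_states):
        return FAST_INTERVAL_S
    return NORMAL_INTERVAL_S
```

For example, `poll_interval_seconds(["running", "deploying"])` yields the fast interval because one deployment is still transitioning.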

Key Features

Global Coverage

Deploy close to your users with 43 regions covering:

  • North America, South America
  • Europe, Middle East, Africa
  • Asia Pacific, Oceania

Scaling Behavior

Endpoints currently behave as follows:

  • Scale to zero: No cost when idle (default)
  • Single active instance: maxInstances is currently capped at 1 on all plans

Cost Savings

Scale-to-zero is enabled by default (min instances = 0). You only pay for active inference time.

Low Latency

Dedicated endpoints provide:

  • Cold start: ~5-15 seconds (cached container), up to ~45 seconds (first deploy)
  • Warm inference: 50-200ms (model dependent)
  • Regional routing for optimal performance

Health Checks

Each running deployment includes an automatic health check with:

  • Live status indicator (healthy/unhealthy)
  • Response latency display
  • Auto-retry when unhealthy (polls every 20 seconds)
  • Manual refresh button

Quick Start

Deploy a model in under 2 minutes:

  1. Train or upload a model to a project
  2. Go to the model's Deploy tab
  3. Select a region from the latency table
  4. Click Deploy. The endpoint goes live once deployment completes.

Quick Deploy

Model → Deploy tab → Select region → Click Deploy → Endpoint URL ready

Once deployed, use the endpoint URL with your API key to send inference requests from any application.
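
As a rough sketch of what such a request might look like from Python, the snippet below builds (but does not send) an HTTP request using only the standard library. This page does not specify the request format, so the `/predict` path, the `Bearer` authorization scheme, and sending raw image bytes as the body are all assumptions; check the endpoint's own documentation for the real contract.

```python
import urllib.request


def build_inference_request(
    endpoint_url: str, api_key: str, image_bytes: bytes
) -> urllib.request.Request:
    """Build a POST request for a dedicated endpoint (format is assumed)."""
    # ASSUMPTIONS, not confirmed by this page: the "/predict" path, the
    # Bearer auth scheme, and a raw-bytes request body.
    url = endpoint_url.rstrip("/") + "/predict"
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )
```

The request can then be sent with `urllib.request.urlopen(request, timeout=60)`; a generous timeout leaves room for a cold start on the first call after an idle period.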

FAQ

What's the difference between shared and dedicated inference?

| Feature | Shared | Dedicated |
| --- | --- | --- |
| Latency | Variable | Consistent |
| Cost | Free (included) | Free (basic), usage-based (advanced) |
| Scale | Limited | Scale-to-zero, single instance |
| Regions | 3 | 43 |
| URL | Generic | Custom |
| Rate | 20 req/min | 20 req/min via Platform; unlimited on direct endpoint URL |
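
To stay under the 20 req/min limit from the client side, one option is a sliding-window throttle. The sketch below is an illustrative helper, not part of any Ultralytics SDK; the class name `MinuteRateLimiter` and its methods are hypothetical, and the clock is injectable so the logic can be tested without real waiting.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side sliding-window limiter: at most `limit` calls per window."""

    def __init__(self, limit=20, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window_s - (now - self._sent[0])

    def record(self):
        """Record that a request was sent at the current clock time."""
        self._sent.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, send the request, and then call `record()`.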

How long does deployment take?

Dedicated endpoint deployment typically takes 1-2 minutes:

  1. Image pull (~30s)
  2. Container start (~30s)
  3. Health check (~30s)

Can I deploy multiple models?

Yes, each model can have multiple endpoints in different regions. Deployment counts are limited by plan: Free 3, Pro 10, Enterprise unlimited.
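
The plan limits can be expressed as a small pre-flight check. This is only an illustration of the numbers above; the plan keys, the function name `can_create_deployment`, and the use of `None` for "unlimited" are assumptions, and the Platform enforces the real limits server-side.

```python
# Deployment limits per plan as stated above; None stands for "unlimited".
PLAN_DEPLOYMENT_LIMITS = {"free": 3, "pro": 10, "enterprise": None}


def can_create_deployment(plan: str, current_count: int) -> bool:
    """Return True if a plan with `current_count` deployments can add one more."""
    limit = PLAN_DEPLOYMENT_LIMITS[plan.lower()]
    return limit is None or current_count < limit
```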

What happens when an endpoint is idle?

With scale-to-zero enabled:

  • Endpoint scales down after inactivity
  • First request triggers cold start
  • Subsequent requests are fast

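A cold start typically takes ~5-15 seconds with a cached container and up to ~45 seconds on first deploy, so latency-sensitive clients should allow for this delay on the first request after an idle period.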
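
One way for a client to tolerate cold starts is to use a generous timeout and retry on timeout. The sketch below assumes a hypothetical `send` callable that takes a timeout in seconds and raises `TimeoutError` when it expires; the 60-second default is an assumption chosen to exceed the ~45-second worst case quoted in the Low Latency section.

```python
import time


def call_with_cold_start_retry(
    send, attempts=3, timeout_s=60.0, backoff_s=2.0, sleep=time.sleep
):
    """Call send(timeout_s), retrying on TimeoutError with a growing pause."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return send(timeout_s)
        except TimeoutError as exc:
            last_error = exc
            # Pause before the next try, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(backoff_s * (attempt + 1))
    raise last_error
```

Only timeouts are retried here; other errors propagate immediately so that genuine failures (for example, bad credentials) are not masked.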
