
Dedicated Endpoints

Ultralytics Platform enables deployment of YOLO models to dedicated endpoints in 43 global regions. Each endpoint is a single-tenant service with scale-to-zero behavior, a unique endpoint URL, and independent monitoring.

Ultralytics Platform Model Deploy Tab With Region Map And Table

Create Endpoint

From the Deploy Tab

Deploy a model from its Deploy tab:

  1. Navigate to your model
  2. Click the Deploy tab
  3. Select a region from the interactive world map. Regions are color-coded by latency from your location on a green-to-red gradient (greener is faster, redder is slower)
  4. Click Deploy on the region row

The deployment name is auto-generated from the model name and region city (e.g., yolo26n-iowa).

From the Deployments Page

Create a deployment from the global Deploy page in the sidebar:

  1. Click New Deployment
  2. Select a model from the model selector
  3. Select a region from the map or table
  4. Review the auto-generated deployment name (editable) and the default resources
  5. Click Deploy Model

Ultralytics Platform New Deployment Dialog With Model Selector And Region Map

Deployment Lifecycle

stateDiagram-v2
    [*] --> Creating: Deploy
    Creating --> Deploying: Container starting
    Deploying --> Ready: Health check passed
    Ready --> Stopping: Stop
    Stopping --> Stopped: Stopped
    Stopped --> Ready: Start
    Ready --> [*]: Delete
    Stopped --> [*]: Delete
    Creating --> Failed: Error
    Deploying --> Failed: Error
    Failed --> [*]: Delete
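
The diagram above can be read as a transition table. The following is a minimal Python sketch of that table, useful for checking client-side whether an action makes sense in a given state. The event names (`stop`, `start`, `delete`, and so on) and the `Deleted` terminal state are illustrative labels chosen here, not identifiers from the Platform API.

```python
# Transition table mirroring the state diagram: (state, event) -> next state.
# Event names and the "Deleted" terminal state are hypothetical labels.
TRANSITIONS = {
    ("Creating", "container_starting"): "Deploying",
    ("Deploying", "health_check_passed"): "Ready",
    ("Ready", "stop"): "Stopping",
    ("Stopping", "stopped"): "Stopped",
    ("Stopped", "start"): "Ready",
    ("Creating", "error"): "Failed",
    ("Deploying", "error"): "Failed",
    ("Ready", "delete"): "Deleted",
    ("Stopped", "delete"): "Deleted",
    ("Failed", "delete"): "Deleted",
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None


def allowed_events(state):
    """List the events that the diagram permits in `state`, sorted by name."""
    return sorted(event for (source, event) in TRANSITIONS if source == state)


print(allowed_events("Ready"))
print(next_state("Stopped", "start"))
```

Note that the diagram offers no path from Failed back to Ready: a failed deployment can only be deleted.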

Region Selection

Choose from 43 regions worldwide. The interactive region map and table show:

  • Region pins: Color-coded by latency on a green-to-red gradient (faster regions are greener, slower regions are redder)
  • Deployed regions: Highlighted with a "Deployed" badge
  • Deploying regions: Animated pulse indicator
  • Bidirectional highlighting: Hover on the map highlights the table row, and vice versa

Ultralytics Platform Deploy Tab Region Latency Table Sorted By Latency

The region table on the model Deploy tab includes:

| Column | Description |
| --- | --- |
| Location | City and country with flag icon |
| Zone | Region identifier |
| Latency | Measured ping time (median of 3 pings) |
| Distance | Distance from your location in km |
| Actions | Deploy button or "Deployed" status badge |

New Deployment Dialog

The New Deployment dialog (from the global Deploy page) shows a simpler region table with only Location, Latency, and Select columns.

Choose Wisely

Select the region closest to your users for lowest latency. Use the Rescan button to re-measure latency from your current location.
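
To illustrate how a median of 3 pings yields a stable ranking, here is a small Python sketch. It assumes you already have ping samples per zone; the sample values below are invented for illustration, and the function names are not part of the Platform.

```python
import statistics


def median_latency_ms(samples):
    """Median of the ping samples; one slow outlier does not skew the result."""
    if not samples:
        raise ValueError("at least one ping sample is required")
    return statistics.median(samples)


def rank_regions(ping_samples):
    """Return (zone, median latency) pairs, fastest first."""
    medians = {zone: median_latency_ms(s) for zone, s in ping_samples.items()}
    return sorted(medians.items(), key=lambda item: item[1])


# Invented measurements in milliseconds, three pings per zone.
samples = {
    "us-central1": [48.0, 52.0, 95.0],  # the 95 ms outlier is ignored by the median
    "us-east1": [61.0, 60.0, 63.0],
    "us-west1": [30.0, 33.0, 31.0],
}

ranking = rank_regions(samples)
print(ranking[0])  # ('us-west1', 31.0)
```

A mean would rank `us-central1` at 65 ms because of the single slow ping, which is why a median is the more robust choice for only three samples.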

Available Regions

| Zone | Location |
| --- | --- |
| us-central1 | Iowa, USA |
| us-east1 | South Carolina, USA |
| us-east4 | Northern Virginia, USA |
| us-east5 | Columbus, USA |
| us-south1 | Dallas, USA |
| us-west1 | Oregon, USA |
| us-west2 | Los Angeles, USA |
| us-west3 | Salt Lake City, USA |
| us-west4 | Las Vegas, USA |
| northamerica-northeast1 | Montreal, Canada |
| northamerica-northeast2 | Toronto, Canada |
| northamerica-south1 | Queretaro, Mexico |
| southamerica-east1 | Sao Paulo, Brazil |
| southamerica-west1 | Santiago, Chile |

Endpoint Configuration

New Deployment Dialog

The New Deployment dialog provides:

| Setting | Description | Default |
| --- | --- | --- |
| Model | Select from completed models | - |
| Region | Deployment region | - |
| Deployment Name | Auto-generated, editable | - |
| CPU Cores | Fixed default | 1 |
| Memory (GB) | Fixed default | 2 |

Ultralytics Platform New Deployment Dialog Resources Panel Expanded

Deployments use fixed defaults of 1 CPU, 2 GiB memory, minInstances = 0, and maxInstances = 1. They scale to zero when idle, so you only pay for active inference time.

Auto-Generated Names

The deployment name is automatically generated from the model name and region city (e.g., yolo26n-iowa). If you deploy the same model to the same region again, a numeric suffix is added (e.g., yolo26n-iowa-2).
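
The naming rule can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the Platform's implementation: the slug rules (lowercasing, hyphens for spaces) and the behavior beyond the `-2` suffix are assumptions.

```python
import re


def slugify(text):
    """Lowercase and replace runs of non-alphanumerics with hyphens (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def deployment_name(model_name, region_city, existing_names):
    """Build '<model>-<city>', adding a numeric suffix if the name is taken."""
    base = f"{slugify(model_name)}-{slugify(region_city)}"
    if base not in existing_names:
        return base
    suffix = 2  # the text shows '-2' for the second deployment
    while f"{base}-{suffix}" in existing_names:
        suffix += 1  # assumption: later duplicates continue with -3, -4, ...
    return f"{base}-{suffix}"


print(deployment_name("yolo26n", "Iowa", set()))             # yolo26n-iowa
print(deployment_name("yolo26n", "Iowa", {"yolo26n-iowa"}))  # yolo26n-iowa-2
```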

Deploy Tab (Quick Deploy)

When deploying from the model's Deploy tab, endpoints are created with default resources (1 CPU, 2 GB memory) with scale-to-zero enabled. The deployment name is auto-generated.

Manage Endpoints

View Modes

The deployments list supports three view modes:

| Mode | Description |
| --- | --- |
| Cards | Full detail cards with logs, code examples, predict panel |
| Compact | Grid of smaller cards with key metrics |
| Table | DataTable with sortable columns and search |

Ultralytics Platform Deploy Tab Active Deployments Cards View

Deployment Card (Cards View)

Each deployment card in the cards view shows:

  • Header: Name, region flag, status badge, start/stop/delete buttons
  • Endpoint URL: Copyable URL with link to API docs
  • Metrics: Request count (24h), P95 latency, error rate
  • Health check: Live health indicator with latency and manual refresh
  • Tabs: Logs, Code, and Predict

The Logs tab shows recent log entries with severity filtering (All / Errors). The Code tab shows ready-to-use code examples in Python, JavaScript, and cURL with your actual endpoint URL and API key. The Predict tab provides an inline predict panel for testing directly on the deployment.

Deployment Statuses

| Status | Description |
| --- | --- |
| Creating | Deployment is being set up |
| Deploying | Container is starting |
| Ready | Endpoint is live and accepting requests |
| Stopping | Endpoint is shutting down |
| Stopped | Endpoint is paused (no billing) |
| Failed | Deployment failed (see error message) |

Endpoint URL

Each endpoint has a unique URL, for example:

https://predict-abc123.run.app

Ultralytics Platform Deployment Card Endpoint Url With Copy Button

Click the copy button to copy the URL. Click the docs icon to view the auto-generated API documentation for the endpoint.

Lifecycle Management

Control your endpoint state:

graph LR
    R[Ready] -->|Stop| S[Stopped]
    S -->|Start| R
    R -->|Delete| D[Deleted]
    S -->|Delete| D

    style R fill:#4CAF50,color:#fff
    style S fill:#9E9E9E,color:#fff
    style D fill:#F44336,color:#fff

| Action | Description |
| --- | --- |
| Start | Resume a stopped endpoint |
| Stop | Pause the endpoint (no billing) |
| Delete | Permanently remove endpoint |

Stop Endpoint

Stop an endpoint to pause billing:

  1. Click the pause icon on the deployment card
  2. Endpoint status changes to "Stopping" then "Stopped"

Stopped endpoints:

  • Don't accept requests
  • Don't incur charges
  • Can be restarted anytime

Delete Endpoint

Permanently remove an endpoint:

  1. Click the delete (trash) icon on the deployment card
  2. Confirm deletion in the dialog

Permanent Action

Deletion is immediate and cannot be undone. To serve the model again, create a new endpoint.

Using Endpoints

Authentication

Each deployment is created with an API key from your account. Include it in requests:

Authorization: Bearer YOUR_API_KEY

The API key prefix is displayed on the deployment card footer for identification. Generate keys from API Keys.
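
To avoid hard-coding the key in scripts, one option is to read it from an environment variable and build the header from it. The sketch below assumes a variable named `ULTRALYTICS_API_KEY`; that name is a choice made for this example, not something the Platform prescribes.

```python
import os


def auth_headers(env_var="ULTRALYTICS_API_KEY"):
    """Build the Authorization header from an environment variable.

    The variable name is a hypothetical convention for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"set {env_var} to your deployment API key")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed as the `headers` argument of an HTTP client call.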

No Rate Limits

Requests sent directly to your dedicated endpoint's URL are not subject to the Platform API rate limits. Throughput is limited only by the endpoint's CPU, memory, and scaling configuration. This is a key advantage over shared inference, which is limited to 20 requests/min per API key. Requests proxied through the Platform API, such as the in-browser tester, are still subject to the standard 20 requests/min predict limit.
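
If a script does go through a rate-limited path (shared inference or the Platform API proxy), a client-side limiter can keep it under 20 requests/min. The following sliding-window sketch is one way to do that; the class and its method names are illustrative, and it is not needed for requests sent directly to a dedicated endpoint.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` interval."""

    def __init__(self, max_requests=20, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Register that a request was just sent."""
        self._sent.append(self.clock())
```

A typical loop calls `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after sending it.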

Request Example

import requests

# Deployment endpoint
url = "https://predict-abc123.run.app/predict"

# Headers with your deployment API key
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Inference parameters
data = {"conf": 0.25, "iou": 0.7, "imgsz": 640}

# Send image for inference
with open("image.jpg", "rb") as f:
    response = requests.post(url, headers=headers, data=data, files={"file": f})

print(response.json())

Request Parameters

| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| file | file | - | - | Image or video file (required) |
| conf | float | 0.25 | 0.01 – 1.0 | Minimum confidence threshold |
| iou | float | 0.7 | 0.0 – 0.95 | NMS IoU threshold |
| imgsz | int | 640 | 32 – 1280 | Input image size in pixels |
| normalize | bool | false | - | Return bounding box coordinates as 0 – 1 |
| decimals | int | 5 | 0 – 10 | Decimal precision for coordinate values |
| source | string | - | - | Image URL or base64 string (alternative to file) |

Video Inference

Dedicated endpoints accept both images and videos via the file parameter.

  • Image formats (up to 100 MB): AVIF, BMP, DNG, HEIC, JP2, JPEG, JPG, MPO, PNG, TIF, TIFF, WEBP
  • Video formats (up to 100 MB): ASF, AVI, GIF, M4V, MKV, MOV, MP4, MPEG, MPG, TS, WEBM, WMV

Each video frame is processed individually and results are returned per frame. You can also pass a public image URL or a base64-encoded image via the source parameter instead of file.
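
Checking inputs before sending them can save a round trip. The sketch below validates the numeric parameters against the documented ranges and classifies a file by extension and size. It is a client-side convenience only: the helper names are made up for this example, `normalize` is left out because the text does not say how booleans are encoded in form data, and 100 MB is interpreted as 100,000,000 bytes, which is an assumption.

```python
from pathlib import Path

IMAGE_EXTS = {"avif", "bmp", "dng", "heic", "jp2", "jpeg", "jpg",
              "mpo", "png", "tif", "tiff", "webp"}
VIDEO_EXTS = {"asf", "avi", "gif", "m4v", "mkv", "mov", "mp4",
              "mpeg", "mpg", "ts", "webm", "wmv"}
MAX_FILE_BYTES = 100 * 1000 * 1000  # assumption: "100 MB" read as decimal megabytes


def _check_range(name, value, low, high):
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the documented range {low} to {high}")


def build_params(conf=0.25, iou=0.7, imgsz=640, decimals=5):
    """Return form fields after checking them against the documented ranges."""
    _check_range("conf", conf, 0.01, 1.0)
    _check_range("iou", iou, 0.0, 0.95)
    _check_range("imgsz", imgsz, 32, 1280)
    _check_range("decimals", decimals, 0, 10)
    return {"conf": conf, "iou": iou, "imgsz": imgsz, "decimals": decimals}


def media_kind(filename, size_bytes):
    """Return 'image' or 'video' for a supported file, else raise ValueError."""
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"{filename} exceeds the 100 MB upload limit")
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported file extension: {ext!r}")
```

The dictionary returned by `build_params` has the same shape as the `data` argument in the request example.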

Response Format

The response format is the same as for shared inference and includes task-specific fields.

Pricing

Basic dedicated endpoints are free on all plans. Higher-resource configurations (more vCPUs, more memory, warm start) will be offered with usage-based pricing in the future.

Cost Optimization
  • Use scale-to-zero (default) so endpoints only run when receiving requests
  • Set appropriate max instances for your traffic
  • Monitor usage in the Monitoring dashboard

FAQ

How many endpoints can I create?

Endpoint limits depend on plan:

  • Free: Up to 3 deployments
  • Pro: Up to 10 deployments
  • Enterprise: Unlimited deployments

A single model can be deployed to multiple regions, as long as the total number of deployments stays within your plan quota.

Can I change the region after deployment?

No, regions are fixed. To change regions:

  1. Delete the existing endpoint
  2. Create a new endpoint in the desired region

How do I handle multi-region deployment?

For global coverage:

  1. Deploy to multiple regions
  2. Use a load balancer or DNS routing
  3. Route users to the nearest endpoint
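
When a load balancer or DNS routing is not available, a client can approximate step 3 itself by trying endpoints in order of measured latency. The sketch below assumes you maintain a mapping of zone to endpoint URL and a mapping of zone to latency; the second URL, the function names, and the `send` callable are hypothetical.

```python
def order_by_latency(endpoints, latencies_ms):
    """Return zones sorted fastest first; zones without a measurement go last."""
    return sorted(endpoints, key=lambda zone: latencies_ms.get(zone, float("inf")))


def predict_with_failover(endpoints, latencies_ms, send):
    """Call send(url) on the nearest endpoint, falling back to the next on failure.

    `send` is any callable that performs the request and raises OSError on
    network or HTTP errors (requests and urllib exceptions both derive from it).
    """
    errors = {}
    for zone in order_by_latency(endpoints, latencies_ms):
        try:
            return zone, send(endpoints[zone])
        except OSError as exc:
            errors[zone] = exc
    raise RuntimeError(f"all endpoints failed: {errors}")


# Hypothetical deployment map; only the first URL appears in this document.
endpoints = {
    "us-central1": "https://predict-abc123.run.app",
    "us-west1": "https://predict-def456.run.app",
}
latencies_ms = {"us-central1": 52.0, "us-west1": 31.0}
print(order_by_latency(endpoints, latencies_ms))  # ['us-west1', 'us-central1']
```

Client-side failover adds the latency of each failed attempt, so it complements rather than replaces proper routing in front of the endpoints.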

What's the cold start time?

Cold start time depends on model size and whether the container is already cached in the region. Typical ranges:

| Scenario | Cold Start |
| --- | --- |
| Cached container | ~5-15 seconds |
| First deploy/region | ~15-45 seconds |

The health check uses a 55-second timeout to accommodate worst-case cold starts.
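
Clients calling a scaled-to-zero endpoint may want a similar tolerance. The sketch below wraps any request callable in a retry loop with exponential backoff. The function name, attempt count, and delays are illustrative choices, not Platform recommendations.

```python
import time


def call_with_retry(send, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call send() until it succeeds, backing off 2 s, 4 s, ... between tries.

    `send` should raise OSError on timeouts or connection failures, which is
    what requests and urllib exceptions derive from.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise last_error
```

With the `requests` library, `send` could be a function that opens the image and posts it with a generous `timeout` (for example 60 seconds, slightly above the 55-second health check, which is an assumption about what suits your model). Open the file inside `send` so that each attempt reads it from the start.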

Can I use custom domains?

Custom domains are coming soon. Currently, endpoints use platform-generated URLs.
