Dedicated Endpoints

Ultralytics Platform enables deployment of YOLO models to dedicated endpoints in 43 global regions. Each endpoint is a single-tenant service with auto-scaling, custom URLs, and independent monitoring.

Create Endpoint

Deploy a model to a dedicated endpoint:

  1. Navigate to your model
  2. Click the Deploy tab
  3. Select a region from the map
  4. Click Deploy

Region Selection

Choose from 43 regions worldwide.

The interactive map shows:

  • Region pins: Click to select
  • Latency indicators: Color-coded by distance
    • Green: <100ms
    • Yellow: 100-200ms
    • Red: >200ms
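The color bands above can be reproduced in your own tooling with a small helper. This is a minimal sketch: the function name `latency_color` is hypothetical, and treating exactly 100 ms and 200 ms as yellow is an assumption based on the "100-200ms" band.

```python
def latency_color(latency_ms: float) -> str:
    """Map a measured latency in milliseconds to the map's color bands."""
    if latency_ms < 0:
        raise ValueError("latency must be non-negative")
    if latency_ms < 100:
        return "green"   # <100ms
    if latency_ms <= 200:
        return "yellow"  # 100-200ms (boundary handling is an assumption)
    return "red"         # >200ms
```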

Region Table

View all regions with details:

| Column | Description |
| --- | --- |
| Region | Region identifier |
| Location | City/country |
| Latency | Measured ping time |
| Status | Available/deployed |

Choose Wisely

Select the region closest to your users for lowest latency. Consider deploying to multiple regions for global coverage.
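If you have already measured latency from your users' location to several regions, picking the closest one is a simple minimum. The sketch below assumes you supply the measurements yourself; `pick_region` is a hypothetical helper and the numbers are illustrative, not real measurements.

```python
def pick_region(latencies_ms: dict[str, float]) -> str:
    """Return the region identifier with the lowest measured latency."""
    if not latencies_ms:
        raise ValueError("no latency measurements provided")
    return min(latencies_ms, key=latencies_ms.get)


# Illustrative numbers only, not real measurements.
measured = {"us-central1": 142.0, "europe-west4": 38.5, "asia-southeast1": 211.0}
print(pick_region(measured))
```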

Available Regions

Americas (15 regions)

| Region | Location |
| --- | --- |
| us-central1 | Iowa, USA |
| us-east1 | South Carolina, USA |
| us-east4 | Virginia, USA |
| us-east5 | Columbus, USA |
| us-south1 | Dallas, USA |
| us-west1 | Oregon, USA |
| us-west2 | Los Angeles, USA |
| us-west3 | Salt Lake City, USA |
| us-west4 | Las Vegas, USA |
| northamerica-northeast1 | Montreal, Canada |
| northamerica-northeast2 | Toronto, Canada |
| southamerica-east1 | São Paulo, Brazil |
| southamerica-west1 | Santiago, Chile |

Europe (12 regions)

| Region | Location |
| --- | --- |
| europe-central2 | Warsaw, Poland |
| europe-north1 | Finland |
| europe-southwest1 | Madrid, Spain |
| europe-west1 | Belgium |
| europe-west2 | London, UK |
| europe-west3 | Frankfurt, Germany |
| europe-west4 | Netherlands |
| europe-west6 | Zurich, Switzerland |
| europe-west8 | Milan, Italy |
| europe-west9 | Paris, France |
| europe-west10 | Berlin, Germany |
| europe-west12 | Turin, Italy |

Asia Pacific (16 regions)

| Region | Location |
| --- | --- |
| asia-east1 | Taiwan |
| asia-east2 | Hong Kong |
| asia-northeast1 | Tokyo, Japan |
| asia-northeast2 | Osaka, Japan |
| asia-northeast3 | Seoul, Korea |
| asia-south1 | Mumbai, India |
| asia-south2 | Delhi, India |
| asia-southeast1 | Singapore |
| asia-southeast2 | Jakarta, Indonesia |
| australia-southeast1 | Sydney, Australia |
| australia-southeast2 | Melbourne, Australia |
| me-central1 | Doha, Qatar |
| me-central2 | Dammam, Saudi Arabia |
| me-west1 | Tel Aviv, Israel |

Endpoint Configuration

When creating an endpoint:

| Setting | Description | Default |
| --- | --- | --- |
| Region | Deployment region | - |
| Min Instances | Minimum running instances | 0 |
| Max Instances | Maximum scaling limit | 10 |

Scaling Options

| Setting | Behavior |
| --- | --- |
| Min = 0 | Scale to zero when idle (cost-effective) |
| Min > 0 | Always-on for no cold starts |
| Max | Upper limit for traffic spikes |
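To make the relationship between the two scaling settings concrete, the sketch below models them as a small validated value object. `ScalingConfig` is a hypothetical class for illustration, not part of any Ultralytics API, and the rule that the maximum must be at least 1 and no smaller than the minimum is an assumption rather than a documented constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    """Hypothetical container mirroring the Min/Max Instances settings."""

    min_instances: int = 0   # default from the Endpoint Configuration table
    max_instances: int = 10  # default from the Endpoint Configuration table

    def __post_init__(self) -> None:
        # Assumed sanity rules; the platform may enforce different limits.
        if self.min_instances < 0:
            raise ValueError("min_instances must be >= 0")
        if self.max_instances < max(1, self.min_instances):
            raise ValueError("max_instances must be >= 1 and >= min_instances")

    @property
    def scales_to_zero(self) -> bool:
        """True when the endpoint may go idle, which implies cold starts."""
        return self.min_instances == 0
```

With the defaults, `ScalingConfig().scales_to_zero` is true; setting `min_instances=1` models an always-on endpoint.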

Cold Starts

With min instances = 0, the first request after idle triggers a cold start (2-5 seconds). Set min > 0 for latency-sensitive applications.

Manage Endpoints

View and manage your endpoints.

Endpoint Details

| Field | Description |
| --- | --- |
| URL | HTTPS endpoint for requests |
| Region | Deployed region |
| Status | Running, Stopped, Deploying |
| Instances | Current/max instance count |

Endpoint URL

Each endpoint has a unique URL:

https://model-abc123-us-central1.a.run.app

Click the copy button to copy the URL.

Lifecycle Management

Control your endpoint state:

| Action | Description |
| --- | --- |
| Start | Resume a stopped endpoint |
| Stop | Pause the endpoint (no billing) |
| Delete | Permanently remove endpoint |

Stop Endpoint

Stop an endpoint to pause billing:

  1. Open endpoint actions menu
  2. Click Stop
  3. Confirm action

Stopped endpoints:

  • Don't accept requests
  • Don't incur charges
  • Can be restarted anytime

Delete Endpoint

Permanently remove an endpoint:

  1. Open endpoint actions menu
  2. Click Delete
  3. Confirm deletion

Permanent Action

Deletion is immediate and cannot be undone. To serve the model again, create a new endpoint.

Using Endpoints

Authentication

Include your API key in requests:

Authorization: Bearer YOUR_API_KEY
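Rather than hard-coding the key, you may prefer to read it from the environment and build the header from that. The sketch below is one way to do so; the environment variable name `ULTRALYTICS_API_KEY` and the helper `auth_headers` are hypothetical choices for this example.

```python
import os


def auth_headers(env_var: str = "ULTRALYTICS_API_KEY") -> dict[str, str]:
    """Build the Authorization header from an environment variable.

    The variable name is a hypothetical choice for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"set {env_var} before sending requests")
    return {"Authorization": f"Bearer {api_key}"}
```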

Request Example

cURL:

curl -X POST \
  "https://model-abc123-us-central1.a.run.app/predict" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@image.jpg"

Python:

import requests

url = "https://model-abc123-us-central1.a.run.app/predict"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Open the image in a context manager so the file handle is closed after upload
with open("image.jpg", "rb") as f:
    response = requests.post(url, headers=headers, files={"file": f}, timeout=30)

response.raise_for_status()  # Raise on HTTP 4xx/5xx before parsing the body
print(response.json())

Response Format

Responses use the same format as shared inference, with task-specific fields.

Pricing

Dedicated endpoints bill based on:

| Component | Rate |
| --- | --- |
| CPU | Per vCPU-second |
| Memory | Per GB-second |
| Requests | Per million requests |
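A rough estimate for a billing period can be sketched by adding the three components together. This assumes the components are simply additive, which the table suggests but does not state, and the function deliberately takes the rates as arguments because no actual prices are given here; `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(
    vcpu_seconds: float,
    gb_seconds: float,
    requests: int,
    *,
    rate_per_vcpu_second: float,
    rate_per_gb_second: float,
    rate_per_million_requests: float,
) -> float:
    """Sum the three billing components for a period (additive model assumed)."""
    cpu = vcpu_seconds * rate_per_vcpu_second
    memory = gb_seconds * rate_per_gb_second
    request_fee = requests / 1_000_000 * rate_per_million_requests
    return cpu + memory + request_fee
```

Pass in the rates from your plan's pricing page; the result is in whatever currency those rates use.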

Cost Optimization

  • Use scale-to-zero for development endpoints
  • Set appropriate max instances
  • Monitor usage in the Monitoring dashboard

FAQ

How many endpoints can I create?

There is no fixed per-model limit, and each model can have endpoints in multiple regions. The total number of endpoints you can run depends on your plan.

Can I change the region after deployment?

No, regions are fixed. To change regions:

  1. Delete the existing endpoint
  2. Create a new endpoint in the desired region

How do I handle multi-region deployment?

For global coverage:

  1. Deploy to multiple regions
  2. Use a load balancer or DNS routing
  3. Route users to the nearest endpoint
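Where a load balancer or DNS routing is not yet in place, a client can approximate the steps above by trying endpoints in order of preference. The sketch below is an illustration only: `predict_with_failover` is a hypothetical helper, and `send` is a function you supply that posts to a single endpoint URL and raises on failure.

```python
from typing import Callable, Sequence


def predict_with_failover(
    endpoint_urls: Sequence[str],
    send: Callable[[str], dict],
) -> dict:
    """Try each endpoint URL in order and return the first successful result.

    `send` is a caller-supplied function that posts to one URL and raises on failure.
    """
    errors = []
    for url in endpoint_urls:
        try:
            return send(url)
        except Exception as exc:  # broad on purpose: any failure moves to the next region
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))
```

Order `endpoint_urls` from nearest to farthest so the fallback regions are only used when the preferred one fails.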

What's the cold start time?

Cold start varies by model size:

| Model | Cold Start |
| --- | --- |
| YOLO11n | ~2 seconds |
| YOLO11m | ~3 seconds |
| YOLO11x | ~5 seconds |

Set min instances > 0 to eliminate cold starts.
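If you keep scale-to-zero enabled, the client can instead tolerate the occasional cold start by using a generous timeout and retrying once. The following is a sketch under stated assumptions: `call_with_retry` is a hypothetical helper, `send` is a function you supply that performs the request with the given timeout and raises `TimeoutError` when it expires, and the 10-second default is simply chosen to sit above the ~5 second figure listed above.

```python
import time
from typing import Callable


def call_with_retry(
    send: Callable[[float], dict],
    timeout_s: float = 10.0,
    attempts: int = 2,
    backoff_s: float = 1.0,
) -> dict:
    """Call `send(timeout_s)`, retrying on TimeoutError to ride out a cold start."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return send(timeout_s)
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the timeout to the caller
            time.sleep(backoff_s)  # give the instance time to finish starting
    raise AssertionError("unreachable")
```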

Can I use custom domains?

Custom domains are coming soon. Currently, endpoints use platform-generated URLs.



📅 Created 0 days ago ✏️ Updated 0 days ago
glenn-jocher
