
Cloud Training

Ultralytics Platform Cloud Training offers single-click training on cloud GPUs, making model training accessible without complex setup. Train YOLO models with real-time metrics streaming and automatic checkpoint saving.


Watch: Cloud Training with Ultralytics Platform

Train from UI

Start cloud training directly from the Platform:

  1. Navigate to your project
  2. Click Train Model
  3. Configure training parameters
  4. Click Start Training

Step 1: Select Dataset

Choose a dataset from your uploads:

| Option | Description |
| --- | --- |
| Your Datasets | Datasets you've uploaded |
| Public Datasets | Shared datasets from Explore |

Step 2: Configure Model

Select base model and parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| Model | Base architecture (YOLO11n, s, m, l, x) | YOLO11n |
| Epochs | Number of full passes over the dataset | 100 |
| Image Size | Input resolution | 640 |
| Batch Size | Samples per iteration | Auto |

Step 3: Select GPU

Choose your compute resources:

| GPU | VRAM | Speed | Cost/Hour |
| --- | --- | --- | --- |
| RTX 6000 Pro | 96GB | Very Fast | Free |
| M4 Pro (Mac) | 64GB | Fast | Free |
| RTX 3090 | 24GB | Good | $0.44 |
| RTX 4090 | 24GB | Fast | $0.74 |
| L40S | 48GB | Fast | $1.14 |
| A100 40GB | 40GB | Very Fast | $1.29 |
| A100 80GB | 80GB | Very Fast | $1.99 |
| H100 80GB | 80GB | Fastest | $3.99 |

GPU Selection

  • RTX 6000 Pro (Free): Excellent for most training jobs on Ultralytics infrastructure
  • M4 Pro (Free): Apple Silicon option for compatible workloads
  • RTX 4090: Best value for paid cloud training
  • A100 80GB: Required for large batch sizes or big models
  • H100: Maximum performance for time-sensitive training

Free Training Tier

The RTX 6000 Pro Ada (96GB VRAM) and M4 Pro GPUs are available at no cost, running on Ultralytics infrastructure. These are ideal for getting started and regular training jobs.

Step 4: Start Training

Click Start Training to launch your job. The Platform:

  1. Provisions a GPU instance
  2. Downloads your dataset
  3. Begins training
  4. Streams metrics in real-time

Free Credits

New accounts receive $5 in credits, enough for several training runs on an RTX 4090. Check your balance in Settings > Billing.

Monitor Training

View real-time training progress:

Live Metrics

| Metric | Description |
| --- | --- |
| Loss | Training and validation loss |
| mAP | Mean Average Precision |
| Precision | Fraction of positive predictions that are correct |
| Recall | Fraction of ground-truth objects that are detected |
| GPU Util | GPU utilization percentage |
| Memory | GPU memory usage |

Checkpoints

Checkpoints are saved automatically:

  • Every epoch: Latest weights saved
  • Best model: Highest mAP checkpoint preserved
  • Final model: Weights at training completion
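
As a rough sketch of how a saved checkpoint can be reused once downloaded, the snippet below loads a best-model file for validation; the local path `best.pt` is an assumption for illustration, not a Platform-defined location.

```python
from ultralytics import YOLO

# Hypothetical local path to a checkpoint downloaded from the Platform.
model = YOLO("best.pt")

# Evaluate the checkpoint on its dataset's validation split.
metrics = model.val()
print(metrics.box.map)  # mAP50-95 of the saved weights
```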

Stop and Resume

Stop Training

Click Stop Training to pause your job:

  • Current checkpoint is saved
  • GPU instance is released
  • Credits stop being charged

Resume Training

Continue from your last checkpoint:

  1. Navigate to the model
  2. Click Resume Training
  3. Confirm continuation

Resume Limitations

You can only resume training that was explicitly stopped. Failed training jobs may need to restart from scratch.
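
For local or remote runs (see Remote Training below), an interrupted job can usually be picked up from its last checkpoint with the standard `resume=True` argument; the path `last.pt` here is an assumed local checkpoint, not a Platform file.

```python
from ultralytics import YOLO

# Assumed path to the last checkpoint of an interrupted local run.
model = YOLO("last.pt")

# Resume from the saved epoch, optimizer state, and learning-rate schedule.
model.train(resume=True)
```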

Remote Training

Train on your own hardware while streaming metrics to the Platform.

Package Version Requirement

Platform integration requires `ultralytics>=8.4.0`. Earlier versions will not work with the Platform.

```bash
pip install "ultralytics>=8.4.0"
```

Setup API Key

  1. Go to Settings > API Keys
  2. Create a new key with training scope
  3. Set the environment variable:

```bash
export ULTRALYTICS_API_KEY="your_api_key"
```
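
If exporting a shell variable is inconvenient (for example in a notebook), the key can also be set from Python before training starts. This sketch assumes the package reads the same `ULTRALYTICS_API_KEY` variable described above.

```python
import os

# Set the API key for the current process; assumes the same variable is read at training time.
os.environ["ULTRALYTICS_API_KEY"] = "your_api_key"
```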

Train with Streaming

Use the project and name parameters to stream metrics:

CLI:

```bash
yolo train model=yolo11n.pt data=coco.yaml epochs=100 \
  project=username/my-project name=experiment-1
```

Python:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(
    data="coco.yaml",
    epochs=100,
    project="username/my-project",
    name="experiment-1",
)
```

Using Platform Datasets

Train with datasets stored on the Platform:

```bash
yolo train model=yolo11n.pt data=ul://username/datasets/my-dataset epochs=100
```

The ul:// URI format automatically downloads and configures your dataset.
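
The same URI should also be usable from the Python API; this is a sketch assuming the `ul://` scheme is resolved identically by `model.train()`.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Assumes the ul:// dataset URI is downloaded and configured as in the CLI example above.
model.train(data="ul://username/datasets/my-dataset", epochs=100)
```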

Billing

Training costs are based on GPU usage:

Cost Calculation

Total Cost = GPU Rate × Training Time (hours)

| Example | GPU | Time | Cost |
| --- | --- | --- | --- |
| Small job | RTX 4090 | 1 hour | $0.74 |
| Medium job | A100 40GB | 4 hours | $5.16 |
| Large job | H100 | 8 hours | $31.92 |
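
A back-of-the-envelope estimate follows directly from the formula and the GPU table above; the helper below is purely illustrative and hard-codes the hourly rates listed on this page.

```python
# Hourly rates (USD) taken from the GPU table above.
GPU_RATES = {
    "RTX 3090": 0.44,
    "RTX 4090": 0.74,
    "L40S": 1.14,
    "A100 40GB": 1.29,
    "A100 80GB": 1.99,
    "H100 80GB": 3.99,
}

def estimate_cost(gpu: str, hours: float) -> float:
    """Total Cost = GPU Rate x Training Time (hours)."""
    return round(GPU_RATES[gpu] * hours, 2)

print(estimate_cost("A100 40GB", 4))  # 5.16, the "Medium job" row above
```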

Payment Methods

| Method | Description |
| --- | --- |
| Account Balance | Pre-loaded credits |
| Pay Per Job | Charge at job completion |

Minimum Balance

A minimum balance of $5.00 is required to start epoch-based training.

View Training Costs

After training, view detailed costs in the Billing tab:

  • Per-epoch cost breakdown
  • Total GPU time
  • Download cost report

Training Tips

Choose the Right Model Size

| Model | Parameters | Best For |
| --- | --- | --- |
| YOLO11n | 2.6M | Real-time, edge devices |
| YOLO11s | 9.4M | Balanced speed/accuracy |
| YOLO11m | 20.1M | Higher accuracy |
| YOLO11l | 25.3M | Production accuracy |
| YOLO11x | 56.9M | Maximum accuracy |

Optimize Training Time

  1. Start small: Test with fewer epochs first (see the sketch after this list)
  2. Use appropriate GPU: Match GPU to model/batch size
  3. Validate dataset: Ensure quality before training
  4. Monitor early: Stop if metrics plateau
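
A quick smoke test before committing to a long run might look like the sketch below; the low epoch count and reduced image size are illustrative values, and the `coco8.yaml` sample dataset stands in for your own data.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Short trial run to confirm the dataset, labels, and pipeline are healthy
# before paying for a full-length job.
model.train(data="coco8.yaml", epochs=5, imgsz=320)
```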

Troubleshooting

| Issue | Solution |
| --- | --- |
| Training stuck at 0% | Check dataset format, retry |
| Out of memory | Reduce batch size or use a larger GPU |
| Poor accuracy | Increase epochs, check data quality |
| Training slow | Consider a faster GPU |

FAQ

How long does training take?

Training time depends on:

  • Dataset size
  • Model size
  • Number of epochs
  • GPU selected

Typical times (1000 images, 100 epochs):

| Model | RTX 4090 | A100 |
| --- | --- | --- |
| YOLO11n | 30 min | 20 min |
| YOLO11m | 60 min | 40 min |
| YOLO11x | 120 min | 80 min |

Can I train overnight?

Yes, training continues until completion. You'll receive a notification when training finishes. Make sure your account has sufficient balance for epoch-based training.

What happens if I run out of credits?

Training pauses at the end of the current epoch. Your checkpoint is saved, and you can resume after adding credits.

Can I use custom training arguments?

Yes, advanced users can specify additional arguments in the training configuration.

Training Parameters Reference

Core Parameters

| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| epochs | int | 100 | 1+ | Number of training epochs |
| batch | int | 16 | -1 = auto | Batch size (-1 for auto) |
| imgsz | int | 640 | 32+ | Input image size |
| patience | int | 100 | 0+ | Early stopping patience |
| workers | int | 8 | 0+ | Dataloader workers |
| cache | bool | False | - | Cache images (ram/disk) |
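
For remote training, these map directly onto `model.train()` keyword arguments. The sketch below overrides a few core parameters; the values are chosen only to show the syntax.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(
    data="coco.yaml",
    epochs=150,      # longer than the default 100
    batch=-1,        # -1 = auto batch size
    imgsz=640,
    patience=50,     # stop early after 50 epochs without improvement
    workers=8,
    cache="ram",     # cache images in RAM (or "disk")
)
```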

Learning Rate Parameters

| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| lr0 | float | 0.01 | 0.0-1.0 | Initial learning rate |
| lrf | float | 0.01 | 0.0-1.0 | Final LR factor |
| momentum | float | 0.937 | 0.0-1.0 | SGD momentum |
| weight_decay | float | 0.0005 | 0.0-1.0 | L2 regularization |
| warmup_epochs | float | 3.0 | 0+ | Warmup epochs |
| cos_lr | bool | False | - | Cosine LR scheduler |
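
Learning-rate settings are passed the same way; the sketch below enables the cosine scheduler alongside otherwise default-looking values, purely to show the syntax.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(
    data="coco.yaml",
    epochs=100,
    lr0=0.01,            # initial learning rate
    lrf=0.01,            # final LR = lr0 * lrf
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3.0,
    cos_lr=True,         # cosine LR schedule instead of linear decay
)
```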

Augmentation Parameters

| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| hsv_h | float | 0.015 | 0.0-1.0 | HSV hue augmentation |
| hsv_s | float | 0.7 | 0.0-1.0 | HSV saturation |
| hsv_v | float | 0.4 | 0.0-1.0 | HSV value |
| degrees | float | 0.0 | - | Rotation degrees |
| translate | float | 0.1 | 0.0-1.0 | Translation fraction |
| scale | float | 0.5 | 0.0-1.0 | Scale factor |
| fliplr | float | 0.5 | 0.0-1.0 | Horizontal flip prob |
| flipud | float | 0.0 | 0.0-1.0 | Vertical flip prob |
| mosaic | float | 1.0 | 0.0-1.0 | Mosaic augmentation |
| mixup | float | 0.0 | 0.0-1.0 | Mixup augmentation |
| copy_paste | float | 0.0 | 0.0-1.0 | Copy-paste (segment) |
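
Augmentation strengths are ordinary training arguments as well. The sketch below raises rotation and vertical flips for a hypothetical dataset (such as overhead imagery) where orientation is arbitrary; all values are illustrative.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(
    data="coco.yaml",
    epochs=100,
    hsv_h=0.015,
    hsv_s=0.7,
    hsv_v=0.4,
    degrees=10.0,    # allow small random rotations
    fliplr=0.5,
    flipud=0.5,      # e.g. aerial imagery where up/down is arbitrary
    mosaic=1.0,
    mixup=0.1,
)
```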

Optimizer Selection

| Value | Description |
| --- | --- |
| auto | Automatic selection (default) |
| SGD | Stochastic Gradient Descent |
| Adam | Adam optimizer |
| AdamW | Adam with weight decay |
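
The optimizer is selected with the `optimizer` training argument; a minimal sketch of overriding the default.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# "auto" is the default; AdamW is shown here purely as an example override.
model.train(data="coco.yaml", epochs=100, optimizer="AdamW")
```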

Task-Specific Parameters

Some parameters only apply to specific tasks:

  • Segment: overlap_mask, mask_ratio, copy_paste
  • Pose: pose (loss weight), kobj (keypoint objectness)
  • Classify: dropout, erasing, auto_augment
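
For instance, a segmentation run might override the segment-only arguments; this is a sketch assuming a small sample segmentation dataset stands in for your own.

```python
from ultralytics import YOLO

# Segmentation model, so the segment-specific arguments below apply.
model = YOLO("yolo11n-seg.pt")
model.train(
    data="coco8-seg.yaml",   # sample segmentation dataset, replace with your own
    epochs=100,
    overlap_mask=True,       # merge overlapping masks during training
    mask_ratio=4,            # mask downsampling ratio
    copy_paste=0.1,          # copy-paste augmentation (segment only)
)
```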

