
Cloud Training

Ultralytics Platform Cloud Training offers single-click training on cloud GPUs, making model training accessible without complex setup. Train YOLO models with real-time metrics streaming and automatic checkpoint saving.

graph LR
    A[Configure] --> B[Start Training]
    B --> C[Provision GPU]
    C --> D[Download Dataset]
    D --> E[Train]
    E --> F[Stream Metrics]
    F --> G[Save Checkpoints]
    G --> H[Complete]

    style A fill:#2196F3,color:#fff
    style B fill:#FF9800,color:#fff
    style E fill:#9C27B0,color:#fff
    style H fill:#4CAF50,color:#fff

Training Dialog

Start training from the platform UI by clicking New Model on any project page (or Train from a dataset page). The training dialog has two tabs: Cloud Training and Local Training.

Ultralytics Platform Training Dialog Cloud Tab

Step 1: Select Base Model

Choose from official YOLO26 models or your own trained models:

| Category | Description |
|---|---|
| Official | All 25 YOLO26 models (5 sizes x 5 tasks) |
| Your Models | Your completed models for fine-tuning |

Official models are organized by task type (Detect, Segment, Pose, OBB, Classify) with sizes from nano to xlarge.

Step 2: Select Dataset

Choose a dataset to train on (see Datasets):

| Option | Description |
|---|---|
| Official | Curated datasets from Ultralytics |
| Your Datasets | Datasets you've uploaded |

Dataset Requirements

Datasets must be in ready status with at least 1 image in the train split, 1 image in the validation or test split, and at least 1 labeled image.
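These requirements can be expressed as a simple pre-flight check. The sketch below is a hypothetical helper, not part of the platform API; the field names (`status`, `splits`, `labeled_images`) are illustrative only:

```python
# Hypothetical pre-flight check mirroring the dataset requirements above.
# Field names are illustrative, not the platform's actual schema.

def dataset_is_trainable(dataset: dict) -> bool:
    """Return True if a dataset meets the minimum cloud-training requirements."""
    splits = dataset.get("splits", {})
    return (
        dataset.get("status") == "ready"                                # ready status
        and splits.get("train", 0) >= 1                                 # >= 1 train image
        and (splits.get("val", 0) >= 1 or splits.get("test", 0) >= 1)   # >= 1 val or test image
        and dataset.get("labeled_images", 0) >= 1                       # >= 1 labeled image
    )
```

For example, `dataset_is_trainable({"status": "ready", "splits": {"train": 500, "val": 50}, "labeled_images": 550})` returns True, while a dataset still processing or missing a validation/test image fails the check.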

Task Mismatch

A task mismatch warning appears if the model task (e.g., detect) doesn't match the dataset task (e.g., segment). Training will fail if you proceed with mismatched tasks. Ensure both model and dataset use the same task type, as described in the task guides.

Step 3: Configure Parameters

Set core training parameters:

| Parameter | Description | Default |
|---|---|---|
| Epochs | Number of training epochs (full passes over the dataset) | 100 |
| Batch Size | Samples per iteration | 16 |
| Image Size | Input resolution (320/416/512/640/1280 dropdown, or 32-4096 in the YAML editor) | 640 |
| Run Name | Optional name for the training run | auto |

Step 4: Advanced Settings (Optional)

Expand Advanced Settings to access the full YAML-based parameter editor with 40+ training parameters organized by group (see configuration reference):

| Group | Parameters |
|---|---|
| Learning Rate | lr0, lrf, momentum, weight_decay, warmup_epochs, warmup_momentum, warmup_bias_lr |
| Optimizer | SGD, MuSGD, Adam, AdamW, NAdam, RAdam, RMSProp, Adamax |
| Loss Weights | box, cls, dfl, pose, kobj, label_smoothing |
| Color Augmentation | hsv_h, hsv_s, hsv_v |
| Geometric Augmentation | degrees, translate, scale, shear, perspective |
| Flip & Mix Augmentation | flipud, fliplr, mosaic, mixup, copy_paste |
| Training Control | patience, seed, deterministic, amp, cos_lr, close_mosaic, save_period |
| Dataset | fraction, freeze, single_cls, rect, multi_scale, resume |

Parameters are task-aware (e.g., copy_paste only shows for segment tasks, pose/kobj only for pose tasks). A Modified badge appears when values differ from defaults, and you can reset all to defaults with the reset button.

Example: Tuning Augmentation for Small Datasets

For small datasets (<1000 images), increase augmentation to reduce overfitting:

mosaic: 1.0       # Keep mosaic on
mixup: 0.3        # Add mixup blending
copy_paste: 0.3   # Add copy-paste (segment only)
fliplr: 0.5       # Horizontal flip
degrees: 10.0     # Slight rotation
scale: 0.9        # Aggressive scaling

Step 5: Select GPU (Cloud Tab)

Choose your GPU from Ultralytics Cloud:

Ultralytics Platform Training Dialog Gpu Selector And Cost

| GPU | VRAM | Cost/Hour |
|---|---|---|
| RTX 2000 Ada | 16 GB | $0.24 |
| RTX A4500 | 20 GB | $0.24 |
| RTX A5000 | 24 GB | $0.26 |
| RTX 4000 Ada | 20 GB | $0.38 |
| L4 | 24 GB | $0.39 |
| A40 | 48 GB | $0.40 |
| RTX 3090 | 24 GB | $0.46 |
| RTX A6000 | 48 GB | $0.49 |
| RTX 4090 | 24 GB | $0.59 |
| RTX 6000 Ada | 48 GB | $0.77 |
| L40S | 48 GB | $0.86 |
| RTX 5090 | 32 GB | $0.89 |
| L40 | 48 GB | $0.99 |
| A100 PCIe | 80 GB | $1.39 |
| A100 SXM | 80 GB | $1.49 |
| RTX PRO 6000 | 96 GB | $1.89 |
| H100 PCIe | 80 GB | $2.39 |
| H100 SXM | 80 GB | $2.69 |
| H100 NVL | 94 GB | $3.07 |
| H200 NVL | 143 GB | $3.39 |
| H200 SXM | 141 GB | $3.59 |
| B200 | 180 GB | $4.99 |

GPU Selection

  • RTX PRO 6000: 96 GB Blackwell generation, recommended default for most jobs
  • A100 SXM: Required for large batch sizes or big models
  • H100/H200: Maximum performance for time-sensitive training
  • B200: NVIDIA Blackwell architecture for cutting-edge workloads
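A practical rule of thumb is to pick the cheapest GPU whose VRAM fits your model and batch size. The sketch below encodes the pricing table above as data and selects accordingly; prices are a snapshot from this page and may change, and the helper itself is illustrative:

```python
# Illustrative GPU picker over the pricing table above.
# (name, VRAM in GB, $/hour) — prices are snapshots and may change.
GPUS = [
    ("RTX 2000 Ada", 16, 0.24), ("RTX A4500", 20, 0.24), ("RTX A5000", 24, 0.26),
    ("RTX 4000 Ada", 20, 0.38), ("L4", 24, 0.39), ("A40", 48, 0.40),
    ("RTX 3090", 24, 0.46), ("RTX A6000", 48, 0.49), ("RTX 4090", 24, 0.59),
    ("RTX 6000 Ada", 48, 0.77), ("L40S", 48, 0.86), ("RTX 5090", 32, 0.89),
    ("L40", 48, 0.99), ("A100 PCIe", 80, 1.39), ("A100 SXM", 80, 1.49),
    ("RTX PRO 6000", 96, 1.89), ("H100 PCIe", 80, 2.39), ("H100 SXM", 80, 2.69),
    ("H100 NVL", 94, 3.07), ("H200 NVL", 143, 3.39), ("H200 SXM", 141, 3.59),
    ("B200", 180, 4.99),
]

def cheapest_gpu(min_vram_gb: int) -> str:
    """Return the cheapest GPU with at least min_vram_gb of VRAM."""
    fits = [(cost, name) for name, vram, cost in GPUS if vram >= min_vram_gb]
    return min(fits)[1]  # lowest hourly rate among GPUs that fit
```

For example, `cheapest_gpu(48)` returns `"A40"` ($0.40/hr) and `cheapest_gpu(80)` returns `"A100 PCIe"` ($1.39/hr).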

The dialog shows your current balance and a Top Up button. An estimated cost and duration are calculated based on your configuration (model size, dataset images, epochs, GPU speed).

Step 6: Start Training

Click Start Training to launch your job. The Platform:

  1. Provisions a GPU instance
  2. Downloads your dataset
  3. Begins training
  4. Streams metrics in real-time

Training Job Lifecycle

Training jobs progress through the following statuses:

| Status | Description |
|---|---|
| Pending | Job submitted, waiting for GPU allocation |
| Starting | GPU provisioned, downloading dataset and model |
| Running | Training in progress, metrics streaming in real-time |
| Completed | Training finished successfully |
| Failed | Training failed (see console logs for details) |
| Cancelled | Training was cancelled by the user |

Free Credits

New accounts receive signup credits — $5 for personal emails and $25 for company emails. Check your balance in Settings > Billing.

Ultralytics Platform Training Progress With Charts

Monitor Training

View real-time training progress on the model page's Train tab:

Charts Subtab

Ultralytics Platform Model Training Live Charts

| Metric | Description |
|---|---|
| Loss | Training and validation loss |
| mAP | Mean Average Precision |
| Precision | Correct positive predictions |
| Recall | Detected ground truths |

Console Subtab

Live console output with ANSI color support, progress bars, and error detection.

System Subtab

Real-time GPU utilization, memory, temperature, CPU, and disk usage.

Checkpoints

Checkpoints are saved automatically:

  • Every epoch: Latest weights saved
  • Best model: Highest mAP checkpoint preserved
  • Final model: Weights at training completion

Cancel Training

Click Cancel Training on the model page to stop a running job:

  • The compute instance is terminated
  • Credits stop being charged
  • Checkpoints saved up to that point are preserved

Remote Training

graph LR
    A[Local GPU] --> B[Train]
    B --> C[ultralytics Package]
    C --> D[Stream Metrics]
    D --> E[Platform Dashboard]

    style A fill:#FF9800,color:#fff
    style C fill:#2196F3,color:#fff
    style E fill:#4CAF50,color:#fff

Train on your own hardware while streaming metrics to the platform.

Package Version Requirement

Platform integration requires ultralytics>=8.4.14. Earlier versions will not work with the Platform.

pip install -U ultralytics

Setup API Key

  1. Go to Settings > Profile (API Keys section)
  2. Create a new key (or the platform auto-creates one when you open the Local Training tab)
  3. Set the environment variable:
export ULTRALYTICS_API_KEY="your_api_key"
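In Python, you can verify the key is set before launching a long run. This is a minimal sketch using the `ULTRALYTICS_API_KEY` variable named above; the helper function is hypothetical:

```python
import os

def get_api_key() -> str:
    """Read the platform API key from the environment, failing early if unset."""
    key = os.environ.get("ULTRALYTICS_API_KEY", "")
    if not key:
        raise RuntimeError(
            "ULTRALYTICS_API_KEY is not set; create a key under Settings > Profile"
        )
    return key
```

Failing fast here avoids starting a training run whose metrics cannot be streamed to the platform.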

Train with Streaming

Use the project and name parameters to stream metrics:

CLI:

yolo train model=yolo26n.pt data=coco.yaml epochs=100 \
  project=username/my-project name=experiment-1

Python:

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
model.train(
    data="coco.yaml",
    epochs=100,
    project="username/my-project",
    name="experiment-1",
)

The Local Training tab in the training dialog shows a pre-configured command with your API key, selected parameters, and advanced arguments included.

Using Platform Datasets

Train with datasets stored on the platform using the ul:// URI format:

CLI:

yolo train model=yolo26n.pt data=ul://username/datasets/my-dataset epochs=100 \
  project=username/my-project name=exp1

Python:

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
model.train(
    data="ul://username/datasets/my-dataset",
    epochs=100,
    project="username/my-project",
    name="exp1",
)

Using a ul:// URI automatically downloads and configures your dataset, and the trained model is automatically linked to the dataset on the platform (see Using Platform Datasets).
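The URI's shape is `ul://<user>/datasets/<name>`. The parser below is a hypothetical illustration of how such a URI decomposes, not the ultralytics package's actual implementation:

```python
# Illustrative parser for the ul:// dataset URI format shown above.
# This sketches the URI's structure; it is not the package's own code.

def parse_ul_uri(uri: str) -> dict:
    """Split a ul://username/datasets/name URI into its components."""
    prefix = "ul://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a ul:// URI: {uri}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or parts[1] != "datasets":
        raise ValueError(f"expected ul://<user>/datasets/<name>, got: {uri}")
    return {"user": parts[0], "kind": parts[1], "name": parts[2]}
```

For example, `parse_ul_uri("ul://username/datasets/my-dataset")` yields `{"user": "username", "kind": "datasets", "name": "my-dataset"}`.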

Billing

Training costs are based on GPU usage:

Cost Estimation

Before training starts, the platform estimates total cost by:

  1. Estimating seconds per epoch from dataset size, model complexity, image size, batch size, and GPU speed
  2. Calculating total training time by multiplying seconds per epoch by the number of epochs, then adding startup overhead
  3. Computing the estimated cost from total training hours multiplied by the GPU's hourly rate

Factors affecting cost:

| Factor | Impact |
|---|---|
| Dataset Size | More images = longer training time (baseline: ~2.8s compute per 1000 images on RTX 4090) |
| Model Size | Larger models (m, l, x) train slower than smaller ones (n, s) |
| Number of Epochs | Direct multiplier on training time |
| Image Size | Larger imgsz increases computation: 320px = 0.25x, 640px = 1.0x (baseline), 1280px = 4.0x |
| Batch Size | Larger batches are more efficient (batch 32 = ~0.85x time, batch 8 = ~1.2x time vs batch-16 baseline) |
| GPU Speed | Faster GPUs reduce training time (e.g., H100 SXM = ~3.4x faster than RTX 4090) |
| Startup Overhead | Up to 5 minutes for instance initialization, data download, and warmup (scales with dataset size) |
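The three-step estimate above can be sketched numerically from the baseline factors quoted on this page. All constants below are approximations taken from the factors table, and the real estimator also accounts for model complexity, so treat this as a rough lower bound rather than the platform's actual formula:

```python
# Rough cost model built from the factors quoted above; illustrative only.
# Omits model-complexity multipliers, so real estimates will be higher.

IMGSZ_FACTOR = {320: 0.25, 640: 1.0, 1280: 4.0}   # relative compute vs 640px baseline
BATCH_FACTOR = {8: 1.2, 16: 1.0, 32: 0.85}        # relative time vs batch-16 baseline

def estimate_cost(images, epochs, rate_per_hour, imgsz=640, batch=16,
                  gpu_speedup=1.0, overhead_s=300.0):
    """Estimate training cost in dollars.

    Baseline: ~2.8 s of compute per 1000 images per epoch on an RTX 4090.
    gpu_speedup: relative speed vs RTX 4090 (e.g. ~3.4 for H100 SXM).
    overhead_s: startup overhead, up to ~5 minutes (300 s).
    """
    sec_per_epoch = (images / 1000) * 2.8 * IMGSZ_FACTOR[imgsz] * BATCH_FACTOR[batch] / gpu_speedup
    total_hours = (epochs * sec_per_epoch + overhead_s) / 3600
    return total_hours * rate_per_hour
```

The structure matches the billing flow: per-epoch time scaled by image size, batch size, and GPU speed, multiplied by epochs, plus a fixed startup overhead, converted to hours and priced at the GPU's hourly rate.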

Cost Examples

Estimates

Cost estimates are approximate and depend on many factors. The training dialog shows a real-time estimate before you start training.

| Scenario | GPU | Estimated Cost |
|---|---|---|
| 500 images, YOLO26n, 50 epochs | RTX 4090 | ~$0.50 |
| 1000 images, YOLO26n, 100 epochs | RTX PRO 6000 | ~$5 |
| 5000 images, YOLO26s, 100 epochs | H100 SXM | ~$23 |

Billing Flow

graph LR
    A[Estimate Cost] --> B[Balance Check]
    B --> C[Train]
    C --> D[Charge Actual Runtime]

    style A fill:#2196F3,color:#fff
    style B fill:#FF9800,color:#fff
    style C fill:#9C27B0,color:#fff
    style D fill:#4CAF50,color:#fff

Cloud training billing flow:

  1. Estimate: Cost calculated before training starts
  2. Balance Check: Available credits are checked before launch
  3. Train: Job runs on selected compute
  4. Charge: Final cost is based on actual runtime

Consumer Protection

You are billed only for actual compute usage, including partial runs that are cancelled.

Payment Methods

| Method | Description |
|---|---|
| Account Balance | Pre-loaded credits |
| Pay Per Job | Charge at job completion |

Minimum Balance

Training start requires a positive available balance and enough credits for the estimated job cost.

View Training Costs

After training, view detailed costs in the Billing tab:

  • Per-epoch cost breakdown
  • Total GPU time
  • Download cost report

Ultralytics Platform Training Billing Details

Training Tips

Choose the Right Model Size

| Model | Parameters | Best For |
|---|---|---|
| YOLO26n | 2.4M | Real-time, edge devices |
| YOLO26s | 9.5M | Balanced speed/accuracy |
| YOLO26m | 20.4M | Higher accuracy |
| YOLO26l | 24.8M | Production accuracy |
| YOLO26x | 55.7M | Maximum accuracy |
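A common strategy is to pick the largest model that fits your parameter (and therefore memory/latency) budget. The toy helper below uses the parameter counts from the table above; the function itself is hypothetical:

```python
# Parameter counts (millions) for YOLO26 sizes, from the table above.
YOLO26_PARAMS_M = {"n": 2.4, "s": 9.5, "m": 20.4, "l": 24.8, "x": 55.7}

def largest_model_under(param_budget_m: float) -> str:
    """Largest YOLO26 size whose parameter count fits the budget (millions)."""
    fitting = [s for s, p in YOLO26_PARAMS_M.items() if p <= param_budget_m]
    if not fitting:
        raise ValueError("no model fits the budget")
    return max(fitting, key=YOLO26_PARAMS_M.get)  # biggest model that still fits
```

For example, a 10M-parameter budget selects `"s"` and a 25M budget selects `"l"`.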

Optimize Training Time

Cost-Saving Strategies

  1. Start small: Test with 10-20 epochs on a budget GPU to verify your dataset and config work
  2. Use appropriate GPU: RTX PRO 6000 handles most workloads well
  3. Validate dataset: Fix labeling issues before spending on training
  4. Monitor early: Cancel training if loss plateaus — you only pay for compute time used

Troubleshooting

| Issue | Solution |
|---|---|
| Training stuck at 0% | Check dataset format, retry |
| Out of memory | Reduce batch size or use larger GPU |
| Poor accuracy | Increase epochs, check data quality |
| Training slow | Consider faster GPU |
| Task mismatch error | Ensure model and dataset tasks match |

FAQ

How long does training take?

Training time depends on:

  • Dataset size
  • Model size
  • Number of epochs
  • GPU selected

Typical times (1000 images, 100 epochs):

| Model | RTX PRO 6000 | A100 |
|---|---|---|
| YOLO26n | 20 min | 20 min |
| YOLO26m | 40 min | 40 min |
| YOLO26x | 80 min | 80 min |

Can I train overnight?

Yes, training continues until completion, and you'll receive a notification when it finishes. Make sure your account has sufficient balance to cover the full run.

What happens if I run out of credits?

Training pauses at the end of the current epoch. Your checkpoint is saved, and you can resume after adding credits.

Can I use custom training arguments?

Yes, expand the Advanced Settings section in the training dialog to access a YAML editor with 40+ configurable parameters. Non-default values are included in both cloud and local training commands.

Can I train from a dataset page?

Yes, the Train button on dataset pages opens the training dialog with the dataset pre-selected and locked. You then select a project and model to begin training.

Training Parameters Reference

**Train Settings**

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| epochs | int | 100 | 1-10000 | Number of training epochs |
| batch | int | 16 | 1-512 | Batch size |
| imgsz | int | 640 | 32-4096 | Input image size |
| patience | int | 100 | 1-1000 | Early stopping patience |
| seed | int | 0 | 0-2147483647 | Random seed for reproducibility |
| deterministic | bool | True | - | Deterministic training mode |
| amp | bool | True | - | Automatic mixed precision |
| close_mosaic | int | 10 | 0-50 | Disable mosaic in final N epochs |
| save_period | int | -1 | -1 to 100 | Save checkpoint every N epochs |
| workers | int | 8 | 0-64 | Dataloader workers |
| cache | select | false | ram/disk/false | Cache images |
**Learning Rate**

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| lr0 | float | 0.01 | 0.0001-0.1 | Initial learning rate |
| lrf | float | 0.01 | 0.01-1.0 | Final LR factor |
| momentum | float | 0.937 | 0.6-0.98 | SGD momentum |
| weight_decay | float | 0.0005 | 0.0-0.001 | L2 regularization |
| warmup_epochs | float | 3.0 | 0-5 | Warmup epochs |
| warmup_momentum | float | 0.8 | 0.5-0.95 | Warmup momentum |
| warmup_bias_lr | float | 0.1 | 0.0-0.2 | Warmup bias LR |
| cos_lr | bool | False | - | Cosine LR scheduler |
**Augmentation**

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| hsv_h | float | 0.015 | 0.0-0.1 | HSV hue augmentation |
| hsv_s | float | 0.7 | 0.0-1.0 | HSV saturation |
| hsv_v | float | 0.4 | 0.0-1.0 | HSV value |
| degrees | float | 0.0 | -45 to 45 | Rotation degrees |
| translate | float | 0.1 | 0.0-1.0 | Translation fraction |
| scale | float | 0.5 | 0.0-1.0 | Scale factor |
| shear | float | 0.0 | -10 to 10 | Shear degrees |
| perspective | float | 0.0 | 0.0-0.001 | Perspective transform |
| fliplr | float | 0.5 | 0.0-1.0 | Horizontal flip prob |
| flipud | float | 0.0 | 0.0-1.0 | Vertical flip prob |
| mosaic | float | 1.0 | 0.0-1.0 | Mosaic augmentation |
| mixup | float | 0.0 | 0.0-1.0 | Mixup augmentation |
| copy_paste | float | 0.0 | 0.0-1.0 | Copy-paste (segment) |
**Dataset**

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| fraction | float | 1.0 | 0.1-1.0 | Fraction of dataset to use |
| freeze | int | null | 0-100 | Number of layers to freeze |
| single_cls | bool | False | - | Treat all classes as one class |
| rect | bool | False | - | Rectangular training |
| multi_scale | float | 0.0 | 0.0-1.0 | Multi-scale training range |
| val | bool | True | - | Run validation during training |
| resume | bool | False | - | Resume training from checkpoint |
**Optimizer**

| Value | Description |
|---|---|
| auto | Automatic selection (default) |
| SGD | Stochastic Gradient Descent |
| MuSGD | Muon SGD optimizer |
| Adam | Adam optimizer |
| AdamW | Adam with weight decay |
| NAdam | NAdam optimizer |
| RAdam | RAdam optimizer |
| RMSProp | RMSProp optimizer |
| Adamax | Adamax optimizer |
**Loss Weights**

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| box | float | 7.5 | 1-50 | Box loss weight |
| cls | float | 0.5 | 0.2-4 | Classification loss weight |
| dfl | float | 1.5 | 0.4-6 | Distribution focal loss |
| pose | float | 12.0 | 1-50 | Pose loss weight (pose only) |
| kobj | float | 1.0 | 0.5-10 | Keypoint objectness (pose) |
| label_smoothing | float | 0.0 | 0.0-0.1 | Label smoothing factor |

Task-Specific Parameters

Some parameters only apply to specific tasks:

  • Detection tasks only (detect, segment, pose, OBB — not classify): box, dfl, degrees, translate, shear, perspective, mosaic, mixup, close_mosaic
  • Segment only: copy_paste
  • Pose only: pose (loss weight), kobj (keypoint objectness)
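The task-awareness described above can be sketched as a filter over a parameter-to-task mapping. The mapping below mirrors the bullets above; the helper function and data structure are hypothetical, not the platform's implementation:

```python
# Illustrative task-aware parameter filtering, mirroring the rules above.
DETECTION_TASKS = {"detect", "segment", "pose", "obb"}  # geometric tasks, not classify

# Parameters restricted to specific tasks; anything absent applies to all tasks.
TASK_ONLY = {
    "box": DETECTION_TASKS, "dfl": DETECTION_TASKS, "degrees": DETECTION_TASKS,
    "translate": DETECTION_TASKS, "shear": DETECTION_TASKS,
    "perspective": DETECTION_TASKS, "mosaic": DETECTION_TASKS,
    "mixup": DETECTION_TASKS, "close_mosaic": DETECTION_TASKS,
    "copy_paste": {"segment"},
    "pose": {"pose"}, "kobj": {"pose"},
}

def params_for_task(params: list, task: str) -> list:
    """Keep only parameters that apply to the given task."""
    all_tasks = DETECTION_TASKS | {"classify"}
    return [p for p in params if task in TASK_ONLY.get(p, all_tasks)]
```

For example, filtering `["lr0", "mosaic", "copy_paste", "pose"]` for a classify task keeps only `lr0`, while a segment task keeps `lr0`, `mosaic`, and `copy_paste`.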


📅 Created 1 month ago ✏️ Updated 5 days ago
Contributors: glenn-jocher, sergiuwaxmann, Laughing-q
