
# Model Training

Ultralytics Platform provides comprehensive tools for training YOLO models, from organizing experiments to running cloud training jobs with real-time metrics streaming.

## Overview

The Training section helps you:

- Organize models into projects for easier management
- Train on cloud GPUs with a single click
- Monitor real-time metrics during training
- Compare model performance across experiments

## Workflow

```mermaid
graph LR
    A[📁 Project] --> B[⚙️ Configure]
    B --> C[🚀 Train]
    C --> D[📈 Monitor]
    D --> E[📦 Export]

    style A fill:#4CAF50,color:#fff
    style B fill:#2196F3,color:#fff
    style C fill:#FF9800,color:#fff
    style D fill:#9C27B0,color:#fff
    style E fill:#00BCD4,color:#fff
```
| Stage | Description |
| --- | --- |
| Project | Create a workspace to organize related models |
| Configure | Select dataset, base model, and training parameters |
| Train | Run on cloud GPUs or your local hardware |
| Monitor | View real-time loss curves and metrics |
| Export | Convert to 17 deployment formats |

## Training Options

Ultralytics Platform supports multiple training approaches:

| Method | Description | Best For |
| --- | --- | --- |
| Cloud Training | Train on Platform cloud GPUs | No local GPU, scalability |
| Remote Training | Train locally, stream metrics to Platform | Existing hardware, privacy |
| Colab Training | Use Google Colab with Platform integration | Free GPU access |

## GPU Options

Available GPUs for cloud training:

| GPU | VRAM | Performance | Cost |
| --- | --- | --- | --- |
| RTX 3090 | 24GB | Good | $0.44/hr |
| RTX 4090 | 24GB | Excellent | $0.74/hr |
| L40S | 48GB | Very Good | $1.14/hr |
| A100 40GB | 40GB | Excellent | $1.29/hr |
| A100 80GB | 80GB | Excellent | $1.99/hr |
| H100 80GB | 80GB | Best | $3.99/hr |
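Given the hourly rates above, a back-of-the-envelope cost check is simple multiplication (a sketch: `HOURLY_RATES` just mirrors the table, and actual billing granularity may differ):

```python
# Hourly rates ($/hr) copied from the GPU table above
HOURLY_RATES = {
    "RTX 3090": 0.44,
    "RTX 4090": 0.74,
    "L40S": 1.14,
    "A100 40GB": 1.29,
    "A100 80GB": 1.99,
    "H100 80GB": 3.99,
}

def training_cost(gpu: str, hours: float) -> float:
    """Estimated cost in dollars for `hours` of training on `gpu`."""
    return round(HOURLY_RATES[gpu] * hours, 2)

print(training_cost("RTX 4090", 1.0))   # a one-hour run costs about $0.74
print(training_cost("H100 80GB", 2.0))  # two hours on an H100: $7.98
```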

## Free Training

New accounts receive credits for training. Check Billing for details.

## Real-Time Metrics

During training, view live metrics:

- **Loss Curves**: Box, class, and DFL loss
- **Performance**: mAP50, mAP50-95, precision, recall
- **System Stats**: GPU utilization, memory usage
- **Checkpoints**: Automatic saving of best weights
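Outside the Platform UI, `ultralytics` training runs also write these metrics to a `results.csv` alongside the weights; a minimal sketch of picking the best-mAP epoch from such a file (the inline sample rows are made-up values for illustration, and real files contain more columns):

```python
import csv
import io

# Made-up sample in the ultralytics results.csv column style (illustration only)
SAMPLE = """epoch,train/box_loss,metrics/mAP50(B),metrics/mAP50-95(B)
1,1.42,0.61,0.42
2,1.18,0.68,0.49
3,1.05,0.71,0.52
"""

def best_epoch(csv_text: str) -> dict:
    """Return the metrics row with the highest mAP50-95."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return max(rows, key=lambda r: float(r["metrics/mAP50-95(B)"]))

best = best_epoch(SAMPLE)
print(best["epoch"], best["metrics/mAP50-95(B)"])  # prints: 3 0.52
```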

## FAQ

### How long does training take?

Training time depends on:

- Dataset size (number of images)
- Model size (n, s, m, l, x)
- Number of epochs
- GPU type selected

A typical training run with 1000 images, YOLO11n, 100 epochs on RTX 4090 takes about 30-60 minutes.
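That ballpark scales roughly linearly with images and epochs; here is a naive estimator anchored to the 1000-image, 100-epoch figure above (an illustrative assumption only — real runtimes also depend on image size, augmentation, and model variant):

```python
# Anchor point from the text: ~45 min (midpoint of 30-60) for
# 1000 images x 100 epochs of YOLO11n on an RTX 4090.
BASELINE_MINUTES = 45.0
BASELINE_IMAGE_EPOCHS = 1000 * 100

def estimate_minutes(images: int, epochs: int) -> float:
    """Naive linear estimate of YOLO11n training time on an RTX 4090."""
    return BASELINE_MINUTES * (images * epochs) / BASELINE_IMAGE_EPOCHS

print(estimate_minutes(1000, 100))  # the anchor itself: 45.0 min
print(estimate_minutes(5000, 50))   # 5x images, half the epochs: 112.5 min
```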

### Can I train multiple models simultaneously?

Cloud training currently supports one concurrent training job per account. For parallel training, use remote training from multiple machines.

### What happens if training fails?

If training fails:

  1. Checkpoints are saved at each epoch
  2. You can resume from the last checkpoint
  3. Credits are only charged for completed compute time

### How do I choose the right GPU?

| Scenario | Recommended GPU |
| --- | --- |
| Small datasets (<5000 images) | RTX 4090 |
| Medium datasets (5000-50000 images) | A100 40GB |
| Large datasets or batch sizes | A100 80GB or H100 |
| Budget-conscious | RTX 3090 |
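The scenario table folds into a trivial helper (a sketch: the thresholds simply restate the table, and the `budget` and `large_batches` flags are hypothetical names):

```python
def recommend_gpu(num_images: int, budget: bool = False,
                  large_batches: bool = False) -> str:
    """Recommend a cloud GPU following the scenario table above."""
    if budget:
        return "RTX 3090"
    if large_batches or num_images > 50_000:
        return "A100 80GB or H100"
    if num_images >= 5_000:
        return "A100 40GB"
    return "RTX 4090"

print(recommend_gpu(2_000))    # small dataset  -> RTX 4090
print(recommend_gpu(20_000))   # medium dataset -> A100 40GB
print(recommend_gpu(80_000))   # large dataset  -> A100 80GB or H100
```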


glenn-jocher
