Datasets
Ultralytics Platform datasets let you upload, browse, and manage your training data in one place. Once uploaded, a dataset can be used for model training immediately; the Platform processes it and generates statistics automatically.
Upload Dataset
Ultralytics Platform accepts multiple upload formats for flexibility:
| Format | Description |
|---|---|
| Images | Individual image files (JPG, PNG, WebP, TIFF, RAW) |
| ZIP Archive | Compressed folder with images and optional labels |
| Video | MP4, AVI files - frames extracted at ~1 fps |
| YOLO Format | Standard YOLO directory structure with labels |
Video Frame Extraction
When uploading videos, frames are automatically extracted:
- Frame rate: ~1 frame per second
- Maximum frames: 100 frames per video
- Processing: Client-side extraction before upload
- Format: Frames converted to standard image format
This is ideal for creating training datasets from surveillance footage, action recordings, or any video source.
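The sampling arithmetic behind these limits can be sketched in a few lines. This is only an illustration of "about 1 frame per second, capped at 100 frames"; the function name and the choice to keep the first 100 sampled frames (rather than spreading them across the whole video) are assumptions, not the Platform's actual client-side code.

```python
def sample_frame_indices(total_frames: int, video_fps: float,
                         target_fps: float = 1.0, max_frames: int = 100) -> list[int]:
    """Return the indices of frames to keep when sampling at roughly target_fps."""
    if total_frames <= 0 or video_fps <= 0 or target_fps <= 0:
        return []
    # Number of source frames between two kept frames (at least 1).
    step = max(1, round(video_fps / target_fps))
    # Assumption: once the cap is reached, later frames are simply dropped.
    return list(range(0, total_frames, step))[:max_frames]
```

With these assumptions, a 10-second clip at 30 fps yields 10 frames, while a 10-minute clip at 30 fps is cut off at the 100-frame cap.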
Preparing Your Dataset
For labeled datasets, use the standard YOLO format:
my-dataset/
├── images/
│   ├── train/
│   │   ├── img001.jpg
│   │   └── img002.jpg
│   └── val/
│       ├── img003.jpg
│       └── img004.jpg
├── labels/
│   ├── train/
│   │   ├── img001.txt
│   │   └── img002.txt
│   └── val/
│       ├── img003.txt
│       └── img004.txt
└── data.yaml
The YAML file defines your dataset configuration:
# data.yaml
path: .
train: images/train
val: images/val
names:
  0: person
  1: car
  2: dog
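Before packaging a dataset, it can help to confirm that the images/ and labels/ trees mirror each other as shown in the layout above. The following is a minimal sketch using only the standard library; `find_unlabeled_images` and the `IMAGE_SUFFIXES` set are hypothetical names for illustration and are not part of the Ultralytics package.

```python
from pathlib import Path

# Assumed subset of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".tif", ".tiff"}


def find_unlabeled_images(root: str) -> list[str]:
    """List images under images/ that have no matching .txt file under labels/."""
    root_path = Path(root)
    images_dir = root_path / "images"
    missing = []
    for image in sorted(images_dir.rglob("*")):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # images/train/img001.jpg is expected to pair with labels/train/img001.txt
        relative = image.relative_to(images_dir)
        label = (root_path / "labels" / relative).with_suffix(".txt")
        if not label.is_file():
            missing.append(relative.as_posix())
    return missing
```

An image without a label file is not necessarily an error, since it may be an intentional background image, but an unexpectedly long list usually points to a naming or folder mismatch.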
Upload Process
- Navigate to Datasets in the sidebar
- Click Upload Dataset or drag files into the upload zone
- Select the task type (detect, segment, pose, OBB, classify)
- Add a name and optional description
- Click Upload
After upload, the Platform processes your data:
- Normalization: Large images resized (max 4096px)
- Thumbnails: 256px previews generated
- Label Parsing: YOLO format labels extracted
- Statistics: Class distributions computed
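To estimate how the normalization step affects a given image, the resize can be sketched as below. It assumes the 4096px limit applies to the longest side, that aspect ratio is preserved, and that dimensions are rounded to the nearest pixel; the function name and the rounding rule are illustrative assumptions rather than the Platform's implementation.

```python
def normalized_size(width: int, height: int, max_side: int = 4096) -> tuple[int, int]:
    """Return the image size after capping the longest side at max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already within the limit, left unchanged
    scale = max_side / longest
    # Scale both sides by the same factor so the aspect ratio is preserved.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Because YOLO labels use normalized coordinates, a uniform resize like this leaves the label values unchanged.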
Validate Before Upload
You can validate your dataset locally before uploading:
from ultralytics.hub import check_dataset
check_dataset("path/to/dataset.zip", task="detect")
Browse Images
View your dataset images in multiple layouts:
| View | Description |
|---|---|
| Grid | Thumbnail grid with annotation overlays |
| Compact | Smaller thumbnails for quick scanning |
| Table | List with filename, dimensions, and label counts |
Fullscreen Viewer
Click any image to open the fullscreen viewer with:
- Navigation: Arrow keys or click to browse
- Metadata: Filename, dimensions, split, label count
- Annotations: Toggle annotation visibility
- Class Breakdown: Per-class label counts
Filter by Split
Filter images by their dataset split:
| Split | Purpose |
|---|---|
| Train | Used for model training |
| Val | Used for validation during training |
| Test | Used for final evaluation |
| Unknown | No split assigned |
Dataset Statistics
The Statistics tab provides automatic analysis of your dataset:
Class Distribution
Bar chart showing the number of annotations per class.
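You can compute the same kind of per-class count locally from YOLO label files, for example to compare against the chart. This is a minimal sketch that assumes each non-empty line in a label file starts with an integer class ID; the function name is hypothetical.

```python
from collections import Counter
from pathlib import Path


def class_distribution(labels_dir: str) -> Counter:
    """Count annotations per class ID across all .txt label files in labels_dir."""
    counts = Counter()
    for label_file in Path(labels_dir).rglob("*.txt"):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[int(fields[0])] += 1
    return counts
```

Mapping the class IDs back to names is a lookup in the `names` section of data.yaml.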
Location Heatmap
Visualization of where annotations appear in images.
Dimension Analysis
Scatter plot of image dimensions (width vs height).
Statistics Caching
Statistics are cached for 5 minutes. Changes to annotations will be reflected after the cache expires.
Export Dataset
Export your dataset in NDJSON format for offline use:
- Open the dataset actions menu
- Click Export
- Download the NDJSON file
The NDJSON format stores one JSON object per line:
{"filename": "img001.jpg", "split": "train", "labels": [...]}
{"filename": "img002.jpg", "split": "train", "labels": [...]}
See the Ultralytics NDJSON format documentation for the full specification.
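Because each line is an independent JSON object, an export can be processed line by line with the standard library. The sketch below counts images per split. It assumes every line is an image record shaped like the two examples above (with a `split` field); the real export may contain additional fields or record types, so treat the field names and the function name as assumptions.

```python
import json
from collections import Counter


def count_by_split(ndjson_text: str) -> Counter:
    """Count records per split in NDJSON text (one JSON object per line)."""
    counts = Counter()
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Records without a "split" field are grouped under "unknown".
        counts[record.get("split", "unknown")] += 1
    return counts
```

For a downloaded file, pass the result of `Path("export.ndjson").read_text()`, where the file name is a placeholder for your own export.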
Dataset URI
Reference Platform datasets using the ul:// URI format:
ul://username/datasets/dataset-slug
Use this URI to train models from anywhere:
export ULTRALYTICS_API_KEY="your_api_key"
yolo train model=yolo11n.pt data=ul://username/datasets/my-dataset epochs=100
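If you generate training commands from scripts, it can be convenient to build and check the URI programmatically. The helpers below are a hypothetical sketch based only on the ul://username/datasets/dataset-slug pattern shown above; they are not part of the Ultralytics package, and the Platform may accept or reject URIs by different rules.

```python
from urllib.parse import urlsplit


def dataset_uri(username: str, slug: str) -> str:
    """Build a ul:// dataset URI from a username and a dataset slug."""
    return f"ul://{username}/datasets/{slug}"


def parse_dataset_uri(uri: str) -> tuple[str, str]:
    """Split a ul:// dataset URI into (username, slug), or raise ValueError."""
    parts = urlsplit(uri)
    segments = parts.path.strip("/").split("/")
    valid = (
        parts.scheme == "ul"
        and parts.netloc
        and len(segments) == 2
        and segments[0] == "datasets"
        and segments[1]
    )
    if not valid:
        raise ValueError(f"not a ul:// dataset URI: {uri!r}")
    return parts.netloc, segments[1]
```

The string returned by `dataset_uri` is the value passed as `data=` in the command above.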
Train Anywhere with Platform Data
The ul:// URI works from any environment:
- Local machine: Train on your hardware, data downloaded automatically
- Google Colab: Access your Platform datasets in notebooks
- Remote servers: Train on cloud VMs with full dataset access
Visibility Settings
Control who can see your dataset:
| Setting | Description |
|---|---|
| Private | Only you can access |
| Public | Anyone can view on Explore page |
To change visibility:
- Open the dataset actions menu
- Click Edit
- Toggle visibility setting
- Click Save
Edit Dataset
Update dataset name, description, or visibility:
- Open the dataset actions menu
- Click Edit
- Make changes
- Click Save
Delete Dataset
Delete a dataset you no longer need:
- Open the dataset actions menu
- Click Delete
- Confirm deletion
Trash and Restore
Deleted datasets are moved to Trash for 30 days. You can restore them from the Trash page in Settings.
Train on Dataset
Start training directly from your dataset:
- Click Train Model on the dataset page
- Select a project or create a new one
- Configure training parameters
- Start training
See Cloud Training for details.
FAQ
What happens to my data after upload?
Your data is processed and stored in your selected region (US, EU, or AP). Images are:
- Validated for format and size
- Normalized if larger than 4096px (preserving aspect ratio)
- Stored using Content-Addressable Storage (CAS) with SHA-256 hashing
- Thumbnailed at 256px for fast browsing
- Never shared without your permission
How does storage work?
Ultralytics Platform uses Content-Addressable Storage (CAS) for efficient storage:
- Deduplication: Identical images uploaded by different users are stored only once
- Integrity: SHA-256 hashing ensures data integrity
- Efficiency: Reduces storage costs and speeds up processing
- Regional: Data stays in your selected region (US, EU, or AP)
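The idea behind content-addressable storage is that the storage key is derived from the bytes themselves, so identical content always maps to the same key. The following in-memory sketch illustrates deduplication and integrity checking with SHA-256; the `ContentStore` class is a hypothetical teaching example and says nothing about how the Platform's storage is actually built.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store keyed by SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its key; identical data is stored only once."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Return the stored bytes after re-checking them against the key."""
        data = self._blobs[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its key")
        return data

    def __len__(self) -> int:
        return len(self._blobs)
```

Uploading the same image twice returns the same key and leaves the store size unchanged, which is the deduplication property described above.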
Can I add images to an existing dataset?
Yes, use the Add Images button on the dataset page to upload additional images. New statistics will be computed automatically.
How do I move images between datasets?
Use the bulk selection feature:
- Select images in the gallery
- Click Move or Copy
- Select destination dataset
What label formats are supported?
Ultralytics Platform supports YOLO format labels:
- Detect: class_id x_center y_center width height
- Segment: class_id x1 y1 x2 y2 ... (polygon points)
- Pose: class_id x_center y_center width height kp1_x kp1_y kp1_v ...
- OBB: class_id x1 y1 x2 y2 x3 y3 x4 y4
All coordinates are normalized (0-1 range).
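If your annotations are stored as pixel-space corner boxes, converting them to the Detect format means dividing by the image size. Here is a minimal sketch of that conversion; the function name and the six-decimal output precision are illustrative choices.

```python
def to_yolo_detect_line(class_id: int, x_min: float, y_min: float,
                        x_max: float, y_max: float,
                        img_w: int, img_h: int) -> str:
    """Convert a pixel-space corner box to a normalized YOLO detect label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For example, a box from (100, 200) to (300, 400) in a 1000x800 image with class 0 becomes `0 0.200000 0.375000 0.200000 0.250000`.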