
Datasets

Ultralytics Platform datasets give you one place to manage your training data. Once uploaded, a dataset is processed automatically, statistics are generated, and it is ready for model training.



Upload Dataset

Ultralytics Platform accepts several upload formats:

Format | Description
Images | Individual image files (JPG, PNG, WebP, TIFF, RAW)
ZIP Archive | Compressed folder with images and optional labels
Video | MP4, AVI files; frames extracted at ~1 fps
YOLO Format | Standard YOLO directory structure with labels

Video Frame Extraction

When you upload a video, frames are extracted automatically:

  • Frame rate: ~1 frame per second
  • Maximum frames: 100 frames per video
  • Processing: Client-side extraction before upload
  • Format: Frames converted to standard image format

This is ideal for creating training datasets from surveillance footage, action recordings, or any video source.
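To estimate how many frames a clip will contribute before you upload it, the two limits above can be turned into a small calculation. This is only a sketch: the function name is hypothetical, and it assumes sampling starts at the first frame and simply stops once the cap is reached, which this page does not confirm.

```python
import math


def sample_timestamps(duration_s: float, fps: float = 1.0, max_frames: int = 100) -> list[float]:
    """Return the timestamps (in seconds) at which frames would be sampled.

    Assumption: sampling starts at t=0 and stops at the frame cap.
    """
    if duration_s <= 0 or fps <= 0:
        return []
    step = 1.0 / fps
    # Only timestamps strictly inside the clip are counted, then capped.
    count = min(math.ceil(duration_s * fps), max_frames)
    return [i * step for i in range(count)]


# A 5-minute clip has 300 one-second slots but is capped at 100 frames.
print(len(sample_timestamps(300)))
```

Under these assumptions, any video longer than about 100 seconds yields the same number of frames, so long recordings may be worth splitting into shorter clips first.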

Preparing Your Dataset

For labeled datasets, use the standard YOLO format:

my-dataset/
├── images/
│   ├── train/
│   │   ├── img001.jpg
│   │   └── img002.jpg
│   └── val/
│       ├── img003.jpg
│       └── img004.jpg
├── labels/
│   ├── train/
│   │   ├── img001.txt
│   │   └── img002.txt
│   └── val/
│       ├── img003.txt
│       └── img004.txt
└── data.yaml

The YAML file defines your dataset configuration:

# data.yaml
path: .
train: images/train
val: images/val

names:
    0: person
    1: car
    2: dog
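Before zipping a folder laid out like the one above, it can help to check that images and label files line up. The sketch below is a hypothetical local helper, not part of the Platform or the ultralytics package. It assumes the images/<split>/ and labels/<split>/ layout shown above and a fixed list of image extensions.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".tif", ".tiff"}


def find_unlabeled_images(root: str, splits=("train", "val")) -> list[str]:
    """List images under images/<split>/ that have no labels/<split>/<stem>.txt."""
    root_path = Path(root)
    missing = []
    for split in splits:
        image_dir = root_path / "images" / split
        if not image_dir.is_dir():
            continue  # a split that is absent is skipped, not reported
        for image in sorted(image_dir.iterdir()):
            if image.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            label = root_path / "labels" / split / f"{image.stem}.txt"
            if not label.is_file():
                missing.append(f"{split}/{image.name}")
    return missing
```

An image without a label file is not necessarily a mistake, since it may be an intentional background image, so treat the result as a report to review rather than a list of errors.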

Upload Process

  1. Navigate to Datasets in the sidebar
  2. Click Upload Dataset or drag files into the upload zone
  3. Select the task type (detect, segment, pose, OBB, classify)
  4. Add a name and optional description
  5. Click Upload

After upload, the Platform processes your data:

  1. Normalization: Large images resized (max 4096px)
  2. Thumbnails: 256px previews generated
  3. Label Parsing: YOLO format labels extracted
  4. Statistics: Class distributions computed

Validate Before Upload

You can validate your dataset locally before uploading:

from ultralytics.hub import check_dataset

check_dataset("path/to/dataset.zip", task="detect")
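The call above expects a ZIP archive. If your dataset is still a plain folder, a sketch like the following can produce one with the standard library. The helper name is hypothetical, and it assumes the archive should contain the dataset folder itself at the top level (with data.yaml inside it), a layout this page does not spell out.

```python
import shutil
from pathlib import Path


def zip_dataset(dataset_dir: str) -> str:
    """Create <dataset_dir>.zip next to the folder and return the archive path.

    Assumption: the dataset folder is kept as the archive's top-level entry.
    """
    src = Path(dataset_dir).resolve()
    return shutil.make_archive(
        base_name=str(src),        # output path without the .zip suffix
        format="zip",
        root_dir=str(src.parent),  # archive paths are relative to this directory
        base_dir=src.name,         # only the dataset folder is archived
    )
```

The returned path can then be passed to check_dataset as shown above.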

Browse Images

View your dataset images in multiple layouts:

View | Description
Grid | Thumbnail grid with annotation overlays
Compact | Smaller thumbnails for quick scanning
Table | List with filename, dimensions, and label counts

Fullscreen Viewer

Click any image to open the fullscreen viewer with:

  • Navigation: Arrow keys or click to browse
  • Metadata: Filename, dimensions, split, label count
  • Annotations: Toggle annotation visibility
  • Class Breakdown: Per-class label counts

Filter by Split

Filter images by their dataset split:

Split | Purpose
Train | Used for model training
Val | Used for validation during training
Test | Used for final evaluation
Unknown | No split assigned

Dataset Statistics

The Statistics tab provides automatic analysis of your dataset:

Class Distribution

Bar chart showing the number of annotations per class.
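The same per-class counts can be approximated locally from YOLO label files, which is useful for spotting class imbalance before upload. This is a minimal sketch with a hypothetical function name. It assumes each non-empty line in a label file starts with an integer class ID, and it does not claim to match how the Platform computes its chart.

```python
from collections import Counter
from pathlib import Path


def count_annotations(labels_dir: str) -> Counter:
    """Count annotations per class ID across all .txt files under labels_dir."""
    counts = Counter()
    for label_file in Path(labels_dir).rglob("*.txt"):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[int(fields[0])] += 1
    return counts
```

Calling count_annotations("my-dataset/labels") returns a Counter keyed by class ID, which you can map to names using the names section of data.yaml.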

Location Heatmap

Visualization of where annotations appear in images.
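One simple way to build a comparable view yourself is to bin the normalized box centers into a coarse grid. The sketch below is an illustration under stated assumptions: the function name and grid size are hypothetical, and the Platform's heatmap may be computed differently (for example from full box areas rather than centers).

```python
def center_heatmap(centers, bins: int = 4) -> list[list[int]]:
    """Bin normalized (x_center, y_center) pairs into a bins x bins grid of counts.

    Rows follow y (top to bottom) and columns follow x (left to right).
    Assumes every coordinate lies in the 0-1 range.
    """
    grid = [[0] * bins for _ in range(bins)]
    for x, y in centers:
        col = min(int(x * bins), bins - 1)  # clamp x == 1.0 into the last column
        row = min(int(y * bins), bins - 1)  # clamp y == 1.0 into the last row
        grid[row][col] += 1
    return grid
```

A grid where most counts fall in a few cells suggests the objects cluster in one part of the frame, which may matter if your deployment camera is positioned differently.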

Dimension Analysis

Scatter plot of image dimensions (width vs height).

Statistics Caching

Statistics are cached for 5 minutes. Changes to annotations will be reflected after the cache expires.

Export Dataset

Export your dataset in NDJSON format for offline use:

  1. Open the dataset actions menu
  2. Click Export
  3. Download the NDJSON file

The NDJSON format stores one JSON object per line:

{"filename": "img001.jpg", "split": "train", "labels": [...]}
{"filename": "img002.jpg", "split": "train", "labels": [...]}

See the Ultralytics NDJSON format documentation for the full specification.
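Because each line is an independent JSON object, an export can be processed one record at a time without loading the whole file. The sketch below uses a hypothetical helper name and assumes every line is an image record shaped like the example above, with a "split" key. A real export may carry additional fields or record types, so check the specification before relying on it.

```python
import json
from collections import Counter


def count_by_split(ndjson_path: str) -> Counter:
    """Count records per split in an NDJSON export, one line at a time."""
    counts = Counter()
    with open(ndjson_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Records without a split are grouped under "unknown" (an assumption).
            counts[record.get("split", "unknown")] += 1
    return counts
```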

Dataset URI

Reference Platform datasets using the ul:// URI format:

ul://username/datasets/dataset-slug

Use this URI to train models from anywhere:

export ULTRALYTICS_API_KEY="your_api_key"
yolo train model=yolo11n.pt data=ul://username/datasets/my-dataset epochs=100
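For scripts that assemble these references before launching a run, it can be handy to check a URI's shape first. The following sketch uses a hypothetical helper name and assumes the only valid form is ul://username/datasets/dataset-slug as shown above. It checks syntax only and does not contact the Platform.

```python
from urllib.parse import urlparse


def parse_dataset_uri(uri: str) -> tuple[str, str]:
    """Split a ul:// dataset URI into (username, dataset_slug).

    Raises ValueError if the URI does not match ul://<username>/datasets/<slug>.
    """
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    valid = (
        parsed.scheme == "ul"
        and bool(parsed.netloc)
        and len(parts) == 2
        and parts[0] == "datasets"
        and bool(parts[1])
    )
    if not valid:
        raise ValueError(f"not a Platform dataset URI: {uri!r}")
    return parsed.netloc, parts[1]


print(parse_dataset_uri("ul://username/datasets/my-dataset"))
```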

Train Anywhere with Platform Data

The ul:// URI works from any environment:

  • Local machine: Train on your hardware, data downloaded automatically
  • Google Colab: Access your Platform datasets in notebooks
  • Remote servers: Train on cloud VMs with full dataset access

Visibility Settings

Control who can see your dataset:

Setting | Description
Private | Only you can access
Public | Anyone can view on Explore page

To change visibility:

  1. Open the dataset actions menu
  2. Click Edit
  3. Toggle visibility setting
  4. Click Save

Edit Dataset

Update dataset name, description, or visibility:

  1. Open the dataset actions menu
  2. Click Edit
  3. Make changes
  4. Click Save

Delete Dataset

Delete a dataset you no longer need:

  1. Open the dataset actions menu
  2. Click Delete
  3. Confirm deletion

Trash and Restore

Deleted datasets are moved to Trash for 30 days. You can restore them from the Trash page in Settings.

Train on Dataset

Start training directly from your dataset:

  1. Click Train Model on the dataset page
  2. Select a project or create a new one
  3. Configure training parameters
  4. Start training

See Cloud Training for details.

FAQ

What happens to my data after upload?

Your data is processed and stored in your selected region (US, EU, or AP). Images are:

  1. Validated for format and size
  2. Normalized if larger than 4096px (preserving aspect ratio)
  3. Stored using Content-Addressable Storage (CAS) with SHA-256 hashing
  4. Given 256px thumbnails for fast browsing
  5. Never shared without your permission
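To see what the normalization step means for a particular image, the resize arithmetic can be sketched as follows. The function name is hypothetical, and the code assumes the 4096px limit applies to the longer side and that results are rounded to whole pixels. This page states only the limit and that aspect ratio is preserved.

```python
def normalized_size(width: int, height: int, max_side: int = 4096) -> tuple[int, int]:
    """Return the dimensions after an aspect-preserving resize to max_side.

    Assumption: the limit applies to the longer of width and height.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already within the limit, left unchanged
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(normalized_size(8192, 4096))
```

Label coordinates in YOLO format are normalized to the image size, so an aspect-preserving resize like this leaves them valid.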

How does storage work?

Ultralytics Platform uses Content-Addressable Storage (CAS) for efficient storage:

  • Deduplication: Identical images uploaded by different users are stored only once
  • Integrity: SHA-256 hashing ensures data integrity
  • Efficiency: Reduces storage costs and speeds up processing
  • Regional: Data stays in your selected region (US, EU, or AP)
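The idea behind content-addressable storage can be shown with a small in-memory model: the key for each blob is its SHA-256 digest, so identical bytes always map to the same key and are stored once. This is a conceptual sketch with a hypothetical class, not a description of the Platform's storage backend.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store keyed by SHA-256 hex digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its key; duplicates reuse the existing entry."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Return the blob for key, verifying it still matches its digest."""
        data = self._blobs[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored blob does not match its key")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
first = store.put(b"same image bytes")
second = store.put(b"same image bytes")
print(first == second, len(store))
```

Because the key is derived from the content, deduplication and integrity checking come from the same mechanism: a changed byte produces a different digest.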

Can I add images to an existing dataset?

Yes. Use the Add Images button on the dataset page to upload additional images. Statistics are recomputed automatically.

How do I move images between datasets?

Use the bulk selection feature:

  1. Select images in the gallery
  2. Click Move or Copy
  3. Select destination dataset

What label formats are supported?

Ultralytics Platform supports YOLO format labels:

  • Detect: class_id x_center y_center width height
  • Segment: class_id x1 y1 x2 y2 ... (polygon points)
  • Pose: class_id x_center y_center width height kp1_x kp1_y kp1_v ...
  • OBB: class_id x1 y1 x2 y2 x3 y3 x4 y4

All coordinates are normalized (0-1 range).



Created 0 days ago, updated 0 days ago
glenn-jocher
