Data Preparation

Data preparation is the foundation of successful computer vision models. Ultralytics Platform provides comprehensive tools for managing your training data, from upload through annotation to analysis.

Overview

The Data section of Ultralytics Platform helps you:

  • Upload images, videos, and archives (ZIP, TAR, GZ)
  • Annotate with manual drawing tools and SAM-powered smart labeling
  • Analyze your data with statistics and visualizations
  • Export in NDJSON format for local training

Workflow

graph LR
    A[Upload] --> B[Annotate]
    B --> C[Analyze]
    C --> D[Train]

    style A fill:#4CAF50,color:#fff
    style B fill:#2196F3,color:#fff
    style C fill:#FF9800,color:#fff
    style D fill:#9C27B0,color:#fff

| Stage    | Description                                                                |
|----------|----------------------------------------------------------------------------|
| Upload   | Import images, videos, or archives with automatic processing               |
| Annotate | Label data with bounding boxes, polygons, keypoints, or classifications    |
| Analyze  | View class distributions, spatial heatmaps, and dimension statistics       |
| Export   | Download in NDJSON format for offline use                                  |

Supported Tasks

Ultralytics Platform supports all 5 YOLO task types:

| Task     | Description                                    | Annotation Tool   |
|----------|------------------------------------------------|-------------------|
| Detect   | Object detection with bounding boxes           | Rectangle tool    |
| Segment  | Instance segmentation with pixel masks         | Polygon tool      |
| Pose     | Keypoint estimation (17-point COCO format)     | Keypoint tool     |
| OBB      | Oriented bounding boxes for rotated objects    | Oriented box tool |
| Classify | Image-level classification                     | Class selector    |

Task Type Selection

The task type is set when creating a dataset and determines which annotation tools are available. You can change it later from the dataset settings, but incompatible annotations won't be displayed after switching.

Key Features

Smart Storage

Ultralytics Platform uses Content-Addressable Storage (CAS) for efficient data management:

  • Deduplication: Identical images are stored only once, detected via XXH3-128 hashing
  • Integrity: Each file is addressed by the hash of its content, so a mismatch between address and content reveals corruption
  • Efficiency: Optimized storage and fast processing
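
A minimal sketch of how content-addressable deduplication works in general, not the Platform's actual implementation. The `ContentStore` class is hypothetical, and it uses BLAKE2b with a 16-byte (128-bit) digest from Python's standard library as a stand-in for XXH3-128, which would require the third-party `xxhash` package.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}  # digest -> bytes, one entry per unique content

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself, so identical bytes
        # always map to the same key. BLAKE2b-128 stands in for XXH3-128 here.
        key = hashlib.blake2b(data, digest_size=16).hexdigest()
        self._blobs.setdefault(key, data)  # stored only the first time it is seen
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hashing on read detects corruption: the address must match the content.
        if hashlib.blake2b(data, digest_size=16).hexdigest() != key:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
key_a = store.put(b"image-bytes")
key_b = store.put(b"image-bytes")  # duplicate upload, no new entry
key_c = store.put(b"other-image-bytes")
print(key_a == key_b, len(store))
```

Uploading the same bytes twice yields the same key and a single stored entry, which is the deduplication property described above.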

Dataset URIs

Reference datasets using the ul:// URI format (see Using Platform Datasets):

yolo train data=ul://username/datasets/my-dataset

This lets you train on Platform datasets from any machine where your API key is configured.
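
To make the URI structure concrete, here is a small sketch that splits a dataset URI into its parts. The helper `parse_dataset_uri` is hypothetical, and the layout `ul://<username>/datasets/<dataset-name>` is inferred from the example above rather than from a published specification. In practice the `ultralytics` package resolves these URIs for you.

```python
from urllib.parse import urlsplit


def parse_dataset_uri(uri: str) -> dict:
    """Split a ul:// dataset URI into its parts (hypothetical helper).

    Assumes the layout ul://<username>/datasets/<dataset-name>, inferred
    from the example above.
    """
    parts = urlsplit(uri)
    if parts.scheme != "ul":
        raise ValueError(f"expected a ul:// URI, got {uri!r}")
    segments = [s for s in parts.path.split("/") if s]
    if not parts.netloc or len(segments) != 2 or segments[0] != "datasets":
        raise ValueError(f"unrecognized dataset URI layout: {uri!r}")
    return {"username": parts.netloc, "dataset": segments[1]}


print(parse_dataset_uri("ul://username/datasets/my-dataset"))
```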

Use Platform Data from Python

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
model.train(data="ul://username/datasets/my-dataset", epochs=100)

Dataset Tabs

Every dataset page provides five tabs:

| Tab     | Description                                                                |
|---------|----------------------------------------------------------------------------|
| Images  | Browse images in grid, compact, or table view with annotation overlays     |
| Classes | View and edit class names, colors, and label counts per class              |
| Charts  | Automatic statistics: split distribution, class counts, heatmaps           |
| Models  | Models trained on this dataset with metrics and status                     |
| Errors  | Images that failed processing with error details and fix guidance          |

Statistics and Visualization

The Charts tab provides automatic analysis including:

  • Split Distribution: Donut chart of train/val/test image counts
  • Top Classes: Donut chart of most frequent annotation classes
  • Image Widths: Histogram of image width distribution
  • Image Heights: Histogram of image height distribution
  • Points per Instance: Polygon vertex or keypoint count distribution (segment/pose datasets)
  • Annotation Locations: 2D heatmap of bounding box center positions
  • Image Dimensions: 2D heatmap of width vs height with aspect ratio guide lines

Related pages:

  • Datasets: Upload and manage your training data
  • Annotation: Label data with manual and AI-assisted tools
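
As a rough illustration of what the Top Classes and Annotation Locations charts in the Charts tab summarize, the sketch below counts classes and bins box centers from YOLO-format detection label lines (`class x_center y_center width height`, with coordinates normalized to 0-1). The function name `label_stats` and the 4x4 grid are assumptions for illustration; the Platform computes these charts for you.

```python
from collections import Counter


def label_stats(lines, grid=4):
    """Count classes and bin normalized box centers into a grid x grid heatmap."""
    class_counts = Counter()
    heatmap = [[0] * grid for _ in range(grid)]
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        cls = int(fields[0])
        x_center, y_center = float(fields[1]), float(fields[2])
        class_counts[cls] += 1
        # Clamp so a center of exactly 1.0 falls in the last cell.
        col = min(int(x_center * grid), grid - 1)
        row = min(int(y_center * grid), grid - 1)
        heatmap[row][col] += 1
    return class_counts, heatmap


labels = [
    "0 0.50 0.50 0.20 0.30",
    "0 0.10 0.90 0.05 0.05",
    "1 0.55 0.52 0.40 0.40",
]
counts, heat = label_stats(labels)
print(dict(counts))
for row in heat:
    print(row)
```

A heatmap that is heavily concentrated in a few cells suggests that objects tend to appear in the same part of the frame, which is worth knowing before training.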

FAQ

What file formats are supported for upload?

Ultralytics Platform supports:

Images: JPEG, PNG, WebP, BMP, TIFF, HEIC, AVIF, JP2, DNG, MPO (max 50MB each)

Videos: MP4, WebM, MOV, AVI, MKV, M4V (max 1GB, frames extracted at 1 FPS, max 100 frames)

Archives: ZIP, TAR, TAR.GZ, TGZ, GZ (max 10GB) containing images with optional YOLO-format labels

What is the maximum dataset size?

Storage limits depend on your plan:

| Plan       | Storage Limit |
|------------|---------------|
| Free       | 100 GB        |
| Pro        | 500 GB        |
| Enterprise | Custom        |

Individual file limits: Images 50MB, Videos 1GB, Archives 10GB
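
If you want to catch oversized or unsupported files before uploading, a local pre-check along these lines may help. This is a sketch under stated assumptions: `check_upload` is a hypothetical helper, the file-extension spellings are mapped from the format names in this FAQ, and MB/GB are treated as binary units because the page does not say which convention applies.

```python
from pathlib import Path

MB = 1024 ** 2  # assumption: the page does not say whether MB is binary or decimal
GB = 1024 ** 3

# Extension spellings are assumptions mapped from the format names in this FAQ.
LIMITS = {
    "image": ({".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif", ".tiff",
               ".heic", ".avif", ".jp2", ".dng", ".mpo"}, 50 * MB),
    "video": ({".mp4", ".webm", ".mov", ".avi", ".mkv", ".m4v"}, 1 * GB),
    "archive": ({".zip", ".tar", ".tgz", ".gz"}, 10 * GB),
}


def check_upload(filename: str, size_bytes: int) -> tuple[str, bool]:
    """Return (kind, within_limit) for a file name and size (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()  # ".gz" for "data.tar.gz"
    for kind, (extensions, max_bytes) in LIMITS.items():
        if suffix in extensions:
            return kind, size_bytes <= max_bytes
    return "unsupported", False


print(check_upload("photo.JPG", 12 * MB))
print(check_upload("clip.mp4", 2 * GB))
print(check_upload("data.tar.gz", 3 * GB))
```

The check only looks at the name and size. It does not verify that the file content actually matches its extension.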

Can I use my Platform datasets for local training?

Yes! Use the dataset URI format to train locally:

CLI:

export ULTRALYTICS_API_KEY="your_key"
yolo train model=yolo26n.pt data=ul://username/datasets/my-dataset epochs=100

Python:

import os

os.environ["ULTRALYTICS_API_KEY"] = "your_key"

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
model.train(data="ul://username/datasets/my-dataset", epochs=100)

Or export your dataset in NDJSON format for fully offline training.
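
NDJSON (newline-delimited JSON) stores one JSON object per line, so an export can be inspected with the standard library alone. The sketch below shows one way to read such a file. The sample records and their `file` and `split` fields are made up for illustration, since this page does not describe the export schema.

```python
import io
import json
from collections import Counter


def read_ndjson(stream):
    """Yield one parsed JSON object per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up sample records; the real export schema is not described on this page.
sample = io.StringIO(
    '{"file": "a.jpg", "split": "train"}\n'
    '{"file": "b.jpg", "split": "val"}\n'
    "\n"
    '{"file": "c.jpg", "split": "train"}\n'
)
records = list(read_ndjson(sample))
splits = Counter(r["split"] for r in records)
print(len(records), dict(splits))
```

For a real export, pass an open file handle instead of the in-memory sample, for example `open("export.ndjson", encoding="utf-8")`, where the file name is a placeholder.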



📅 Created 1 month ago ✏️ Updated 5 days ago
glenn-jocher, sergiuwaxmann
