Semantic Segmentation Datasets Overview

Semantic segmentation assigns one class label to every pixel in an image. Unlike instance segmentation, semantic segmentation does not separate individual objects of the same class. The training target is a dense class map where each pixel stores a class ID.

This guide explains the dataset format used by Ultralytics YOLO semantic segmentation models and lists the built-in dataset configurations available for training and validation.

Supported Dataset Formats

Two label formats are supported. The dataset loader chooses between them based on whether the dataset YAML defines a masks_dir key: PNG masks when the key is present, YOLO polygon labels when it is omitted.

PNG mask format

Semantic segmentation datasets use one image file and one mask file per sample. The mask is a single-channel image, usually PNG, where each pixel value is the class index for the corresponding image pixel.

  • Pixel values 0, 1, 2, ... represent class IDs from the dataset names mapping.
  • Pixel value 255 is treated as the ignore label and is excluded from loss and metric computation.
  • Mask files should use the same stem as their matching image file, for example frankfurt_000000_000294.png.
  • Supported mask extensions are .png, .PNG, .bmp, and .tif.
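The pixel-value rules above can be checked before training. The following is a minimal sketch, not part of the Ultralytics loader: the function name invalid_mask_values is hypothetical, and the mask is represented as a plain nested list so the example has no dependencies.

```python
IGNORE_LABEL = 255  # ignore value described in the list above


def invalid_mask_values(mask, num_classes, ignore_label=IGNORE_LABEL):
    """Return the sorted pixel values that are neither a class ID nor the ignore label."""
    allowed = set(range(num_classes)) | {ignore_label}
    found = {value for row in mask for value in row}
    return sorted(found - allowed)


# A tiny 3x4 mask for a 3-class dataset; the value 7 is not a valid class ID.
mask = [
    [0, 0, 1, 1],
    [2, 2, 255, 255],
    [0, 7, 1, 2],
]
print(invalid_mask_values(mask, num_classes=3))  # [7]
```

An empty result means every pixel is either a declared class ID or the ignore label.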

The default layout keeps images and masks in parallel folders. The masks_dir value from the dataset YAML replaces the images path component to find masks.

dataset/
├── images/
│   ├── train/
│   └── val/
└── masks/
    ├── train/
    └── val/

For example, an image at images/train/aachen_000000_000019.png is paired with a mask at masks/train/aachen_000000_000019.png when masks_dir: masks.
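The pairing rule can be sketched as a small path transformation. This is an illustration only, not the loader's actual code: mask_path_for is a hypothetical helper, and two choices in it are assumptions, namely that the right-most images component is the one replaced and that the mask is looked up with a .png suffix.

```python
from pathlib import PurePosixPath


def mask_path_for(image_path, masks_dir="masks", images_dir="images", mask_suffix=".png"):
    """Swap the last `images_dir` path component for `masks_dir` and keep the file stem."""
    parts = list(PurePosixPath(image_path).parts)
    # Search from the right so a parent folder that happens to be named "images" is untouched.
    for i in range(len(parts) - 1, -1, -1):
        if parts[i] == images_dir:
            parts[i] = masks_dir
            return str(PurePosixPath(*parts).with_suffix(mask_suffix))
    raise ValueError(f"no '{images_dir}' component in {image_path}")


print(mask_path_for("images/train/aachen_000000_000019.png"))
# masks/train/aachen_000000_000019.png
```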

YOLO polygon label format

If your dataset already has Ultralytics YOLO polygon labels (one .txt per image with <class-index> <x1> <y1> <x2> <y2> ... rows), you can train semantic segmentation directly from them — no PNG mask conversion needed. See the instance segmentation dataset format for the row-level layout.
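To make the row layout concrete, here is a minimal sketch of parsing a single polygon row. The helper name parse_polygon_row is hypothetical, and the coordinates are assumed to be normalized to the 0 to 1 range as in the instance segmentation format.

```python
def parse_polygon_row(row):
    """Split one label row into (class_index, [(x, y), ...])."""
    fields = row.split()
    # A polygon needs a class index plus at least three x y pairs (7 fields, odd count).
    if len(fields) < 7 or len(fields) % 2 == 0:
        raise ValueError("expected a class index followed by at least three x y pairs")
    class_index = int(fields[0])
    coords = [float(value) for value in fields[1:]]
    points = list(zip(coords[0::2], coords[1::2]))
    return class_index, points


print(parse_polygon_row("0 0.1 0.1 0.9 0.1 0.5 0.8"))
# (0, [(0.1, 0.1), (0.9, 0.1), (0.5, 0.8)])
```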

This path is selected automatically when the dataset YAML omits masks_dir. Behavior:

  • Polygons are converted to a per-image semantic mask at load time, sorted by area so smaller objects override larger ones in overlap regions.
  • Multi-class (N > 1 in names): an extra background class is appended after your declared classes for pixels not covered by any polygon. The model is built with N + 1 output channels and the last channel is background.
  • Single-class (N == 1 in names): still trained as 1 class. The mask is binary, with your declared class shown as 1 and pixels not covered by any polygon as 0. No extra background class is added to names.
  • Pixels added by augmentation padding (e.g. random crop) still use 255 as the ignore label.

Use this path when your data is already labeled as instance polygons and you want a semantic segmentation model from the same files.
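The conversion rules listed above can be illustrated with a dependency-free sketch. It is not the Ultralytics implementation: the function names are hypothetical, polygons are assumed to be already scaled to pixel coordinates, and a pixel counts as covered when its center lies inside the polygon (a production loader would more likely use an optimized rasterizer).

```python
def polygon_area(points):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(x, y, points):
    """Even-odd ray casting test for a single point."""
    inside = False
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_to_mask(polygons, num_classes, height, width):
    """polygons: list of (class_index, [(x, y), ...]) in pixel coordinates."""
    if num_classes > 1:
        background, offset = num_classes, 0  # background appended as the last class
    else:
        background, offset = 0, 1  # binary mask: declared class -> 1, rest -> 0
    mask = [[background] * width for _ in range(height)]
    # Draw the largest polygons first so smaller ones overwrite them where they overlap.
    ordered = sorted(polygons, key=lambda item: polygon_area(item[1]), reverse=True)
    for class_index, points in ordered:
        for row in range(height):
            for col in range(width):
                if point_in_polygon(col + 0.5, row + 0.5, points):
                    mask[row][col] = class_index + offset
    return mask


small = (1, [(1, 1), (3, 1), (3, 3), (1, 3)])
large = (0, [(0, 0), (4, 0), (4, 4), (0, 4)])
# The small polygon is listed first on purpose: the area sort still draws it last.
for row in polygons_to_mask([small, large], num_classes=2, height=6, width=6):
    print(row)
```

In the two-class example, pixels inside only the large square get class 0, the overlap gets class 1, and uncovered pixels get the appended background ID 2.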

Dataset YAML format

Semantic segmentation datasets are configured with YAML files. The main fields are:

Key              Description
path             Dataset root directory.
train            Training image path relative to path, or an absolute path.
val              Validation image path relative to path, or an absolute path.
test             Optional test image path.
masks_dir        Directory name used for semantic masks. Omit this key to switch to the YOLO polygon label format.
names            Class ID to class name mapping.
label_mapping    Optional mapping from source dataset IDs to training IDs or ignore_label.

ultralytics/cfg/datasets/cityscapes8.yaml
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Cityscapes semantic segmentation dataset (19 classes)
# Documentation: https://docs.ultralytics.com/datasets/semantic/cityscapes8/
# Example usage: yolo semantic train data=cityscapes8.yaml model=yolo26n-sem.pt
# parent
# ├── ultralytics
# └── datasets
#     └── cityscapes8 ← downloads here (small subset)
#         └── images
#         └── masks

# Dataset root directory
path: cityscapes8 # dataset root dir
train: images/train # train images (relative to 'path') 4 images
val: images/val # val images (relative to 'path') 4 images

masks_dir: masks # semantic mask directory

# Cityscapes 19-class labels
names:
  0: road
  1: sidewalk
  2: building
  3: wall
  4: fence
  5: pole
  6: traffic light
  7: traffic sign
  8: vegetation
  9: terrain
  10: sky
  11: person
  12: rider
  13: car
  14: truck
  15: bus
  16: train
  17: motorcycle
  18: bicycle

# Map source label IDs to train IDs; ignore_label is converted to 255.
label_mapping:
  -1: ignore_label
  0: ignore_label
  1: ignore_label
  2: ignore_label
  3: ignore_label
  4: ignore_label
  5: ignore_label
  6: ignore_label
  7: 0
  8: 1
  9: ignore_label
  10: ignore_label
  11: 2
  12: 3
  13: 4
  14: ignore_label
  15: ignore_label
  16: ignore_label
  17: 5
  18: ignore_label
  19: 6
  20: 7
  21: 8
  22: 9
  23: 10
  24: 11
  25: 12
  26: 13
  27: 14
  28: 15
  29: ignore_label
  30: ignore_label
  31: 16
  32: 17
  33: 18

# Download URL (optional)
download: https://github.com/ultralytics/assets/releases/download/v0.0.0/cityscapes8.zip

Use label_mapping when the source mask IDs do not already match contiguous training class IDs. Cityscapes and ADE20K include mappings that convert original label IDs into YOLO semantic segmentation train IDs and ignore unused labels.
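As an illustration of what such a mapping does to a mask, here is a minimal sketch using a few entries from the Cityscapes configuration above. The helper names are hypothetical, and treating source IDs that are absent from the mapping as ignored is an assumption of this sketch, not documented loader behavior.

```python
IGNORE_LABEL = 255


def build_lookup(label_mapping, ignore_label=IGNORE_LABEL):
    """Resolve a YAML-style mapping into {source_id: train_id}."""
    return {
        source: ignore_label if target == "ignore_label" else int(target)
        for source, target in label_mapping.items()
    }


def remap_mask(mask, lookup, ignore_label=IGNORE_LABEL):
    """Convert source IDs to train IDs pixel by pixel."""
    # Assumption: any source ID missing from the mapping is also ignored.
    return [[lookup.get(value, ignore_label) for value in row] for row in mask]


# A subset of the Cityscapes label_mapping shown above.
lookup = build_lookup({0: "ignore_label", 7: 0, 8: 1, 11: 2, 26: 13})
source_mask = [
    [7, 7, 8],
    [0, 26, 11],
]
print(remap_mask(source_mask, lookup))  # [[0, 0, 1], [255, 13, 2]]
```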

Usage

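Train a YOLO26 semantic segmentation model with Python, as in the example below, or from the CLI with the command given in the cityscapes8.yaml header: yolo semantic train data=cityscapes8.yaml model=yolo26n-sem.pt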

Example
from ultralytics import YOLO

# Load a pretrained semantic segmentation model
model = YOLO("yolo26n-sem.pt")

# Train on the Cityscapes8 semantic segmentation dataset
results = model.train(data="cityscapes8.yaml", epochs=100, imgsz=1024)

Supported Datasets

Ultralytics provides semantic segmentation dataset YAML files for these datasets:

  • Cityscapes: Urban street-scene semantic segmentation dataset with 19 train classes.
  • Cityscapes8: An 8-image Cityscapes subset for quick tests and CI checks.
  • ADE20K: Scene parsing dataset with 150 semantic classes.

Adding Your Own Dataset

Option A — PNG masks

  1. Save your images under split folders such as images/train and images/val.
  2. Save one single-channel mask per image under the mirrored mask folders, such as masks/train and masks/val.
  3. Ensure mask pixel values are class IDs. Use 255 for pixels that should be ignored.
  4. Create a dataset YAML with path, train, val, masks_dir, and names.
  5. Add label_mapping only when your mask IDs need conversion to contiguous train IDs.

A minimal dataset YAML for this layout:
path: path/to/my-semantic-dataset
train: images/train
val: images/val
masks_dir: masks

names:
    0: background
    1: road
    2: building
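Before training on a PNG-mask dataset, it can help to confirm that every image has a mask with the same stem. The sketch below assumes the default images/<split> and <masks_dir>/<split> layout; find_missing_masks and IMAGE_EXTENSIONS are hypothetical names, and the image extension set is an assumption to adjust for your data.

```python
from pathlib import Path

MASK_EXTENSIONS = (".png", ".PNG", ".bmp", ".tif")  # mask extensions listed in this guide
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}  # assumption: adjust to your data


def find_missing_masks(root, split="train", masks_dir="masks"):
    """Return image file names in images/<split> with no same-stem mask in <masks_dir>/<split>."""
    root = Path(root)
    image_folder = root / "images" / split
    mask_folder = root / masks_dir / split
    missing = []
    for image in sorted(image_folder.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip non-image files such as caches
        has_mask = any((mask_folder / (image.stem + ext)).exists() for ext in MASK_EXTENSIONS)
        if not has_mask:
            missing.append(image.name)
    return missing
```

Calling find_missing_masks("path/to/my-semantic-dataset", split="val") returns an empty list when every validation image is paired with a mask.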

Option B — Polygon labels

  1. Lay out images and .txt polygon files exactly as for instance segmentation.
  2. Create a dataset YAML with path, train, val, and names; omit masks_dir.
  3. Do not add a "background" entry to names. For multi-class datasets the loader appends one automatically; for single-class datasets training stays at 1 class: your declared class becomes 1 in the mask and uncovered pixels become 0.

A minimal dataset YAML for this layout:
path: path/to/my-polygon-dataset
train: images/train
val: images/val

names:
    0: person
    1: car
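The effect of the background rule on the class list can be sketched as follows. The function semantic_classes is a hypothetical illustration of the rule described in this guide, not an Ultralytics API, and the label "background" for the appended entry is assumed.

```python
def semantic_classes(names):
    """Return (class_names, num_output_channels) for polygon-label training."""
    n = len(names)
    if n > 1:
        # Multi-class: background is appended after the declared classes.
        return {**names, n: "background"}, n + 1
    # Single-class: names stay unchanged and training stays at one class.
    return dict(names), 1


print(semantic_classes({0: "person", 1: "car"}))
# ({0: 'person', 1: 'car', 2: 'background'}, 3)
print(semantic_classes({0: "person"}))
# ({0: 'person'}, 1)
```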

FAQ

What is the difference between semantic segmentation masks and instance segmentation labels?

Semantic segmentation masks are dense pixel maps. Each pixel stores a class ID, and there is one mask image per training image. Instance segmentation labels in Ultralytics YOLO use text files with polygon coordinates, one row per object instance.

What pixel value is ignored during training?

Pixel value 255 is used as the ignore label. These pixels are skipped during loss and metric computation, which is useful for void regions, unlabeled pixels, or classes outside the training label set.
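To show what skipping ignored pixels means for a metric, here is a simplified pixel-accuracy sketch. It is an illustration under stated assumptions, not the metric code used by Ultralytics; pixel_accuracy is a hypothetical helper working on nested lists.

```python
def pixel_accuracy(pred, target, ignore_label=255):
    """Fraction of correctly predicted pixels, counting only pixels that are not ignored."""
    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_label:
                continue  # ignored pixels affect neither numerator nor denominator
            total += 1
            correct += int(p == t)
    return correct / total if total else float("nan")


target = [[0, 1, 255], [2, 255, 1]]
pred = [[0, 1, 0], [1, 2, 1]]
print(pixel_accuracy(pred, target))  # 0.75
```

Four pixels are valid and three of them match, so the two ignored pixels do not lower the score regardless of what the model predicts there.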

Do mask file names need to match image file names?

Yes. Each semantic mask should have the same file stem as the corresponding image. The dataset loader replaces the images directory component with masks_dir and searches for matching mask files.

Can I use original dataset label IDs directly?

Yes, if they already match your names class IDs. If the source dataset uses non-contiguous IDs or includes labels that should be ignored, add a label_mapping section to convert source pixel values to training IDs.

Can I use my instance segmentation dataset to train semantic segmentation?

Yes. Instance segmentation datasets use Ultralytics YOLO polygon labels (one .txt per image with <class-index> <x1> <y1> <x2> <y2> ... rows), and the same files can be reused for semantic segmentation — just omit masks_dir from the dataset YAML. The loader converts polygons to per-image masks on the fly. For multi-class datasets (N > 1) an extra background class is appended and the model is built with N + 1 output channels. For single-class datasets (N == 1) training stays at 1 class — the mask shows your declared class as 1 and uncovered pixels as 0.
