Semantic Segmentation Datasets Overview
Semantic segmentation assigns one class label to every pixel in an image. Unlike instance segmentation, semantic segmentation does not separate individual objects of the same class. The training target is a dense class map where each pixel stores a class ID.
This guide explains the dataset format used by Ultralytics YOLO semantic segmentation models and lists the built-in dataset configurations available for training and validation.
Supported Dataset Formats
Two label formats are supported. The dataset loader chooses between them based on whether the dataset YAML defines a masks_dir key.
PNG mask format
Semantic segmentation datasets use one image file and one mask file per sample. The mask is a single-channel image, usually PNG, where each pixel value is the class index for the corresponding image pixel.
- Pixel values 0, 1, 2, ... represent class IDs from the dataset names mapping.
- Pixel value 255 is treated as the ignore label and is excluded from loss and metric computation.
- Mask files should use the same stem as their matching image file, for example frankfurt_000000_000294.png.
- Supported mask extensions are .png, .PNG, .bmp, and .tif.
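The pixel-value rules above can be checked before training. The sketch below is illustrative only: find_invalid_mask_values is a hypothetical helper, not part of Ultralytics, and it takes the mask as a plain 2D list of integers so that it runs without extra dependencies. In practice the mask would be loaded from the PNG file with an image library.

```python
IGNORE_LABEL = 255  # ignore value described in this guide


def find_invalid_mask_values(mask, names, ignore_label=IGNORE_LABEL):
    """Return sorted pixel values that are neither a declared class ID nor the ignore label.

    `mask` is a 2D list of ints (rows of pixel values); `names` is the
    class ID -> class name mapping from the dataset YAML.
    Hypothetical helper for illustration only.
    """
    allowed = set(names) | {ignore_label}
    found = {value for row in mask for value in row}
    return sorted(found - allowed)


names = {0: "background", 1: "road", 2: "building"}
mask = [
    [0, 0, 1, 1],
    [0, 2, 2, 255],  # 255 marks a pixel to ignore
    [7, 2, 2, 1],  # 7 is not a declared class ID
]
print(find_invalid_mask_values(mask, names))  # [7]
```

An empty result means every pixel is either a class ID from names or the ignore label.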
The default layout keeps images and masks in parallel folders. The masks_dir value from the dataset YAML replaces the images path component to find masks.
dataset/
├── images/
│   ├── train/
│   └── val/
└── masks/
    ├── train/
    └── val/

For example, an image at images/train/aachen_000000_000019.png is paired with a mask at masks/train/aachen_000000_000019.png when masks_dir: masks.
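The image-to-mask pairing can be sketched as a path rewrite. This is a minimal illustration, assuming the component to swap is the last directory named images. image_to_mask_path and its images_dir parameter are hypothetical names, and the actual loader may resolve paths and extensions differently.

```python
from pathlib import PurePosixPath


def image_to_mask_path(image_path, masks_dir="masks", images_dir="images"):
    """Swap the last `images_dir` path component for `masks_dir`, keeping the file name.

    Illustrative only: not the Ultralytics loader implementation.
    """
    parts = list(PurePosixPath(image_path).parts)
    # Search from the end so a parent folder that happens to be named "images" is left alone.
    for i in range(len(parts) - 1, -1, -1):
        if parts[i] == images_dir:
            parts[i] = masks_dir
            return str(PurePosixPath(*parts))
    raise ValueError(f"no '{images_dir}' component in {image_path}")


print(image_to_mask_path("dataset/images/train/aachen_000000_000019.png"))
# dataset/masks/train/aachen_000000_000019.png
```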
YOLO polygon label format
If your dataset already has Ultralytics YOLO polygon labels (one .txt per image with <class-index> <x1> <y1> <x2> <y2> ... rows), you can train semantic segmentation directly from them — no PNG mask conversion needed. See the instance segmentation dataset format for the row-level layout.
This path is selected automatically when the dataset YAML omits masks_dir. Behavior:
- Polygons are converted to a per-image semantic mask at load time, sorted by area so smaller objects override larger ones in overlap regions.
- Multi-class (N > 1 in names): an extra background class is appended after your declared classes for pixels not covered by any polygon. The model is built with N + 1 output channels and the last channel is background.
- Single-class (N == 1 in names): still trained as 1 class. The mask is binary, with your declared class shown as 1 and pixels not covered by any polygon as 0. No extra background class is added to names.
- Pixels added by augmentation padding (e.g. random crop) still use 255 as the ignore label.
Use this path when your data is already labeled as instance polygons and you want a semantic segmentation model from the same files.
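The polygon-to-mask behavior described above can be sketched in pure Python. This is an illustration under stated assumptions, not the loader's code: all function names are hypothetical, polygons are assumed to use normalized coordinates, and pixels are filled by testing their centers with an even-odd rule, which may differ from the rasterizer the library actually uses.

```python
def polygon_area(points):
    """Shoelace formula for the area of a polygon given as [(x, y), ...]."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(x, y, points):
    """Even-odd ray casting test for a single point."""
    inside = False
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_to_semantic_mask(labels, num_classes, width, height):
    """Rasterize (class_id, normalized_polygon) pairs into a 2D class map.

    Multi-class: uncovered pixels get the appended background ID `num_classes`.
    Single-class: the declared class is drawn as 1 and uncovered pixels stay 0.
    """
    single = num_classes == 1
    background = 0 if single else num_classes
    mask = [[background] * width for _ in range(height)]
    # Largest first, so smaller polygons are painted last and win overlaps.
    ordered = sorted(labels, key=lambda item: polygon_area(item[1]), reverse=True)
    for class_id, polygon in ordered:
        value = 1 if single else class_id
        pixels = [(px * width, py * height) for px, py in polygon]
        for row in range(height):
            for col in range(width):
                # Sample each pixel at its center.
                if point_in_polygon(col + 0.5, row + 0.5, pixels):
                    mask[row][col] = value
    return mask


small = [(0.25, 0.25), (0.5, 0.25), (0.5, 0.5), (0.25, 0.5)]  # class 0, one pixel
large = [(0.0, 0.0), (0.75, 0.0), (0.75, 0.75), (0.0, 0.75)]  # class 1, 3x3 pixels
labels = [(0, small), (1, large)]  # small is listed first; sorting still draws it last

for row in polygons_to_semantic_mask(labels, num_classes=2, width=4, height=4):
    print(row)
# [1, 1, 1, 2]
# [1, 0, 1, 2]
# [1, 1, 1, 2]
# [2, 2, 2, 2]
```

With two declared classes, uncovered pixels receive the appended background ID 2, and the small class-0 polygon survives inside the larger class-1 polygon. Calling the same function with num_classes=1 yields a binary mask of 1 and 0 instead.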
Dataset YAML format
Semantic segmentation datasets are configured with YAML files. The main fields are:
| Key | Description |
|---|---|
| path | Dataset root directory. |
| train | Training image path relative to path, or an absolute path. |
| val | Validation image path relative to path, or an absolute path. |
| test | Optional test image path. |
| masks_dir | Directory name used for semantic masks. Omit this key to switch to the YOLO polygon label format. |
| names | Class ID to class name mapping. |
| label_mapping | Optional mapping from source dataset IDs to training IDs or ignore_label. |
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
# Cityscapes semantic segmentation dataset (19 classes)
# Documentation: https://docs.ultralytics.com/datasets/semantic/cityscapes8/
# Example usage: yolo semantic train data=cityscapes8.yaml model=yolo26n-sem.pt
# parent
# ├── ultralytics
# └── datasets
#     └── cityscapes8 ← downloads here (small subset)
#         ├── images
#         └── masks
# Dataset root directory
path: cityscapes8 # dataset root dir
train: images/train # train images (relative to 'path') 4 images
val: images/val # val images (relative to 'path') 4 images
masks_dir: masks # semantic mask directory
# Cityscapes 19-class labels
names:
  0: road
  1: sidewalk
  2: building
  3: wall
  4: fence
  5: pole
  6: traffic light
  7: traffic sign
  8: vegetation
  9: terrain
  10: sky
  11: person
  12: rider
  13: car
  14: truck
  15: bus
  16: train
  17: motorcycle
  18: bicycle
# Map source label IDs to train IDs; ignore_label is converted to 255.
label_mapping:
  -1: ignore_label
  0: ignore_label
  1: ignore_label
  2: ignore_label
  3: ignore_label
  4: ignore_label
  5: ignore_label
  6: ignore_label
  7: 0
  8: 1
  9: ignore_label
  10: ignore_label
  11: 2
  12: 3
  13: 4
  14: ignore_label
  15: ignore_label
  16: ignore_label
  17: 5
  18: ignore_label
  19: 6
  20: 7
  21: 8
  22: 9
  23: 10
  24: 11
  25: 12
  26: 13
  27: 14
  28: 15
  29: ignore_label
  30: ignore_label
  31: 16
  32: 17
  33: 18
# Download URL (optional)
download: https://github.com/ultralytics/assets/releases/download/v0.0.0/cityscapes8.zip

Use label_mapping when the source mask IDs do not already match contiguous training class IDs. Cityscapes and ADE20K include mappings that convert original label IDs into YOLO semantic segmentation train IDs and ignore unused labels.
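The effect of label_mapping on a mask can be sketched as a per-pixel lookup. This is an illustration rather than the library's implementation: apply_label_mapping is a hypothetical helper, and treating source IDs that are missing from the mapping as ignored is an assumption of this sketch.

```python
IGNORE_LABEL = 255  # value that ignore_label is converted to


def apply_label_mapping(mask, label_mapping, ignore_label=IGNORE_LABEL):
    """Convert source IDs in a 2D list mask to train IDs.

    Entries mapped to the string "ignore_label" become `ignore_label`.
    Assumption: source IDs missing from the mapping are ignored as well.
    """

    def convert(value):
        target = label_mapping.get(value, "ignore_label")
        return ignore_label if target == "ignore_label" else target

    return [[convert(value) for value in row] for row in mask]


# A few entries taken from the Cityscapes mapping in the YAML above.
label_mapping = {0: "ignore_label", 7: 0, 8: 1, 11: 2, 26: 13}
source_mask = [
    [7, 7, 8],
    [11, 26, 0],
]
print(apply_label_mapping(source_mask, label_mapping))
# [[0, 0, 1], [2, 13, 255]]
```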
Usage
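Train a YOLO26 semantic segmentation model from the CLI with yolo semantic train data=cityscapes8.yaml model=yolo26n-sem.pt (the command given in the dataset YAML header), or with Python: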
from ultralytics import YOLO
# Load a pretrained semantic segmentation model
model = YOLO("yolo26n-sem.pt")
# Train on the Cityscapes8 semantic segmentation dataset
results = model.train(data="cityscapes8.yaml", epochs=100, imgsz=1024)

Supported Datasets
Ultralytics provides semantic segmentation dataset YAML files for these datasets:
- Cityscapes: Urban street-scene semantic segmentation dataset with 19 train classes.
- Cityscapes8: An 8-image Cityscapes subset for quick tests and CI checks.
- ADE20K: Scene parsing dataset with 150 semantic classes.
Adding Your Own Dataset
Option A — PNG masks
- Save your images under split folders such as images/train and images/val.
- Save one single-channel mask per image under the mirrored mask folders, such as masks/train and masks/val.
- Ensure mask pixel values are class IDs. Use 255 for pixels that should be ignored.
- Create a dataset YAML with path, train, val, masks_dir, and names.
- Add label_mapping only when your mask IDs need conversion to contiguous train IDs.
path: path/to/my-semantic-dataset
train: images/train
val: images/val
masks_dir: masks
names:
  0: background
  1: road
  2: building

Option B — Polygon labels
- Lay out images and .txt polygon files exactly as for instance segmentation.
- Create a dataset YAML with path, train, val, and names, and omit masks_dir.
- Do not add a "background" entry to names. For multi-class datasets the loader appends one automatically; for single-class datasets training stays at 1 class, with your declared class written as 1 in the mask and uncovered pixels as 0.
path: path/to/my-polygon-dataset
train: images/train
val: images/val
names:
  0: person
  1: car

FAQ
What is the difference between semantic segmentation masks and instance segmentation labels?
Semantic segmentation masks are dense pixel maps. Each pixel stores a class ID, and there is one mask image per training image. Instance segmentation labels in Ultralytics YOLO use text files with polygon coordinates, one row per object instance.
What pixel value is ignored during training?
Pixel value 255 is used as the ignore label. These pixels are skipped during loss and metric computation, which is useful for void regions, unlabeled pixels, or classes outside the training label set.
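As a rough illustration of what skipping ignored pixels means for a metric, the sketch below computes pixel accuracy while leaving out every pixel whose target is 255. Pixel accuracy is chosen here only for brevity and is not claimed to be the metric Ultralytics reports; pixel_accuracy is a hypothetical helper.

```python
IGNORE_LABEL = 255


def pixel_accuracy(pred, target, ignore_label=IGNORE_LABEL):
    """Fraction of correctly classified pixels, skipping targets equal to `ignore_label`."""
    correct = 0
    counted = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_label:
                continue  # ignored pixels affect neither numerator nor denominator
            counted += 1
            correct += p == t
    return correct / counted if counted else 0.0


target = [[0, 0, 255], [1, 1, 255]]
pred = [[0, 1, 2], [1, 1, 0]]
print(pixel_accuracy(pred, target))  # 0.75
```

Only four of the six pixels are counted, so the predictions at the two ignored positions have no effect on the score.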
Do mask file names need to match image file names?
Yes. Each semantic mask should have the same file stem as the corresponding image. The dataset loader replaces the images directory component with masks_dir and searches for matching mask files.
Can I use original dataset label IDs directly?
Yes, if they already match your names class IDs. If the source dataset uses non-contiguous IDs or includes labels that should be ignored, add a label_mapping section to convert source pixel values to training IDs.
Can I use my instance segmentation dataset to train semantic segmentation?
Yes. Instance segmentation datasets use Ultralytics YOLO polygon labels (one .txt per image with <class-index> <x1> <y1> <x2> <y2> ... rows), and the same files can be reused for semantic segmentation — just omit masks_dir from the dataset YAML. The loader converts polygons to per-image masks on the fly. For multi-class datasets (N > 1) an extra background class is appended and the model is built with N + 1 output channels. For single-class datasets (N == 1) training stays at 1 class — the mask shows your declared class as 1 and uncovered pixels as 0.