Cityscapes Dataset
The Cityscapes dataset is a large-scale semantic segmentation benchmark focused on urban street scenes captured across 50 European cities. It provides high-quality pixel-level annotations and is one of the most widely used datasets for autonomous driving research and urban scene understanding with Ultralytics YOLO models.
Key Features
- Cityscapes fine annotations include 2,975 training images, 500 validation images, and 1,525 test images.
- The dataset covers 19 evaluation classes spanning the flat (road, sidewalk), construction, object, nature, sky, human, and vehicle categories.
- Cityscapes provides standardized evaluation metrics like mean Intersection over Union (mIoU) for semantic segmentation, enabling effective comparison of model performance.
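As an illustration of the mIoU metric mentioned above, the following is a minimal NumPy sketch, not the Ultralytics validator or the official Cityscapes evaluation script. The function names `confusion_matrix` and `mean_iou` are hypothetical, and the sketch assumes masks already use train IDs 0 to 18 with 255 marking ignored pixels.

```python
import numpy as np

IGNORE_LABEL = 255  # assumed ignore value, matching the 255 used for void labels
NUM_CLASSES = 19


def confusion_matrix(pred, target, num_classes=NUM_CLASSES, ignore_label=IGNORE_LABEL):
    """Count (ground truth, prediction) pairs; rows are ground truth, columns are predictions."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_label  # drop ignored pixels before counting
    idx = target[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes**2).reshape(num_classes, num_classes)


def mean_iou(cm):
    """Average IoU over classes that appear in the ground truth or the predictions."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0  # skip classes that never occur to avoid dividing by zero
    return float((tp[present] / union[present]).mean())


# Toy example: five pixels, the last one ignored.
cm = confusion_matrix(pred=[0, 1, 1, 1, 0], target=[0, 0, 1, 1, 255])
print(f"mIoU: {mean_iou(cm):.3f}")
```

For a dataset-level score, the confusion matrices of all images would typically be summed before calling `mean_iou`, rather than averaging per-image values.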
Dataset Structure
The Ultralytics configuration expects the following layout after preparation:
cityscapes/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── masks/
    ├── train/
    ├── val/
    └── test/
The semantic masks are single-channel PNG files. The original Cityscapes label IDs are mapped to the standard 19 train IDs via the label_mapping section, and ignored or void labels are mapped to 255 so they are excluded from training and evaluation. Download the official leftImg8bit and gtFine archives from the Cityscapes website and extract them into the dataset root; the preparation block in cityscapes.yaml then organizes images and masks into this layout.
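The label remapping described above can be sketched with a 256-entry lookup table. This is an illustrative example only, not how Ultralytics applies label_mapping internally; `build_lut` and `remap_mask` are hypothetical helper names, and the dictionary lists only the valid pairs from the label_mapping section of cityscapes.yaml.

```python
import numpy as np

IGNORE_LABEL = 255

# Valid source label ID -> train ID pairs, copied from label_mapping in cityscapes.yaml.
LABEL_TO_TRAIN = {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8, 22: 9,
    23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16, 32: 17, 33: 18,
}


def build_lut(mapping, ignore_label=IGNORE_LABEL):
    """Build a uint8 table where every unmapped source ID falls back to ignore_label."""
    lut = np.full(256, ignore_label, dtype=np.uint8)
    for src, dst in mapping.items():
        lut[src] = dst
    return lut


def remap_mask(mask, lut):
    """Convert a uint8 mask of source label IDs into train IDs with one indexing operation."""
    return lut[mask]


# Toy 2x2 mask: road (7), unlabeled (0), car (26), bicycle (33).
mask = np.array([[7, 0], [26, 33]], dtype=np.uint8)
print(remap_mask(mask, build_lut(LABEL_TO_TRAIN)))
```

Indexing the table with the whole mask keeps the conversion vectorized, so no per-pixel Python loop is needed.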
Applications
Cityscapes is widely used for training and evaluating deep learning models in semantic segmentation, particularly for autonomous driving, advanced driver-assistance systems (ADAS), and urban robotics.
Its high-resolution images and detailed annotations also make it valuable for research on real-time scene parsing, lane and obstacle understanding, and any task that requires dense pixel-level understanding of complex urban environments.
Dataset YAML
A dataset YAML file defines the Cityscapes paths, classes, mask directory, and label mapping. The cityscapes.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/cityscapes.yaml.
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
# Cityscapes semantic segmentation dataset (19 classes)
# Documentation: https://docs.ultralytics.com/datasets/semantic/cityscapes/
# Example usage: yolo semantic train data=cityscapes.yaml model=yolo26n-sem.pt
# parent
# ├── ultralytics
# └── datasets
#     └── cityscapes ← downloads here (11 GB)
#         ├── images
#         └── masks
# Dataset root directory
path: cityscapes # dataset root dir
train: images/train # train images (relative to 'path') 2975 images
val: images/val # val images (relative to 'path') 500 images
test: images/test # test images (relative to 'path') 1525 images
masks_dir: masks # semantic mask directory
# Cityscapes 19-class labels
names:
  0: road
  1: sidewalk
  2: building
  3: wall
  4: fence
  5: pole
  6: traffic light
  7: traffic sign
  8: vegetation
  9: terrain
  10: sky
  11: person
  12: rider
  13: car
  14: truck
  15: bus
  16: train
  17: motorcycle
  18: bicycle
# Map source label IDs to train IDs; ignore_label is converted to 255.
label_mapping:
  -1: ignore_label
  0: ignore_label
  1: ignore_label
  2: ignore_label
  3: ignore_label
  4: ignore_label
  5: ignore_label
  6: ignore_label
  7: 0
  8: 1
  9: ignore_label
  10: ignore_label
  11: 2
  12: 3
  13: 4
  14: ignore_label
  15: ignore_label
  16: ignore_label
  17: 5
  18: ignore_label
  19: 6
  20: 7
  21: 8
  22: 9
  23: 10
  24: 11
  25: 12
  26: 13
  27: 14
  28: 15
  29: ignore_label
  30: ignore_label
  31: 16
  32: 17
  33: 18
# Preparation script (requires manual Cityscapes download)
download: |
  from pathlib import Path
  from shutil import copy2
  cityscapes_dir = Path(yaml["path"])  # dataset root dir
  # Download and extract the official Cityscapes leftImg8bit and gtFine archives into cityscapes_dir first.
  leftimg8bit_dir = cityscapes_dir / "leftImg8bit"
  gtfine_dir = cityscapes_dir / "gtFine"
  for split in ("train", "val", "test"):
      print(f"Processing {split} set")
      src_image_dir = leftimg8bit_dir / split
      dst_image_dir = cityscapes_dir / "images" / split
      dst_mask_dir = cityscapes_dir / "masks" / split
      dst_image_dir.mkdir(parents=True, exist_ok=True)
      dst_mask_dir.mkdir(parents=True, exist_ok=True)
      image_paths = sorted(src_image_dir.rglob("*_leftImg8bit.png"))
      for image_path in image_paths:
          relative_path = image_path.relative_to(src_image_dir)
          mask_path = gtfine_dir / split / relative_path.parent / image_path.name.replace(
              "_leftImg8bit.png", "_gtFine_labelIds.png"
          )
          if not mask_path.exists():
              raise FileNotFoundError(f"Mask not found for {image_path}: {mask_path}")
          image_name = image_path.name.replace("_leftImg8bit", "")
          mask_name = mask_path.name.replace("_gtFine_labelIds", "")
          copy2(image_path, dst_image_dir / image_name)
          copy2(mask_path, dst_mask_dir / mask_name)
Usage
To train a YOLO26n-sem model on the Cityscapes dataset for 100 epochs with an image size of 1024, use the following Python snippet. For a comprehensive list of available arguments, refer to the model Training page.
from ultralytics import YOLO
# Load a model
model = YOLO("yolo26n-sem.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="cityscapes.yaml", epochs=100, imgsz=1024)
Citations and Acknowledgments
If you use the Cityscapes dataset in your research or development work, please cite the following paper:
@inproceedings{Cordts2016Cityscapes,
  title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
  author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
  booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016}
}
We would like to acknowledge the Cityscapes team for creating and maintaining this valuable resource for the autonomous driving and computer vision communities. For more information about the Cityscapes dataset and its creators, visit the Cityscapes dataset website.
FAQ
What is the Cityscapes dataset and why is it important for computer vision?
The Cityscapes dataset is a large-scale semantic segmentation benchmark focused on urban street scenes captured across 50 European cities. It contains 5,000 finely annotated images across 19 evaluation classes, making it a foundational resource for autonomous driving and urban scene understanding research. Its high-resolution images, dense annotations, and standardized mean Intersection over Union (mIoU) metric make it ideal for benchmarking dense prediction models.
How can I train a YOLO model using the Cityscapes dataset?
To train a YOLO26n-sem model on the Cityscapes dataset for 100 epochs with an image size of 1024, use the following Python snippet. For a detailed list of available arguments, refer to the model Training page.
from ultralytics import YOLO
# Load a model
model = YOLO("yolo26n-sem.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="cityscapes.yaml", epochs=100, imgsz=1024)
How is the Cityscapes dataset structured?
After preparation, the dataset is organized into images/{train,val,test}/ and masks/{train,val,test}/ directories, with each image paired with a single-channel PNG mask. The Ultralytics YAML file pairs each image with its mask via the masks_dir: masks field, and uses label_mapping to convert original Cityscapes label IDs into the standard 19 contiguous train IDs, mapping ignored and void labels to 255.
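To sanity-check the image and mask pairing described above, a short script along these lines may help. It is a sketch that assumes the prepared layout from this page and that images and masks share the same file name (which is what the preparation block produces after stripping the _leftImg8bit and _gtFine_labelIds suffixes). The helper name `find_unpaired` is hypothetical.

```python
from pathlib import Path


def find_unpaired(root, split):
    """Return image file names in images/<split> that have no same-named mask in masks/<split>."""
    root = Path(root)
    image_dir = root / "images" / split
    mask_dir = root / "masks" / split
    mask_names = {p.name for p in mask_dir.glob("*.png")}
    return sorted(p.name for p in image_dir.glob("*.png") if p.name not in mask_names)


if __name__ == "__main__":
    # "cityscapes" is the dataset root from cityscapes.yaml; adjust to your local path.
    for split in ("train", "val", "test"):
        missing = find_unpaired("cityscapes", split)
        print(f"{split}: {len(missing)} image(s) without a mask")
```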
Do I need to download Cityscapes manually?
Yes. Cityscapes requires accepting the dataset terms on the official website. Download and extract leftImg8bit and gtFine into the cityscapes dataset root before using the preparation block in cityscapes.yaml to create the expected images/ and masks/ layout.
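Before running the preparation block, it can be useful to confirm that the extracted archives are where the script expects them. The following is a minimal sketch under that assumption; `missing_sources` is a hypothetical helper, and the leftImg8bit/<split> and gtFine/<split> folder names are taken from the preparation block in cityscapes.yaml.

```python
from pathlib import Path


def missing_sources(root, splits=("train", "val", "test")):
    """List the leftImg8bit/<split> and gtFine/<split> folders that are absent under root."""
    root = Path(root)
    expected = [root / name / split for name in ("leftImg8bit", "gtFine") for split in splits]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_dir()]


if __name__ == "__main__":
    # "cityscapes" is the dataset root from cityscapes.yaml; adjust to your local path.
    missing = missing_sources("cityscapes")
    if missing:
        print("Extract the official archives first. Missing folders:", ", ".join(missing))
    else:
        print("Source folders found; the preparation block can be run.")
```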
Why does Cityscapes use label_mapping?
Cityscapes source masks store original label IDs that differ from the 19 train IDs used for evaluation. The label_mapping section converts valid labels to contiguous class IDs 0–18, and assigns 255 to ignored and void labels so they are excluded from the loss and metrics during training and validation.
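To illustrate what excluding label 255 from the loss means, here is a small NumPy sketch of a pixel-wise cross-entropy that averages only over valid pixels. It is a generic illustration and an assumption about the general technique, not the loss implementation used by Ultralytics; `masked_cross_entropy` is a hypothetical name.

```python
import numpy as np

IGNORE_LABEL = 255


def masked_cross_entropy(logits, target, ignore_label=IGNORE_LABEL):
    """Mean cross-entropy over valid pixels.

    logits: float array of shape (C, H, W); target: integer array of shape (H, W)
    holding train IDs or ignore_label.
    """
    logits = np.asarray(logits, dtype=np.float64)
    target = np.asarray(target)
    valid = target != ignore_label
    if not valid.any():
        return 0.0  # nothing to learn from a fully ignored mask
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Select the log-probability of the true class at each valid pixel.
    rows, cols = np.nonzero(valid)
    picked = log_probs[target[rows, cols], rows, cols]
    return float(-picked.mean())


# Toy example: 19 classes, 2x2 image, two pixels ignored.
logits = np.zeros((19, 2, 2))
target = np.array([[0, 255], [18, 255]], dtype=np.uint8)
print(masked_cross_entropy(logits, target))
```

Because ignored pixels never enter the average, changing the logits at those positions leaves the loss unchanged.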