How to Convert COCO Annotations to YOLO Format
Training Ultralytics YOLO models requires annotations in YOLO format, but many popular annotation tools export in COCO JSON format instead. This guide shows you how to convert your COCO annotations to YOLO format and start training object detection, instance segmentation, and pose estimation models.
Why Convert from COCO to YOLO?
The COCO JSON format stores all annotations in a single file, while YOLO uses one text file per image with normalized coordinates. Converting is necessary because:
- YOLO models require
.txtlabel files with one file per image, containingclass x_center y_center width heightin normalized coordinates. - COCO JSON uses pixel coordinates in
[x_min, y_min, width, height]format with a single JSON file for all images. - Class IDs differ โ COCO uses arbitrary
category_idvalues, while YOLO requires zero-indexed class IDs.
| Feature | COCO JSON | YOLO TXT |
|---|---|---|
| Structure | Single JSON file for all images | One .txt file per image |
| Bbox format | [x_min, y_min, width, height] in pixels | class x_center y_center width height normalized (0-1) |
| Class IDs | category_id (can start from any number) | Zero-indexed (starts from 0) |
| Segmentation | Polygon arrays in segmentation field | Polygon coordinates after class ID |
| Keypoints | [x, y, visibility, ...] in pixels | [x, y, visibility, ...] normalized |
Quick Start
The fastest way to convert COCO annotations and start training:
from ultralytics.data.converter import convert_coco
convert_coco(
labels_dir="path/to/annotations/", # directory containing your JSON files
save_dir="path/to/output/", # where to save converted labels
cls91to80=False, # IMPORTANT: set False for custom datasets
)
After conversion, organize your directory structure, create a dataset.yaml, and start training. See the full step-by-step guide below.
Custom datasets: always use cls91to80=False
The cls91to80=True default is designed only for the standard COCO dataset with 80 object classes, which maps 91 non-contiguous category IDs to 80 contiguous class IDs. For any custom dataset, you must set cls91to80=False โ otherwise your class IDs will be silently mapped incorrectly and your model will learn wrong classes.
Step-by-Step Conversion Guide
1. Prepare Your COCO Dataset
A typical COCO-format dataset exported from annotation tools has the following structure:
my_dataset/
โโโ images/
โ โโโ train/
โ โ โโโ img_001.jpg
โ โ โโโ img_002.jpg
โ โ โโโ ...
โ โโโ val/
โ โโโ img_100.jpg
โ โโโ ...
โโโ annotations/
โโโ instances_train.json
โโโ instances_val.json
Each JSON file follows the COCO data format specification with three required fields โ images, annotations, and categories:
{
"images": [{ "id": 1, "file_name": "img_001.jpg", "width": 640, "height": 480 }],
"annotations": [
{
"id": 1,
"image_id": 1,
"category_id": 1,
"bbox": [100, 50, 200, 150],
"area": 30000,
"iscrowd": 0
}
],
"categories": [
{ "id": 1, "name": "helmet" },
{ "id": 2, "name": "vest" }
]
}
2. Convert Annotations
Use the convert_coco() function to convert your COCO JSON annotations to YOLO .txt format:
Convert COCO to YOLO format
from ultralytics.data.converter import convert_coco
convert_coco(
labels_dir="my_dataset/annotations/",
save_dir="my_dataset/converted/",
cls91to80=False,
)
from ultralytics.data.converter import convert_coco
convert_coco(
labels_dir="my_dataset/annotations/",
save_dir="my_dataset/converted/",
use_segments=True,
cls91to80=False,
)
from ultralytics.data.converter import convert_coco
convert_coco(
labels_dir="my_dataset/annotations/",
save_dir="my_dataset/converted/",
use_keypoints=True,
cls91to80=False,
)
3. Organize Directory Structure
After conversion, label files need to be placed alongside your images. YOLO expects a labels/ directory that mirrors the images/ directory:
import shutil
from pathlib import Path
# Paths
converted_dir = Path("my_dataset/converted/labels")
dataset_dir = Path("my_dataset")
# Move labels next to images for each split
for split in ["train", "val"]:
src = converted_dir / split # convert_coco strips "instances_" prefix from JSON filename
dst = dataset_dir / "labels" / split
dst.mkdir(parents=True, exist_ok=True)
for f in src.glob("*.txt"):
shutil.move(str(f), str(dst / f.name))
Your final dataset structure should look like:
my_dataset/
โโโ images/
โ โโโ train/
โ โ โโโ img_001.jpg
โ โ โโโ ...
โ โโโ val/
โ โโโ ...
โโโ labels/
โ โโโ train/
โ โ โโโ img_001.txt
โ โ โโโ ...
โ โโโ val/
โ โโโ ...
โโโ dataset.yaml
4. Create dataset.yaml
Create a dataset.yaml configuration file that maps your COCO categories to YOLO class names. This file tells YOLO where your data is and what classes to detect:
import json
from pathlib import Path
import yaml
# Read categories from your COCO JSON
with open("my_dataset/annotations/instances_train.json") as f:
coco = json.load(f)
# Build class names matching convert_coco output (category_id - 1)
categories = sorted(coco["categories"], key=lambda x: x["id"])
names = {cat["id"] - 1: cat["name"] for cat in categories}
# NOTE: convert_coco maps class IDs as category_id - 1, so category_id must
# start from 1. If your categories start from 0, add 1 to each ID first.
# Create dataset.yaml
dataset = {
"path": str(Path("my_dataset").resolve()),
"train": "images/train",
"val": "images/val",
"names": names,
}
with open("my_dataset/dataset.yaml", "w") as f:
yaml.dump(dataset, f, default_flow_style=False)
The resulting YAML file:
path: /absolute/path/to/my_dataset
train: images/train
val: images/val
names:
0: helmet
1: vest
For more details on the dataset YAML format, see the dataset configuration guide.
5. Train Your YOLO Model
With your converted dataset ready, train a YOLO model:
Train on converted COCO data
from ultralytics import YOLO
model = YOLO("yolo26n.pt") # load a pretrained model
results = model.train(data="my_dataset/dataset.yaml", epochs=100, imgsz=640)
yolo detect train model=yolo26n.pt data=my_dataset/dataset.yaml epochs=100 imgsz=640
For training tips and best practices, see the model training guide.
6. Verify Your Conversion
Before training, spot-check a few label files to confirm class IDs and coordinates are correct:
from pathlib import Path
label_file = Path("my_dataset/labels/train/img_001.txt")
for line in label_file.read_text().strip().splitlines():
parts = line.split()
cls_id = int(parts[0])
coords = [float(v) for v in parts[1:5]]
assert cls_id >= 0, f"Negative class ID {cls_id} โ category_id in your JSON may start from 0"
assert all(0 <= v <= 1 for v in coords), f"Coordinates out of [0, 1] range: {coords}"
Tip
If you see negative class IDs, your COCO JSON likely uses category_id starting from 0. Add 1 to all category_id values in your JSON before running convert_coco(), since it maps class IDs as category_id - 1.
Troubleshooting Common Issues
Wrong Class IDs After Conversion
If your model trains but detects wrong object classes, you're likely using cls91to80=True (default) on a custom dataset. This maps your category_id values through the COCO 91-to-80 lookup table, which is only correct for the standard COCO dataset.
Solution: Always use cls91to80=False for custom datasets.
No Labels Found During Training
If training shows WARNING: No labels found or 0 images, N backgrounds, your label files are not in the expected directory. convert_coco() saves labels to a separate output directory (e.g., save_dir/labels/train/), but YOLO expects labels/ parallel to images/ inside your dataset directory.
Solution: Move label files to match the expected directory structure. Make sure labels/train/ is a sibling of images/train/.
KeyError During Conversion
If you get KeyError: 'bbox' or similar errors when running convert_coco(), your labels_dir likely contains non-instance JSON files (e.g., captions_train2017.json) that have a different annotation structure.
Solution: Only place instance annotation JSON files (e.g., instances_train2017.json) in the labels_dir.
Empty Label Files After Conversion
If conversion completes but .txt files are empty or missing, all annotations may have iscrowd: 1 (common with SAM-generated masks), or bounding boxes have zero width or height.
Solution: Inspect your JSON annotations for iscrowd values. If using SAM masks, preprocess the JSON to set iscrowd: 0.
Class ID Gaps in Converted Labels
If class IDs in label files are non-contiguous (e.g., 0, 4, 9 instead of 0, 1, 2), your annotation tool uses non-contiguous category_id values.
Solution: Verify the class IDs in your .txt files match the names dictionary in dataset.yaml. Remap IDs to contiguous values if needed.
For full API details and parameter descriptions, see the convert_coco API reference.
FAQ
How do I convert COCO JSON annotations to YOLO format?
Use the convert_coco() function from Ultralytics to convert COCO JSON annotations to YOLO .txt format. Set cls91to80=False for custom datasets:
from ultralytics.data.converter import convert_coco
convert_coco(labels_dir="path/to/annotations/", save_dir="output/", cls91to80=False)
After conversion, reorganize your label files so labels/ mirrors the images/ directory, then create a dataset.yaml file. See the step-by-step guide for the complete workflow.
Why does YOLO training show "No labels found" after COCO conversion?
This happens because convert_coco() saves labels to a subdirectory inside save_dir/labels/ (e.g., save_dir/labels/train/) rather than directly into your dataset's labels/train/ alongside images/train/. YOLO expects labels to sit parallel to images โ for example, images/train/img.jpg needs labels/train/img.txt. Move your converted labels to match this structure. See fixing the directory structure.
What does cls91to80 do in convert_coco()?
The cls91to80 parameter controls how COCO category_id values are mapped to YOLO class IDs. When True (default), it uses a lookup table designed for the standard COCO dataset, which has 80 classes with non-contiguous IDs (1-90). For custom datasets, always set cls91to80=False โ this simply subtracts 1 from each category_id to create zero-indexed class IDs.
Can I train YOLO directly on COCO JSON without converting?
Not with the current YOLO training pipeline โ annotations must be in YOLO .txt format with one file per image. Use convert_coco() to convert your COCO JSON first, then follow this guide to organize and train. For more on supported formats, see dataset formats.
Can I convert COCO segmentation annotations to YOLO format?
Yes, use use_segments=True when calling convert_coco() to include polygon segmentation masks in the converted YOLO labels. This produces label files compatible with YOLO segmentation models:
from ultralytics.data.converter import convert_coco
convert_coco(labels_dir="annotations/", save_dir="output/", use_segments=True, cls91to80=False)
How do I convert COCO keypoint annotations to YOLO format?
Use use_keypoints=True to convert COCO keypoint annotations for pose estimation training:
from ultralytics.data.converter import convert_coco
convert_coco(labels_dir="annotations/", save_dir="output/", use_keypoints=True, cls91to80=False)
Note that if both use_segments and use_keypoints are set to True, only keypoints will be written to the label files โ segments are silently ignored.