How to Train YOLO on COCO JSON Without Converting

Annotations in COCO JSON format can be used directly for Ultralytics YOLO training without converting to .txt files first. This works by subclassing YOLODataset to parse COCO JSON on the fly and wiring it into the training pipeline through a custom trainer.

Why Train Directly on COCO JSON

This approach keeps the COCO JSON as the single source of truth: no convert_coco() call, no directory reorganization, and no intermediate .txt label files (the only file written is a .cache file next to the JSON). YOLO26 and all other Ultralytics YOLO detection models are supported. Segmentation and pose models require additional label fields (see FAQ).

Looking for a one-time conversion instead?

See the COCO to YOLO Conversion guide for the standard convert_coco() workflow.

Architecture Overview

Two classes are needed:

COCODataset — reads COCO JSON and converts bounding boxes to YOLO format in memory during training
COCOTrainer — overrides build_dataset() to use COCODataset instead of the default YOLODataset

The implementation follows the same pattern as the built-in GroundingDataset, which also reads JSON annotations directly. Three methods are overridden: get_img_files(), cache_labels(), and get_labels().

Building the COCO JSON Dataset Class

The COCODataset class inherits from YOLODataset and overrides the label loading logic. Instead of reading .txt files from a labels directory, it opens the COCO JSON file, iterates over annotations grouped by image, and converts each bounding box from COCO pixel format [x_min, y_min, width, height] to YOLO normalized center format [x_center, y_center, width, height]. Crowd annotations (iscrowd: 1) and zero-area boxes are skipped automatically.
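
The box conversion described above can be checked in isolation before wiring it into a dataset. The following is a minimal sketch in plain Python; the helper name coco_to_yolo_box is hypothetical and not part of Ultralytics. It applies the same arithmetic as the dataset class: shift the top-left corner to the center, then divide by the image width and height.

```python
def coco_to_yolo_box(bbox, img_w, img_h):
    """Convert a COCO [x_min, y_min, width, height] pixel box to YOLO normalized [cx, cy, w, h]."""
    x_min, y_min, bw, bh = bbox
    if bw <= 0 or bh <= 0:
        return None  # zero-area box: skipped, as in the dataset class
    cx = (x_min + bw / 2) / img_w  # top-left x -> center x, normalized by image width
    cy = (y_min + bh / 2) / img_h  # top-left y -> center y, normalized by image height
    return [cx, cy, bw / img_w, bh / img_h]


# A 200x100 px box with its top-left corner at (100, 50) in a 640x480 image
box = coco_to_yolo_box([100, 50, 200, 100], 640, 480)
print(box)  # center x and width are both 0.3125; center y and height are both 100/480
```

Because the center and the size are normalized by the same image dimensions, a box that lies fully inside the image yields four values between 0 and 1.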

The get_img_files() method returns an empty list because image paths are resolved from the JSON file_name field inside cache_labels(). Category IDs are sorted and remapped to zero-indexed class indices, so both 1-based (standard COCO) and non-contiguous ID schemes work correctly.

import json
from collections import defaultdict
from pathlib import Path

import numpy as np

from ultralytics.data.dataset import DATASET_CACHE_VERSION, YOLODataset
from ultralytics.data.utils import get_hash, load_dataset_cache_file, save_dataset_cache_file
from ultralytics.utils import TQDM

class COCODataset(YOLODataset):
    """Dataset that reads COCO JSON annotations directly without conversion to .txt files."""

    def __init__(self, *args, json_file="", **kwargs):
        """Initialize the dataset with a COCO JSON annotation file."""
        self.json_file = json_file
        super().__init__(*args, data={"channels": 3}, **kwargs)

    def get_img_files(self, img_path):
        """Image paths are resolved from the JSON file, not from scanning a directory."""
        return []

    def cache_labels(self, path=Path("./labels.cache")):
        """Parse COCO JSON and convert annotations to YOLO format. Results are saved to a .cache file."""
        x = {"labels": []}
        with open(self.json_file) as f:
            coco = json.load(f)

        # Sort categories by ID and map to 0-indexed classes
        categories = {cat["id"]: i for i, cat in enumerate(sorted(coco["categories"], key=lambda c: c["id"]))}

        img_to_anns = defaultdict(list)
        for ann in coco["annotations"]:
            img_to_anns[ann["image_id"]].append(ann)

        for img_info in TQDM(coco["images"], desc="reading annotations"):
            h, w = img_info["height"], img_info["width"]
            im_file = Path(self.img_path) / img_info["file_name"]
            if not im_file.exists():
                continue

            self.im_files.append(str(im_file))
            bboxes = []
            for ann in img_to_anns.get(img_info["id"], []):
                if ann.get("iscrowd", False):
                    continue
                # COCO: [x, y, w, h] top-left in pixels -> YOLO: [cx, cy, w, h] center normalized
                box = np.array(ann["bbox"], dtype=np.float32)
                box[:2] += box[2:] / 2  # top-left to center
                box[[0, 2]] /= w  # normalize x
                box[[1, 3]] /= h  # normalize y
                if box[2] <= 0 or box[3] <= 0:
                    continue
                cls = categories[ann["category_id"]]
                bboxes.append([cls, *box.tolist()])

            lb = np.array(bboxes, dtype=np.float32) if bboxes else np.zeros((0, 5), dtype=np.float32)
            x["labels"].append(
                {
                    "im_file": str(im_file),
                    "shape": (h, w),
                    "cls": lb[:, 0:1],
                    "bboxes": lb[:, 1:],
                    "segments": [],
                    "normalized": True,
                    "bbox_format": "xywh",
                }
            )
        x["hash"] = get_hash([self.json_file, str(self.img_path)])
        save_dataset_cache_file(self.prefix, path, x, DATASET_CACHE_VERSION)
        return x

    def get_labels(self):
        """Load labels from .cache file if available, otherwise parse JSON and create the cache."""
        cache_path = Path(self.json_file).with_suffix(".cache")
        try:
            cache = load_dataset_cache_file(cache_path)
            assert cache["version"] == DATASET_CACHE_VERSION
            assert cache["hash"] == get_hash([self.json_file, str(self.img_path)])
            self.im_files = [lb["im_file"] for lb in cache["labels"]]
        except (FileNotFoundError, AssertionError, AttributeError, KeyError, ModuleNotFoundError):
            cache = self.cache_labels(cache_path)
        cache.pop("hash", None)
        cache.pop("version", None)
        return cache["labels"]

Parsed labels are saved to a .cache file next to the JSON (e.g. instances_train.cache). On subsequent training runs, the cache is loaded directly, skipping JSON parsing. The cache is rebuilt automatically when the hash computed from the JSON file and the image directory no longer matches the stored one, or when the Ultralytics cache version changes.
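
If a stale cache is ever suspected, deleting the .cache file forces a fresh parse on the next run. The sketch below only mirrors the path logic from get_labels() (the JSON path with its suffix replaced by .cache). The helper names cache_path_for and clear_label_cache are hypothetical, and the temporary directory is there only to make the example runnable.

```python
import tempfile
from pathlib import Path


def cache_path_for(json_file):
    """Return the cache location used by get_labels(): the JSON path with a .cache suffix."""
    return Path(json_file).with_suffix(".cache")


def clear_label_cache(json_file):
    """Delete the cache next to the JSON file, if present. Returns True when a file was removed."""
    cache = cache_path_for(json_file)
    if cache.exists():
        cache.unlink()
        return True
    return False


# Throwaway files to exercise the helpers
with tempfile.TemporaryDirectory() as tmp:
    json_file = Path(tmp) / "instances_train.json"
    json_file.write_text("{}")
    cache_path_for(json_file).write_bytes(b"stale")
    first = clear_label_cache(json_file)  # a cache file existed and was removed
    second = clear_label_cache(json_file)  # nothing left to remove

print(cache_path_for("instances_train.json").name, first, second)
```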

Connecting the Dataset to the Training Pipeline

The only change needed in the trainer is overriding build_dataset(). The default DetectionTrainer builds a YOLODataset that scans for .txt label files. By replacing it with COCODataset, the trainer reads from the COCO JSON instead.

The JSON file path is pulled from a custom train_json / val_json field in the data config (see Configuring dataset.yaml). During training, mode="train" resolves to train_json; during validation, mode="val" resolves to val_json. If val_json is not set, it falls back to train_json.

from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import colorstr

class COCOTrainer(DetectionTrainer):
    """Trainer that uses COCODataset for direct COCO JSON training."""

    def build_dataset(self, img_path, mode="train", batch=None):
        """Build a COCODataset for the given split using the JSON file from the data config."""
        json_file = self.data["train_json"] if mode == "train" else self.data.get("val_json", self.data["train_json"])
        return COCODataset(
            img_path=img_path,
            json_file=json_file,
            imgsz=self.args.imgsz,
            batch_size=batch,
            augment=mode == "train",
            hyp=self.args,
            rect=self.args.rect or mode == "val",
            cache=self.args.cache or None,
            single_cls=self.args.single_cls or False,
            stride=int(self.model.stride.max()) if hasattr(self, "model") and self.model else 32,
            pad=0.0 if mode == "train" else 0.5,
            prefix=colorstr(f"{mode}: "),
            task=self.args.task,
            classes=self.args.classes,
            fraction=self.args.fraction if mode == "train" else 1.0,
        )
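
The split-to-JSON lookup in build_dataset() is compact, so here it is as a standalone function operating on a plain dictionary. This is only an illustration of the expression above; resolve_json is a hypothetical name and the paths are placeholders.

```python
def resolve_json(data, mode="train"):
    """Pick the annotation file for a split, mirroring the expression in build_dataset()."""
    if mode == "train":
        return data["train_json"]
    # Any non-train mode uses val_json and falls back to train_json when it is absent.
    # The fallback value is looked up eagerly, so train_json must always be present.
    return data.get("val_json", data["train_json"])


both = {"train_json": "instances_train.json", "val_json": "instances_val.json"}
train_only = {"train_json": "instances_train.json"}

print(resolve_json(both, "train"))  # instances_train.json
print(resolve_json(both, "val"))  # instances_val.json
print(resolve_json(train_only, "val"))  # instances_train.json
```

Note that with the fallback, validation metrics are computed on the training annotations, so setting val_json explicitly is usually the safer choice.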

Configuring dataset.yaml for COCO JSON

The dataset.yaml uses the standard path, train, and val fields to locate image directories. Two additional fields, train_json and val_json, specify the COCO annotation files that COCOTrainer reads. The nc and names fields define the number of classes and their names; names must follow the JSON categories sorted by id, because that is the order used to assign class indices.

path: /path/to/my_dataset/images # root with train/ and val/ image subfolders
train: train
val: val

# COCO JSON annotation files (use absolute paths; these custom keys are not resolved against `path`)
train_json: /path/to/my_dataset/annotations/instances_train.json
val_json: /path/to/my_dataset/annotations/instances_val.json

nc: 80
names:
    0: person
    1: bicycle
    # ... remaining class names

Expected directory structure:

my_dataset/
  images/
    train/
      img_001.jpg
      ...
    val/
      img_100.jpg
      ...
  annotations/
    instances_train.json
    instances_val.json
  dataset.yaml
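
Because the dataset class silently skips images that are missing on disk, a quick pre-flight check of the layout can save a confusing training run. The sketch below uses only the standard library. The function name preflight and the report keys are hypothetical, and the tiny temporary dataset exists only to make the example runnable.

```python
import json
import tempfile
from pathlib import Path


def preflight(json_file, img_dir, nc):
    """Report layout problems that would otherwise be skipped silently or fail mid-parse."""
    with open(json_file) as f:
        coco = json.load(f)
    cat_ids = {c["id"] for c in coco["categories"]}
    return {
        # Images listed in the JSON but absent on disk (the dataset class skips these)
        "missing_images": [
            im["file_name"] for im in coco["images"] if not (Path(img_dir) / im["file_name"]).exists()
        ],
        # Annotations whose category_id has no entry in "categories" (can raise KeyError when parsed)
        "unknown_category_anns": [a.get("id") for a in coco["annotations"] if a["category_id"] not in cat_ids],
        # nc in dataset.yaml should equal the number of categories
        "nc_matches": len(cat_ids) == nc,
    }


# Throwaway dataset: one image present, one missing, one annotation with an unknown category
with tempfile.TemporaryDirectory() as tmp:
    img_dir = Path(tmp) / "images" / "train"
    img_dir.mkdir(parents=True)
    (img_dir / "img_001.jpg").touch()
    coco = {
        "categories": [{"id": 1, "name": "person"}],
        "images": [
            {"id": 1, "file_name": "img_001.jpg", "width": 640, "height": 480},
            {"id": 2, "file_name": "img_002.jpg", "width": 640, "height": 480},
        ],
        "annotations": [{"id": 10, "image_id": 1, "category_id": 5, "bbox": [0, 0, 10, 10]}],
    }
    json_file = Path(tmp) / "instances_train.json"
    json_file.write_text(json.dumps(coco))
    report = preflight(json_file, img_dir, nc=1)

print(report)
```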

Running Training on COCO JSON

With the dataset class, trainer class, and YAML config in place, training works through the standard model.train() call. The only difference from a normal training run is the trainer=COCOTrainer argument, which tells Ultralytics to use the custom dataset loader instead of the default one.

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
model.train(data="dataset.yaml", epochs=100, imgsz=640, trainer=COCOTrainer)

The full training pipeline runs as expected, including validation, checkpoint saving, and metric logging.

Full Implementation

For convenience, the full implementation is provided below as a single copy-paste script. It includes the custom dataset, custom trainer, and the training call. Save this alongside your dataset.yaml and run it directly.

import json
from collections import defaultdict
from pathlib import Path

import numpy as np

from ultralytics import YOLO
from ultralytics.data.dataset import DATASET_CACHE_VERSION, YOLODataset
from ultralytics.data.utils import get_hash, load_dataset_cache_file, save_dataset_cache_file
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import TQDM, colorstr

class COCODataset(YOLODataset):
    """Dataset that reads COCO JSON annotations directly without conversion to .txt files."""

    def __init__(self, *args, json_file="", **kwargs):
        """Initialize the dataset with a COCO JSON annotation file."""
        self.json_file = json_file
        super().__init__(*args, data={"channels": 3}, **kwargs)

    def get_img_files(self, img_path):
        """Image paths are resolved from the JSON file, not from scanning a directory."""
        return []

    def cache_labels(self, path=Path("./labels.cache")):
        """Parse COCO JSON and convert annotations to YOLO format. Results are saved to a .cache file."""
        x = {"labels": []}
        with open(self.json_file) as f:
            coco = json.load(f)

        categories = {cat["id"]: i for i, cat in enumerate(sorted(coco["categories"], key=lambda c: c["id"]))}

        img_to_anns = defaultdict(list)
        for ann in coco["annotations"]:
            img_to_anns[ann["image_id"]].append(ann)

        for img_info in TQDM(coco["images"], desc="reading annotations"):
            h, w = img_info["height"], img_info["width"]
            im_file = Path(self.img_path) / img_info["file_name"]
            if not im_file.exists():
                continue

            self.im_files.append(str(im_file))
            bboxes = []
            for ann in img_to_anns.get(img_info["id"], []):
                if ann.get("iscrowd", False):
                    continue
                box = np.array(ann["bbox"], dtype=np.float32)
                box[:2] += box[2:] / 2
                box[[0, 2]] /= w
                box[[1, 3]] /= h
                if box[2] <= 0 or box[3] <= 0:
                    continue
                cls = categories[ann["category_id"]]
                bboxes.append([cls, *box.tolist()])

            lb = np.array(bboxes, dtype=np.float32) if bboxes else np.zeros((0, 5), dtype=np.float32)
            x["labels"].append(
                {
                    "im_file": str(im_file),
                    "shape": (h, w),
                    "cls": lb[:, 0:1],
                    "bboxes": lb[:, 1:],
                    "segments": [],
                    "normalized": True,
                    "bbox_format": "xywh",
                }
            )
        x["hash"] = get_hash([self.json_file, str(self.img_path)])
        save_dataset_cache_file(self.prefix, path, x, DATASET_CACHE_VERSION)
        return x

    def get_labels(self):
        """Load labels from .cache file if available, otherwise parse JSON and create the cache."""
        cache_path = Path(self.json_file).with_suffix(".cache")
        try:
            cache = load_dataset_cache_file(cache_path)
            assert cache["version"] == DATASET_CACHE_VERSION
            assert cache["hash"] == get_hash([self.json_file, str(self.img_path)])
            self.im_files = [lb["im_file"] for lb in cache["labels"]]
        except (FileNotFoundError, AssertionError, AttributeError, KeyError, ModuleNotFoundError):
            cache = self.cache_labels(cache_path)
        cache.pop("hash", None)
        cache.pop("version", None)
        return cache["labels"]

class COCOTrainer(DetectionTrainer):
    """Trainer that uses COCODataset for direct COCO JSON training."""

    def build_dataset(self, img_path, mode="train", batch=None):
        """Build a COCODataset for the given split using the JSON file from the data config."""
        json_file = self.data["train_json"] if mode == "train" else self.data.get("val_json", self.data["train_json"])
        return COCODataset(
            img_path=img_path,
            json_file=json_file,
            imgsz=self.args.imgsz,
            batch_size=batch,
            augment=mode == "train",
            hyp=self.args,
            rect=self.args.rect or mode == "val",
            cache=self.args.cache or None,
            single_cls=self.args.single_cls or False,
            stride=int(self.model.stride.max()) if hasattr(self, "model") and self.model else 32,
            pad=0.0 if mode == "train" else 0.5,
            prefix=colorstr(f"{mode}: "),
            task=self.args.task,
            classes=self.args.classes,
            fraction=self.args.fraction if mode == "train" else 1.0,
        )

model = YOLO("yolo26n.pt")
model.train(data="dataset.yaml", epochs=100, imgsz=640, trainer=COCOTrainer)

You now have a minimal dataset and trainer that train Ultralytics YOLO directly on COCO JSON, with annotations staying the single source of truth and no intermediate .txt files. Extend the cache_labels() method with segments or keypoints to cover segmentation and pose, and see the Model Training Tips guide for hyperparameter tuning recommendations.

FAQ

What is the difference between this and convert_coco()?

convert_coco() writes .txt label files to disk as a one-time conversion. This approach parses the JSON on the first training run, converts annotations in memory, and stores the result in a .cache file that later runs reuse. Use convert_coco() when permanent YOLO-format labels are preferred; use this approach to keep the COCO JSON as the single source of truth without generating .txt label files.

Can YOLO train on COCO JSON without custom code?

Not with the current Ultralytics pipeline, which expects YOLO .txt labels by default. This guide provides the minimal custom code needed — one dataset class and one trainer class. Once defined, training requires only a standard model.train() call.

Does this support segmentation and pose estimation?

This guide covers object detection. To add instance segmentation support, include the segmentation polygon data from COCO annotations in the segments field of each label dictionary. For pose estimation, include keypoints. The GroundingDataset source code provides a reference implementation for handling segments.
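
As a rough starting point for the segmentation case, the sketch below normalizes one COCO polygon, which is stored as a flat list [x1, y1, x2, y2, ...] in pixels. The function name polygon_to_normalized is hypothetical. The assumption that the segments field expects normalized (x, y) points, as well as how multi-part polygons should be merged, should be checked against the GroundingDataset source before relying on it.

```python
def polygon_to_normalized(flat_polygon, img_w, img_h):
    """Turn a flat COCO polygon [x1, y1, x2, y2, ...] into a list of normalized (x, y) pairs."""
    xs = flat_polygon[0::2]  # even positions hold x coordinates
    ys = flat_polygon[1::2]  # odd positions hold y coordinates
    return [(x / img_w, y / img_h) for x, y in zip(xs, ys)]


# A triangle in a 640x480 image
points = polygon_to_normalized([0, 0, 320, 0, 320, 240], 640, 480)
print(points)  # [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5)]
```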

Do augmentations work with this custom dataset?

Yes. COCODataset extends YOLODataset, so all built-in data augmentations — mosaic, mixup, copy-paste, and others — run without modification.

How are category IDs mapped to class indices?

Categories are sorted by id and mapped to sequential indices starting from 0. This handles 1-based IDs (standard COCO), 0-based IDs, and non-contiguous IDs. The names dictionary in dataset.yaml should list classes in that same id-sorted order, which may differ from the order in which the categories array is written in the JSON.
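
A small worked example makes the mapping concrete. The sketch below reuses the sorting expression from cache_labels() and also derives a names dictionary in the matching order; the helper name build_class_map and the sample categories are made up for illustration.

```python
def build_class_map(categories):
    """Map COCO category ids to 0-based class indices by sorting on id."""
    ordered = sorted(categories, key=lambda c: c["id"])
    id_to_index = {c["id"]: i for i, c in enumerate(ordered)}
    names = {i: c["name"] for i, c in enumerate(ordered)}  # same order as dataset.yaml names
    return id_to_index, names


# Non-contiguous ids, listed out of order in the JSON
cats = [{"id": 7, "name": "truck"}, {"id": 1, "name": "person"}, {"id": 3, "name": "car"}]
id_to_index, names = build_class_map(cats)
print(id_to_index)  # {1: 0, 3: 1, 7: 2}
print(names)  # {0: 'person', 1: 'car', 2: 'truck'}
```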

Is there a performance overhead compared to pre-converted labels?

The COCO JSON is parsed once, on the first training run. Parsed labels are saved to a .cache file, so subsequent runs load the cache and skip JSON parsing. Training speed matches standard YOLO training because annotations are held in memory in both cases. The cache is rebuilt automatically if the JSON file changes.

Contributors

raimbekovm (3), glenn-jocher (3)

Created 3 months ago, updated 1 week ago