How to Train YOLO Directly on COCO JSON Without Conversion

Why Train Directly on COCO JSON?

Annotations in COCO JSON format can be used directly for training with Ultralytics YOLO, without first converting them to .txt files. This is done by subclassing YOLODataset to parse COCO JSON on the fly and wiring it into the training pipeline through a custom trainer.

This approach keeps the COCO JSON as the single source of truth: no convert_coco() call, no directory reorganization, no intermediate label files. YOLO26 and all other Ultralytics YOLO detection models are supported. Segmentation and pose models require additional label fields (see FAQ).

Looking for a one-time conversion instead?

See the COCO to YOLO conversion guide for the standard convert_coco() workflow.

Architecture Overview

Two classes are needed:

COCODataset: reads COCO JSON and converts bounding boxes to YOLO format in memory during training.
COCOTrainer: overrides build_dataset() to use COCODataset instead of the default YOLODataset.

The implementation follows the same pattern as the built-in GroundingDataset, which also reads JSON annotations directly. Three methods are overridden: get_img_files(), cache_labels(), and get_labels().

Building the COCO JSON Dataset Class

The COCODataset class inherits from YOLODataset and overrides the label-loading logic. Instead of reading .txt files from a labels directory, it opens the COCO JSON file, iterates over the annotations grouped by image, and converts each bounding box from the COCO pixel format [x_min, y_min, width, height] to the normalized YOLO center format [x_center, y_center, width, height]. Crowd annotations (iscrowd: 1) and boxes with zero width or height are skipped automatically.
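To make the box conversion concrete, here is a minimal standalone sketch of the same arithmetic in plain Python. The helper name coco_to_yolo_bbox is hypothetical and used only for illustration; the dataset class in this guide does the equivalent with NumPy inside cache_labels().

```python
def coco_to_yolo_bbox(bbox, img_w, img_h):
    """Convert a COCO [x_min, y_min, width, height] pixel box to normalized YOLO [cx, cy, w, h]."""
    x_min, y_min, w, h = bbox
    cx = (x_min + w / 2) / img_w  # shift from the left edge to the center, then normalize by image width
    cy = (y_min + h / 2) / img_h  # shift from the top edge to the center, then normalize by image height
    return [cx, cy, w / img_w, h / img_h]


# A 200x100 px box whose top-left corner sits at (100, 50) in a 640x480 image
print(coco_to_yolo_bbox([100, 50, 200, 100], img_w=640, img_h=480))
```

All four output values lie in [0, 1] as long as the box is fully inside the image.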

The get_img_files() method returns an empty list because image paths are resolved from the JSON file_name field inside cache_labels(). Category IDs are sorted and remapped to zero-indexed class indices, so both 1-based (standard COCO) and non-contiguous ID schemes work correctly.

import json
from collections import defaultdict
from pathlib import Path

import numpy as np

from ultralytics.data.dataset import DATASET_CACHE_VERSION, YOLODataset
from ultralytics.data.utils import get_hash, load_dataset_cache_file, save_dataset_cache_file
from ultralytics.utils import TQDM

class COCODataset(YOLODataset):
    """Dataset that reads COCO JSON annotations directly without conversion to .txt files."""

    def __init__(self, *args, json_file="", **kwargs):
        self.json_file = json_file
        super().__init__(*args, data={"channels": 3}, **kwargs)

    def get_img_files(self, img_path):
        """Image paths are resolved from the JSON file, not from scanning a directory."""
        return []

    def cache_labels(self, path=Path("./labels.cache")):
        """Parse COCO JSON and convert annotations to YOLO format. Results are saved to a .cache file."""
        x = {"labels": []}
        with open(self.json_file) as f:
            coco = json.load(f)

        # Sort categories by ID and map to 0-indexed classes
        categories = {cat["id"]: i for i, cat in enumerate(sorted(coco["categories"], key=lambda c: c["id"]))}

        img_to_anns = defaultdict(list)
        for ann in coco["annotations"]:
            img_to_anns[ann["image_id"]].append(ann)

        for img_info in TQDM(coco["images"], desc="reading annotations"):
            h, w = img_info["height"], img_info["width"]
            im_file = Path(self.img_path) / img_info["file_name"]
            if not im_file.exists():
                continue

            self.im_files.append(str(im_file))
            bboxes = []
            for ann in img_to_anns.get(img_info["id"], []):
                if ann.get("iscrowd", False):
                    continue
                # COCO: [x, y, w, h] top-left in pixels -> YOLO: [cx, cy, w, h] center normalized
                box = np.array(ann["bbox"], dtype=np.float32)
                box[:2] += box[2:] / 2  # top-left to center
                box[[0, 2]] /= w  # normalize x
                box[[1, 3]] /= h  # normalize y
                if box[2] <= 0 or box[3] <= 0:
                    continue
                cls = categories[ann["category_id"]]
                bboxes.append([cls, *box.tolist()])

            lb = np.array(bboxes, dtype=np.float32) if bboxes else np.zeros((0, 5), dtype=np.float32)
            x["labels"].append(
                {
                    "im_file": str(im_file),
                    "shape": (h, w),
                    "cls": lb[:, 0:1],
                    "bboxes": lb[:, 1:],
                    "segments": [],
                    "normalized": True,
                    "bbox_format": "xywh",
                }
            )
        x["hash"] = get_hash([self.json_file, str(self.img_path)])
        save_dataset_cache_file(self.prefix, path, x, DATASET_CACHE_VERSION)
        return x

    def get_labels(self):
        """Load labels from .cache file if available, otherwise parse JSON and create the cache."""
        cache_path = Path(self.json_file).with_suffix(".cache")
        try:
            cache = load_dataset_cache_file(cache_path)
            assert cache["version"] == DATASET_CACHE_VERSION
            assert cache["hash"] == get_hash([self.json_file, str(self.img_path)])
            self.im_files = [lb["im_file"] for lb in cache["labels"]]
        except (FileNotFoundError, AssertionError, AttributeError, KeyError, ModuleNotFoundError):
            cache = self.cache_labels(cache_path)
        cache.pop("hash", None)
        cache.pop("version", None)
        return cache["labels"]

Parsed labels are saved to a .cache file next to the JSON (e.g. instances_train.cache). On subsequent training runs the cache is loaded directly, skipping the JSON parsing. If the JSON file changes, the hash check fails and the cache is rebuilt automatically.
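If you ever want to force a fresh parse yourself, deleting the cache file is enough, because get_labels() falls back to cache_labels() when the file is missing. The helper below is a small sketch under that assumption; the name clear_label_cache is hypothetical, and the cache location mirrors the Path(json_file).with_suffix(".cache") logic used in this guide.

```python
from pathlib import Path


def clear_label_cache(json_file):
    """Delete the .cache file stored next to a COCO JSON file; return True if one was removed."""
    cache_path = Path(json_file).with_suffix(".cache")  # same location get_labels() uses
    if cache_path.exists():
        cache_path.unlink()
        return True
    return False
```

The next training run then re-parses the JSON and writes a new cache.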

Connecting the Dataset to the Training Pipeline

The only change needed in the trainer is overriding build_dataset(). The default DetectionTrainer builds a YOLODataset that looks for .txt label files. Replacing it with COCODataset makes the trainer read from the COCO JSON instead.

The path to the JSON file is taken from a custom train_json / val_json field in the data configuration (see the dataset.yaml section below). During training, mode="train" resolves to train_json; during validation, mode="val" resolves to val_json. If val_json is not set, it falls back to train_json.
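The fallback rule can be isolated in a few lines. This is only an illustrative sketch: resolve_json_file is a hypothetical helper, and the data argument stands in for the dictionary the trainer exposes as self.data.

```python
def resolve_json_file(data, mode):
    """Pick the annotation file for a mode, falling back to train_json when val_json is absent."""
    if mode == "train":
        return data["train_json"]
    return data.get("val_json", data["train_json"])


# Hypothetical configuration without a val_json entry
cfg = {"train_json": "annotations/instances_train.json"}
print(resolve_json_file(cfg, "val"))
```

Note that with the fallback active, validation metrics are computed on the training annotations, so they are not an independent estimate.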

from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import colorstr

class COCOTrainer(DetectionTrainer):
    """Trainer that uses COCODataset for direct COCO JSON training."""

    def build_dataset(self, img_path, mode="train", batch=None):
        json_file = self.data["train_json"] if mode == "train" else self.data.get("val_json", self.data["train_json"])
        return COCODataset(
            img_path=img_path,
            json_file=json_file,
            imgsz=self.args.imgsz,
            batch_size=batch,
            augment=mode == "train",
            hyp=self.args,
            rect=self.args.rect or mode == "val",
            cache=self.args.cache or None,
            single_cls=self.args.single_cls or False,
            stride=int(self.model.stride.max()) if hasattr(self, "model") and self.model else 32,
            pad=0.0 if mode == "train" else 0.5,
            prefix=colorstr(f"{mode}: "),
            task=self.args.task,
            classes=self.args.classes,
            fraction=self.args.fraction if mode == "train" else 1.0,
        )

Configuring dataset.yaml for COCO JSON

The dataset.yaml uses the standard path, train, and val fields to locate the image directories. Two additional fields, train_json and val_json, point to the COCO annotation files that COCOTrainer reads. The nc and names fields define the number of classes and their names, in the same order as the JSON categories sorted by id.

path: /path/to/images # root directory with train/ and val/ subfolders
train: train
val: val

# COCO JSON annotation files
train_json: /path/to/annotations/instances_train.json
val_json: /path/to/annotations/instances_val.json

nc: 80
names:
    0: person
    1: bicycle
    # ... remaining class names

Expected directory structure:

my_dataset/
  images/
    train/
      img_001.jpg
      ...
    val/
      img_100.jpg
      ...
  annotations/
    instances_train.json
    instances_val.json
  dataset.yaml
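Because cache_labels() silently skips any image whose file does not exist, a mismatch between file_name entries and the image directory only shows up as a smaller dataset. A quick pre-flight check can catch that early. The sketch below is an assumption-level helper (find_missing_images is a hypothetical name) that uses only the standard library.

```python
import json
from pathlib import Path


def find_missing_images(json_file, img_dir):
    """Return the file_name entries from a COCO JSON that do not exist under img_dir."""
    with open(json_file) as f:
        coco = json.load(f)
    img_dir = Path(img_dir)
    return [img["file_name"] for img in coco["images"] if not (img_dir / img["file_name"]).exists()]


# Example call with the layout above (paths are placeholders):
# missing = find_missing_images("my_dataset/annotations/instances_train.json", "my_dataset/images/train")
# print(f"{len(missing)} images listed in the JSON were not found")
```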

Running Training on COCO JSON

With the dataset class, the trainer class, and the YAML configuration in place, training runs through the standard model.train() call. The only difference from a normal training run is the trainer=COCOTrainer argument, which tells Ultralytics to use the custom dataset loader instead of the default one.

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
model.train(data="dataset.yaml", epochs=100, imgsz=640, trainer=COCOTrainer)

The full training pipeline runs as expected, including validation, checkpoint saving, and metric logging.

Full Implementation

For convenience, the complete implementation is provided below as a single copy-paste script. It contains the custom dataset, the custom trainer, and the training call. Save it alongside your dataset.yaml and run it directly.

import json
from collections import defaultdict
from pathlib import Path

import numpy as np

from ultralytics import YOLO
from ultralytics.data.dataset import DATASET_CACHE_VERSION, YOLODataset
from ultralytics.data.utils import get_hash, load_dataset_cache_file, save_dataset_cache_file
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import TQDM, colorstr

class COCODataset(YOLODataset):
    """Dataset that reads COCO JSON annotations directly without conversion to .txt files."""

    def __init__(self, *args, json_file="", **kwargs):
        self.json_file = json_file
        super().__init__(*args, data={"channels": 3}, **kwargs)

    def get_img_files(self, img_path):
        return []

    def cache_labels(self, path=Path("./labels.cache")):
        x = {"labels": []}
        with open(self.json_file) as f:
            coco = json.load(f)

        categories = {cat["id"]: i for i, cat in enumerate(sorted(coco["categories"], key=lambda c: c["id"]))}

        img_to_anns = defaultdict(list)
        for ann in coco["annotations"]:
            img_to_anns[ann["image_id"]].append(ann)

        for img_info in TQDM(coco["images"], desc="reading annotations"):
            h, w = img_info["height"], img_info["width"]
            im_file = Path(self.img_path) / img_info["file_name"]
            if not im_file.exists():
                continue

            self.im_files.append(str(im_file))
            bboxes = []
            for ann in img_to_anns.get(img_info["id"], []):
                if ann.get("iscrowd", False):
                    continue
                box = np.array(ann["bbox"], dtype=np.float32)
                box[:2] += box[2:] / 2
                box[[0, 2]] /= w
                box[[1, 3]] /= h
                if box[2] <= 0 or box[3] <= 0:
                    continue
                cls = categories[ann["category_id"]]
                bboxes.append([cls, *box.tolist()])

            lb = np.array(bboxes, dtype=np.float32) if bboxes else np.zeros((0, 5), dtype=np.float32)
            x["labels"].append(
                {
                    "im_file": str(im_file),
                    "shape": (h, w),
                    "cls": lb[:, 0:1],
                    "bboxes": lb[:, 1:],
                    "segments": [],
                    "normalized": True,
                    "bbox_format": "xywh",
                }
            )
        x["hash"] = get_hash([self.json_file, str(self.img_path)])
        save_dataset_cache_file(self.prefix, path, x, DATASET_CACHE_VERSION)
        return x

    def get_labels(self):
        cache_path = Path(self.json_file).with_suffix(".cache")
        try:
            cache = load_dataset_cache_file(cache_path)
            assert cache["version"] == DATASET_CACHE_VERSION
            assert cache["hash"] == get_hash([self.json_file, str(self.img_path)])
            self.im_files = [lb["im_file"] for lb in cache["labels"]]
        except (FileNotFoundError, AssertionError, AttributeError, KeyError, ModuleNotFoundError):
            cache = self.cache_labels(cache_path)
        cache.pop("hash", None)
        cache.pop("version", None)
        return cache["labels"]

class COCOTrainer(DetectionTrainer):
    """Trainer that uses COCODataset for direct COCO JSON training."""

    def build_dataset(self, img_path, mode="train", batch=None):
        json_file = self.data["train_json"] if mode == "train" else self.data.get("val_json", self.data["train_json"])
        return COCODataset(
            img_path=img_path,
            json_file=json_file,
            imgsz=self.args.imgsz,
            batch_size=batch,
            augment=mode == "train",
            hyp=self.args,
            rect=self.args.rect or mode == "val",
            cache=self.args.cache or None,
            single_cls=self.args.single_cls or False,
            stride=int(self.model.stride.max()) if hasattr(self, "model") and self.model else 32,
            pad=0.0 if mode == "train" else 0.5,
            prefix=colorstr(f"{mode}: "),
            task=self.args.task,
            classes=self.args.classes,
            fraction=self.args.fraction if mode == "train" else 1.0,
        )

model = YOLO("yolo26n.pt")
model.train(data="dataset.yaml", epochs=100, imgsz=640, trainer=COCOTrainer)

For hyperparameter recommendations, see the Tips for Model Training guide.

FAQ

What is the difference between this approach and convert_coco()?

convert_coco() writes .txt label files to disk as a one-time conversion. This approach parses the JSON on the first training run, converts the annotations in memory, and reuses a cached copy on later runs. Use convert_coco() when persistent YOLO-format labels are preferred; use this approach to keep the COCO JSON as the single source of truth without generating per-image label files (only a single .cache file is written next to the JSON).

Can YOLO train on COCO JSON without custom code?

Not with the current Ultralytics pipeline, which expects YOLO .txt labels by default. This guide provides the minimal custom code required: one dataset class and one trainer class. Once they are defined, training needs only a standard model.train() call.

Does this support segmentation and pose estimation?

This guide covers object detection. To add instance segmentation support, include the segmentation polygon data from COCO annotations in the segments field of each label dictionary. For pose estimation, include keypoints. The GroundingDataset source code provides a reference implementation for handling segments.
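As a starting point for segmentation, the polygon coordinates can be normalized the same way as the boxes. The following is a rough sketch only: coco_polygons_to_segments is a hypothetical helper, it assumes the standard COCO polygon layout (a list of flat [x1, y1, x2, y2, ...] lists), it skips RLE masks, and it returns plain Python lists. The exact structure the segments field expects, and how instances made of several polygons should be merged, should be checked against the GroundingDataset source before relying on it.

```python
def coco_polygons_to_segments(segmentation, img_w, img_h):
    """Normalize COCO polygon lists [x1, y1, x2, y2, ...] to lists of (x, y) pairs in [0, 1]."""
    if not isinstance(segmentation, list):  # RLE masks are stored as dicts; not handled in this sketch
        return []
    segments = []
    for poly in segmentation:
        if len(poly) < 6 or len(poly) % 2:  # need at least three complete (x, y) points
            continue
        xs, ys = poly[0::2], poly[1::2]
        segments.append([(x / img_w, y / img_h) for x, y in zip(xs, ys)])
    return segments


# One triangle in a 200x100 image
print(coco_polygons_to_segments([[0, 0, 100, 0, 100, 50]], img_w=200, img_h=100))
```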

Do augmentations work with this custom dataset?

Yes. COCODataset extends YOLODataset, so all built-in data augmentations (Mosaic, Mixup, Copy-Paste, and others) run without modification.

How are category IDs mapped to class indices?

Categories are sorted by id and mapped to consecutive indices starting at 0. This handles 1-based IDs (standard COCO), 0-based IDs, and non-contiguous IDs. The names dictionary in dataset.yaml should list the classes in the same order as the COCO categories sorted by id.
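A small example with non-contiguous IDs shows the resulting mapping and the names dictionary that dataset.yaml would need to match it. Both helper names (build_class_map, expected_names) are hypothetical and exist only for this illustration; the mapping expression itself is the one used in cache_labels().

```python
def build_class_map(categories):
    """Map COCO category IDs to zero-based class indices in ascending ID order."""
    ordered = sorted(categories, key=lambda c: c["id"])
    return {cat["id"]: i for i, cat in enumerate(ordered)}


def expected_names(categories):
    """Return the names dictionary that dataset.yaml should contain for these categories."""
    ordered = sorted(categories, key=lambda c: c["id"])
    return {i: cat["name"] for i, cat in enumerate(ordered)}


# Categories listed out of order with gaps in the IDs
cats = [{"id": 7, "name": "truck"}, {"id": 1, "name": "person"}, {"id": 3, "name": "car"}]
print(build_class_map(cats))  # {1: 0, 3: 1, 7: 2}
print(expected_names(cats))  # {0: 'person', 1: 'car', 2: 'truck'}
```

Comparing expected_names() against the names block of your dataset.yaml is a cheap way to catch swapped class labels before training.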

Is there a performance overhead compared to pre-converted labels?

The COCO JSON is parsed once, on the first training run. Parsed labels are saved to a .cache file, so subsequent runs load them immediately without re-parsing. Training speed is identical to standard YOLO training because the annotations are held in memory. The cache is rebuilt automatically when the JSON file changes.

Contributors

raimbekovm, glenn-jocher
