How to Train YOLO on COCO JSON Without Conversion

Why Train Directly on COCO JSON

Annotations in COCO JSON format can be used directly for Ultralytics YOLO training without converting to .txt files first. This is done by subclassing YOLODataset to parse COCO JSON on the fly and wiring it into the training pipeline through a custom trainer.

This approach keeps COCO JSON as the single source of truth: no convert_coco() call, no directory restructuring, and no intermediate label files. YOLO26 and all other Ultralytics YOLO detection models are supported. Segmentation and pose models require additional label fields (see the FAQ).

Looking for a one-time conversion?

See the COCO to YOLO Conversion guide for the standard convert_coco() workflow.

Architecture Overview

Two classes are needed:

COCODataset: reads the COCO JSON and converts bounding boxes to YOLO format in memory during training
COCOTrainer: overrides build_dataset() to use COCODataset instead of the default YOLODataset

The implementation follows the same pattern as the built-in GroundingDataset, which also reads JSON annotations directly. Three methods are overridden: get_img_files(), cache_labels(), and get_labels().

Building the COCO JSON Dataset Class

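The COCODataset class inherits from YOLODataset and overrides the label-loading logic. Instead of reading .txt files from a labels directory, it opens the COCO JSON file, iterates over the annotations grouped by image, and converts each bounding box from COCO pixel format [x_min, y_min, width, height] to YOLO normalized center format [x_center, y_center, width, height]. Crowd annotations (iscrowd: 1) and boxes with zero or negative width or height are skipped automatically.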
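The box conversion can be checked by hand with a small standalone function. This is a minimal sketch in plain Python; the helper name coco_to_yolo_bbox is hypothetical and is not part of Ultralytics, but the arithmetic mirrors the NumPy steps used in cache_labels().

```python
def coco_to_yolo_bbox(bbox, img_w, img_h):
    """Convert a COCO [x_min, y_min, width, height] pixel box to YOLO [cx, cy, w, h] normalized."""
    x_min, y_min, box_w, box_h = bbox
    cx = (x_min + box_w / 2) / img_w  # box center as a fraction of image width
    cy = (y_min + box_h / 2) / img_h  # box center as a fraction of image height
    return [cx, cy, box_w / img_w, box_h / img_h]


# A 200x100 px box with its top-left corner at (100, 50) in a 640x480 image
print(coco_to_yolo_bbox([100, 50, 200, 100], img_w=640, img_h=480))
```

Here the center lands at (200, 100) px, so the normalized center is 200/640 = 0.3125 and 100/480 ≈ 0.2083, and the normalized width and height take the same two values.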

The get_img_files() method returns an empty list because image paths are resolved from the JSON file_name field inside cache_labels(). Category IDs are sorted and remapped to zero-indexed class indices, so both 1-based (standard COCO) and non-contiguous ID schemes work correctly.

import json
from collections import defaultdict
from pathlib import Path

import numpy as np

from ultralytics.data.dataset import DATASET_CACHE_VERSION, YOLODataset
from ultralytics.data.utils import get_hash, load_dataset_cache_file, save_dataset_cache_file
from ultralytics.utils import TQDM

class COCODataset(YOLODataset):
    """Dataset that reads COCO JSON annotations directly without conversion to .txt files."""

    def __init__(self, *args, json_file="", **kwargs):
        self.json_file = json_file
        super().__init__(*args, data={"channels": 3}, **kwargs)

    def get_img_files(self, img_path):
        """Image paths are resolved from the JSON file, not from scanning a directory."""
        return []

    def cache_labels(self, path=Path("./labels.cache")):
        """Parse COCO JSON and convert annotations to YOLO format. Results are saved to a .cache file."""
        x = {"labels": []}
        with open(self.json_file) as f:
            coco = json.load(f)

        # Sort categories by ID and map to 0-indexed classes
        categories = {cat["id"]: i for i, cat in enumerate(sorted(coco["categories"], key=lambda c: c["id"]))}

        img_to_anns = defaultdict(list)
        for ann in coco["annotations"]:
            img_to_anns[ann["image_id"]].append(ann)

        for img_info in TQDM(coco["images"], desc="reading annotations"):
            h, w = img_info["height"], img_info["width"]
            im_file = Path(self.img_path) / img_info["file_name"]
            if not im_file.exists():
                continue

            self.im_files.append(str(im_file))
            bboxes = []
            for ann in img_to_anns.get(img_info["id"], []):
                if ann.get("iscrowd", False):
                    continue
                # COCO: [x, y, w, h] top-left in pixels -> YOLO: [cx, cy, w, h] center normalized
                box = np.array(ann["bbox"], dtype=np.float32)
                box[:2] += box[2:] / 2  # top-left to center
                box[[0, 2]] /= w  # normalize x
                box[[1, 3]] /= h  # normalize y
                if box[2] <= 0 or box[3] <= 0:
                    continue
                cls = categories[ann["category_id"]]
                bboxes.append([cls, *box.tolist()])

            lb = np.array(bboxes, dtype=np.float32) if bboxes else np.zeros((0, 5), dtype=np.float32)
            x["labels"].append(
                {
                    "im_file": str(im_file),
                    "shape": (h, w),
                    "cls": lb[:, 0:1],
                    "bboxes": lb[:, 1:],
                    "segments": [],
                    "normalized": True,
                    "bbox_format": "xywh",
                }
            )
        x["hash"] = get_hash([self.json_file, str(self.img_path)])
        save_dataset_cache_file(self.prefix, path, x, DATASET_CACHE_VERSION)
        return x

    def get_labels(self):
        """Load labels from .cache file if available, otherwise parse JSON and create the cache."""
        cache_path = Path(self.json_file).with_suffix(".cache")
        try:
            cache = load_dataset_cache_file(cache_path)
            assert cache["version"] == DATASET_CACHE_VERSION
            assert cache["hash"] == get_hash([self.json_file, str(self.img_path)])
            self.im_files = [lb["im_file"] for lb in cache["labels"]]
        except (FileNotFoundError, AssertionError, AttributeError, KeyError, ModuleNotFoundError):
            cache = self.cache_labels(cache_path)
        cache.pop("hash", None)
        cache.pop("version", None)
        return cache["labels"]

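The parsed labels are saved as a .cache file next to the JSON (for example, instances_train.cache). Subsequent training runs load the cache directly and skip JSON parsing. If the JSON file changes, the hash check fails and the cache is rebuilt automatically.

Connecting the Dataset to the Training Pipeline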
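To see where the cache lands, or to force a fresh parse by hand, something like the following may help. It is a sketch only: cache_path_for and clear_cache are hypothetical helper names, and the path rule is taken from the with_suffix(".cache") call in get_labels().

```python
from pathlib import Path


def cache_path_for(json_file):
    """Return the cache location used in this guide: the JSON path with a .cache suffix."""
    return Path(json_file).with_suffix(".cache")


def clear_cache(json_file):
    """Delete the cache file if present so the next run re-parses the JSON. Returns True if removed."""
    cache = cache_path_for(json_file)
    if cache.exists():
        cache.unlink()
        return True
    return False


print(cache_path_for("/path/to/annotations/instances_train.json").name)  # instances_train.cache
```

Deleting the cache manually is a reasonable fallback if the labels look stale after editing the JSON.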

The only change needed in the trainer is overriding build_dataset(). The default DetectionTrainer builds a YOLODataset that scans for .txt label files. By replacing it with COCODataset, the trainer reads from the COCO JSON instead.

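The JSON file path comes from the custom train_json / val_json fields in the data configuration (see the dataset.yaml section below). During training, mode="train" resolves to train_json; during validation, mode="val" resolves to val_json. If val_json is not set, train_json is used as a fallback.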

from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import colorstr

class COCOTrainer(DetectionTrainer):
    """Trainer that uses COCODataset for direct COCO JSON training."""

    def build_dataset(self, img_path, mode="train", batch=None):
        json_file = self.data["train_json"] if mode == "train" else self.data.get("val_json", self.data["train_json"])
        return COCODataset(
            img_path=img_path,
            json_file=json_file,
            imgsz=self.args.imgsz,
            batch_size=batch,
            augment=mode == "train",
            hyp=self.args,
            rect=self.args.rect or mode == "val",
            cache=self.args.cache or None,
            single_cls=self.args.single_cls or False,
            stride=int(self.model.stride.max()) if hasattr(self, "model") and self.model else 32,
            pad=0.0 if mode == "train" else 0.5,
            prefix=colorstr(f"{mode}: "),
            task=self.args.task,
            classes=self.args.classes,
            fraction=self.args.fraction if mode == "train" else 1.0,
        )
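The split-to-file selection in build_dataset() can be exercised on its own with a plain dictionary standing in for self.data. The function name select_json below is hypothetical; the expression inside it is the same one used in the trainer.

```python
def select_json(data, mode):
    """Pick the annotation file for a split, mirroring the expression in build_dataset()."""
    if mode == "train":
        return data["train_json"]
    # Any non-train mode uses val_json, falling back to train_json when it is absent
    return data.get("val_json", data["train_json"])


both = {"train_json": "instances_train.json", "val_json": "instances_val.json"}
train_only = {"train_json": "instances_train.json"}

print(select_json(both, "train"))      # instances_train.json
print(select_json(both, "val"))        # instances_val.json
print(select_json(train_only, "val"))  # instances_train.json
```

Note that train_json is always required: the fallback argument is evaluated eagerly, so a config with only val_json raises a KeyError even in validation mode.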

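Configuring dataset.yaml for COCO JSON

The dataset.yaml uses the standard path, train, and val fields to locate the image directories. Two additional fields, train_json and val_json, point to the COCO annotation files that COCOTrainer reads. The nc and names fields define the number of classes and their names; the names must be listed in the same order as the JSON categories sorted by id.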

path: /path/to/images # root directory with train/ and val/ subfolders
train: train
val: val

# COCO JSON annotation files
train_json: /path/to/annotations/instances_train.json
val_json: /path/to/annotations/instances_val.json

nc: 80
names:
    0: person
    1: bicycle
    # ... remaining class names
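Because the names in dataset.yaml have to line up with the id-sorted categories in the JSON, a quick consistency check before training can catch silent class mix-ups. The sketch below assumes the YAML names have already been loaded into a dict keyed by class index; check_names is a hypothetical helper, not an Ultralytics function.

```python
def check_names(categories, names):
    """Compare dataset.yaml names against COCO categories sorted by id; return a list of mismatches."""
    ordered = sorted(categories, key=lambda c: c["id"])
    problems = []
    if len(ordered) != len(names):
        problems.append(f"nc mismatch: {len(ordered)} categories in JSON, {len(names)} names in YAML")
    for idx, cat in enumerate(ordered):
        if names.get(idx) != cat["name"]:
            problems.append(f"class {idx}: JSON has {cat['name']!r}, YAML has {names.get(idx)!r}")
    return problems


# Categories may appear in any order in the JSON; only the id order matters
categories = [{"id": 2, "name": "bicycle"}, {"id": 1, "name": "person"}]
print(check_names(categories, {0: "person", 1: "bicycle"}))  # []
print(check_names(categories, {0: "bicycle", 1: "person"}))  # two mismatch messages
```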

Expected directory structure:

my_dataset/
  images/
    train/
      img_001.jpg
      ...
    val/
      img_100.jpg
      ...
  annotations/
    instances_train.json
    instances_val.json
  dataset.yaml
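The dataset class skips any image whose file_name does not resolve under the image directory, so a layout mistake shows up as missing training samples rather than an error. A small pre-flight check along these lines can surface that early; find_missing_images is a hypothetical helper, and the example path is a placeholder.

```python
from pathlib import Path


def find_missing_images(coco, img_dir):
    """Return file_name entries from a parsed COCO dict that do not exist under img_dir."""
    img_dir = Path(img_dir)
    return [img["file_name"] for img in coco["images"] if not (img_dir / img["file_name"]).exists()]


# In practice, coco would come from json.load() on instances_train.json
coco = {"images": [{"id": 1, "file_name": "img_001.jpg"}]}
missing = find_missing_images(coco, "/path/to/images/train")
print(f"{len(missing)} of {len(coco['images'])} images not found")
```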

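Running Training on COCO JSON

With the dataset class, trainer class, and YAML configuration in place, training runs through the standard model.train() call. The only difference from a regular training run is the trainer=COCOTrainer argument, which tells Ultralytics to use the custom dataset loader instead of the default one.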

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
model.train(data="dataset.yaml", epochs=100, imgsz=640, trainer=COCOTrainer)

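The full training pipeline runs as expected, including validation, checkpoint saving, and metric logging.

Complete Implementation

For convenience, the complete implementation is provided below as a single copy-and-paste script containing the custom dataset, the custom trainer, and the training call. Save it alongside your dataset.yaml and run it directly.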

import json
from collections import defaultdict
from pathlib import Path

import numpy as np

from ultralytics import YOLO
from ultralytics.data.dataset import DATASET_CACHE_VERSION, YOLODataset
from ultralytics.data.utils import get_hash, load_dataset_cache_file, save_dataset_cache_file
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import TQDM, colorstr

class COCODataset(YOLODataset):
    """Dataset that reads COCO JSON annotations directly without conversion to .txt files."""

    def __init__(self, *args, json_file="", **kwargs):
        self.json_file = json_file
        super().__init__(*args, data={"channels": 3}, **kwargs)

    def get_img_files(self, img_path):
        return []

    def cache_labels(self, path=Path("./labels.cache")):
        x = {"labels": []}
        with open(self.json_file) as f:
            coco = json.load(f)

        categories = {cat["id"]: i for i, cat in enumerate(sorted(coco["categories"], key=lambda c: c["id"]))}

        img_to_anns = defaultdict(list)
        for ann in coco["annotations"]:
            img_to_anns[ann["image_id"]].append(ann)

        for img_info in TQDM(coco["images"], desc="reading annotations"):
            h, w = img_info["height"], img_info["width"]
            im_file = Path(self.img_path) / img_info["file_name"]
            if not im_file.exists():
                continue

            self.im_files.append(str(im_file))
            bboxes = []
            for ann in img_to_anns.get(img_info["id"], []):
                if ann.get("iscrowd", False):
                    continue
                box = np.array(ann["bbox"], dtype=np.float32)
                box[:2] += box[2:] / 2
                box[[0, 2]] /= w
                box[[1, 3]] /= h
                if box[2] <= 0 or box[3] <= 0:
                    continue
                cls = categories[ann["category_id"]]
                bboxes.append([cls, *box.tolist()])

            lb = np.array(bboxes, dtype=np.float32) if bboxes else np.zeros((0, 5), dtype=np.float32)
            x["labels"].append(
                {
                    "im_file": str(im_file),
                    "shape": (h, w),
                    "cls": lb[:, 0:1],
                    "bboxes": lb[:, 1:],
                    "segments": [],
                    "normalized": True,
                    "bbox_format": "xywh",
                }
            )
        x["hash"] = get_hash([self.json_file, str(self.img_path)])
        save_dataset_cache_file(self.prefix, path, x, DATASET_CACHE_VERSION)
        return x

    def get_labels(self):
        cache_path = Path(self.json_file).with_suffix(".cache")
        try:
            cache = load_dataset_cache_file(cache_path)
            assert cache["version"] == DATASET_CACHE_VERSION
            assert cache["hash"] == get_hash([self.json_file, str(self.img_path)])
            self.im_files = [lb["im_file"] for lb in cache["labels"]]
        except (FileNotFoundError, AssertionError, AttributeError, KeyError, ModuleNotFoundError):
            cache = self.cache_labels(cache_path)
        cache.pop("hash", None)
        cache.pop("version", None)
        return cache["labels"]

class COCOTrainer(DetectionTrainer):
    """Trainer that uses COCODataset for direct COCO JSON training."""

    def build_dataset(self, img_path, mode="train", batch=None):
        json_file = self.data["train_json"] if mode == "train" else self.data.get("val_json", self.data["train_json"])
        return COCODataset(
            img_path=img_path,
            json_file=json_file,
            imgsz=self.args.imgsz,
            batch_size=batch,
            augment=mode == "train",
            hyp=self.args,
            rect=self.args.rect or mode == "val",
            cache=self.args.cache or None,
            single_cls=self.args.single_cls or False,
            stride=int(self.model.stride.max()) if hasattr(self, "model") and self.model else 32,
            pad=0.0 if mode == "train" else 0.5,
            prefix=colorstr(f"{mode}: "),
            task=self.args.task,
            classes=self.args.classes,
            fraction=self.args.fraction if mode == "train" else 1.0,
        )

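# Guard the entry point so dataloader worker processes can import this file safely
if __name__ == "__main__":
    model = YOLO("yolo26n.pt")
    model.train(data="dataset.yaml", epochs=100, imgsz=640, trainer=COCOTrainer)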

For hyperparameter recommendations, see the Tips for Model Training guide.

FAQ

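How does this differ from convert_coco()?

convert_coco() writes .txt label files to disk as a one-time conversion. This approach parses the JSON at the start of the training run and converts the annotations in memory. Use convert_coco() if you want to keep YOLO-format labels permanently; use this approach if you want COCO JSON to remain the single source of truth without generating per-image label files.

Can I train YOLO on COCO JSON without custom code?

Not with the current Ultralytics pipeline, which expects YOLO .txt labels by default. This guide provides the minimal custom code required: one dataset class and one trainer class. Once they are defined, training needs only a standard model.train() call.

Does this support segmentation and pose estimation?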

This guide covers object detection. To add instance segmentation support, include the segmentation polygon data from COCO annotations in the segments field of each label dictionary. For pose estimation, include keypoints. The GroundingDataset source code provides a reference implementation for handling segments.
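As a starting point for segmentation, COCO polygons can be normalized in the same way as boxes. The sketch below is an assumption-heavy illustration: coco_polygons_to_segments is a hypothetical helper, it handles only polygon-style annotations (not RLE masks), and the exact array type and shape the segments field expects should be confirmed against the GroundingDataset source before use.

```python
def coco_polygons_to_segments(segmentation, img_w, img_h):
    """Convert COCO polygon lists [x1, y1, x2, y2, ...] in pixels to lists of normalized (x, y) pairs."""
    if not isinstance(segmentation, list):  # RLE masks are dicts; not handled in this sketch
        return []
    segments = []
    for poly in segmentation:
        if len(poly) < 6:  # fewer than 3 points is not a valid polygon
            continue
        points = [(poly[i] / img_w, poly[i + 1] / img_h) for i in range(0, len(poly) - 1, 2)]
        segments.append(points)
    return segments


# One triangle in a 640x480 image
print(coco_polygons_to_segments([[0, 0, 320, 0, 320, 240]], img_w=640, img_h=480))
# [[(0.0, 0.0), (0.5, 0.0), (0.5, 0.5)]]
```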

Does data augmentation work with this custom dataset?

Yes. COCODataset extends YOLODataset, so all built-in data augmentations (mosaic, mixup, copy-paste, and others) run without modification.

How are category IDs mapped to class indices?

Categories are sorted by id and mapped to sequential indices starting from 0. This handles 1-based IDs (standard COCO), 0-based IDs, and non-contiguous IDs. The names dictionary in dataset.yaml should list the class names in that same id-sorted order, which is not necessarily the order in which they appear in the COCO categories array.
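The remapping can be reproduced outside the dataset class to preview which index each category receives. build_category_map is a hypothetical name; the dictionary comprehension is the one used in cache_labels().

```python
def build_category_map(categories):
    """Map COCO category ids to 0-based class indices in ascending id order."""
    return {cat["id"]: i for i, cat in enumerate(sorted(categories, key=lambda c: c["id"]))}


# Non-contiguous ids listed out of order
cats = [{"id": 5, "name": "cat"}, {"id": 1, "name": "person"}, {"id": 90, "name": "dog"}]
print(build_category_map(cats))  # {1: 0, 5: 1, 90: 2}
```

For this example, dataset.yaml would need names 0: person, 1: cat, 2: dog.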

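Is there a performance overhead compared with pre-converted labels?

The COCO JSON is parsed only once, on the first training run. The parsed labels are saved to a .cache file, so subsequent runs load them immediately without re-parsing. Annotations are held in memory, so training speed is the same as standard YOLO training. If the JSON file changes, the cache is rebuilt automatically.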

Contributors

raimbekovm, glenn-jocher

Created 2 months ago, updated 2 months ago