How to Train YOLO on COCO JSON Without Conversion

Why Train Directly on COCO JSON

Annotations in COCO JSON format can be used directly for Ultralytics YOLO training without converting to .txt files first. This is done by subclassing YOLODataset to parse COCO JSON on the fly and wiring it into the training pipeline through a custom trainer.

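This approach keeps the COCO JSON as the single source of truth, so there is no convert_coco() call, no directory restructuring, and no intermediate label files. YOLO26 and all other Ultralytics YOLO detection models are supported. Segmentation and pose models need additional label fields (see the FAQ).

Looking for a one-time conversion?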

See the COCO to YOLO Conversion guide for the standard convert_coco() workflow.

Architecture Overview

Two classes are needed:

COCODataset: reads the COCO JSON during training and converts bounding boxes to YOLO format in memory.
COCOTrainer: overrides build_dataset() to use COCODataset instead of the default YOLODataset.

The implementation follows the same pattern as the built-in GroundingDataset, which also reads JSON annotations directly. Three methods are overridden: get_img_files(), cache_labels(), and get_labels().

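Building the COCO JSON Dataset Class

The COCODataset class inherits from YOLODataset and overrides the label-loading logic. Instead of reading .txt files from a labels directory, it opens the COCO JSON file, iterates over the annotations grouped by image, and converts each bounding box from the COCO pixel format [x_min, y_min, width, height] to the YOLO normalized center format [x_center, y_center, width, height]. Crowd annotations (iscrowd: 1) and boxes with zero or negative width or height are skipped automatically.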
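As a worked illustration of the box conversion described above, the following minimal sketch applies the same arithmetic to a single box. The helper name coco_to_yolo_bbox is hypothetical and not part of Ultralytics; it only mirrors, in plain Python, the steps the dataset class performs with NumPy.

```python
def coco_to_yolo_bbox(bbox, img_w, img_h):
    """Convert a COCO [x_min, y_min, width, height] pixel box to YOLO normalized [cx, cy, w, h]."""
    x_min, y_min, box_w, box_h = bbox
    cx = (x_min + box_w / 2) / img_w  # shift to the box center, then normalize by image width
    cy = (y_min + box_h / 2) / img_h  # shift to the box center, then normalize by image height
    return [cx, cy, box_w / img_w, box_h / img_h]


# A 200x100 px box with its top-left corner at (100, 50) in a 640x480 image
box = coco_to_yolo_bbox([100, 50, 200, 100], img_w=640, img_h=480)
print(box)  # cx and w are both 0.3125; cy and h are both 100 / 480
```

All four output values fall in the range [0, 1] as long as the source box lies inside the image.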

The get_img_files() method returns an empty list because image paths are resolved from the JSON file_name field inside cache_labels(). Category IDs are sorted and remapped to zero-indexed class indices, so both 1-based (standard COCO) and non-contiguous ID schemes work correctly.
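The ID remapping can be seen in isolation with a small sketch. The function name build_category_map and the three sample categories are hypothetical; the dictionary comprehension is the same one used inside cache_labels().

```python
def build_category_map(categories):
    """Map COCO category IDs to zero-indexed class indices, ordered by ascending ID."""
    ordered = sorted(categories, key=lambda c: c["id"])
    return {cat["id"]: idx for idx, cat in enumerate(ordered)}


# Hypothetical categories with non-contiguous, unordered IDs
categories = [
    {"id": 7, "name": "truck"},
    {"id": 1, "name": "person"},
    {"id": 3, "name": "car"},
]
category_map = build_category_map(categories)
print(category_map)  # {1: 0, 3: 1, 7: 2}
```

Because the mapping depends only on the sorted IDs, the order in which categories appear in the JSON file does not affect the resulting class indices.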

import json
from collections import defaultdict
from pathlib import Path

import numpy as np

from ultralytics.data.dataset import DATASET_CACHE_VERSION, YOLODataset
from ultralytics.data.utils import get_hash, load_dataset_cache_file, save_dataset_cache_file
from ultralytics.utils import TQDM

class COCODataset(YOLODataset):
    """Dataset that reads COCO JSON annotations directly without conversion to .txt files."""

    def __init__(self, *args, json_file="", **kwargs):
        self.json_file = json_file
        super().__init__(*args, data={"channels": 3}, **kwargs)

    def get_img_files(self, img_path):
        """Image paths are resolved from the JSON file, not from scanning a directory."""
        return []

    def cache_labels(self, path=Path("./labels.cache")):
        """Parse COCO JSON and convert annotations to YOLO format. Results are saved to a .cache file."""
        x = {"labels": []}
        with open(self.json_file) as f:
            coco = json.load(f)

        # Sort categories by ID and map to 0-indexed classes
        categories = {cat["id"]: i for i, cat in enumerate(sorted(coco["categories"], key=lambda c: c["id"]))}

        img_to_anns = defaultdict(list)
        for ann in coco["annotations"]:
            img_to_anns[ann["image_id"]].append(ann)

        for img_info in TQDM(coco["images"], desc="reading annotations"):
            h, w = img_info["height"], img_info["width"]
            im_file = Path(self.img_path) / img_info["file_name"]
            if not im_file.exists():
                continue

            self.im_files.append(str(im_file))
            bboxes = []
            for ann in img_to_anns.get(img_info["id"], []):
                if ann.get("iscrowd", False):
                    continue
                # COCO: [x, y, w, h] top-left in pixels -> YOLO: [cx, cy, w, h] center normalized
                box = np.array(ann["bbox"], dtype=np.float32)
                box[:2] += box[2:] / 2  # top-left to center
                box[[0, 2]] /= w  # normalize x
                box[[1, 3]] /= h  # normalize y
                if box[2] <= 0 or box[3] <= 0:
                    continue
                cls = categories[ann["category_id"]]
                bboxes.append([cls, *box.tolist()])

            lb = np.array(bboxes, dtype=np.float32) if bboxes else np.zeros((0, 5), dtype=np.float32)
            x["labels"].append(
                {
                    "im_file": str(im_file),
                    "shape": (h, w),
                    "cls": lb[:, 0:1],
                    "bboxes": lb[:, 1:],
                    "segments": [],
                    "normalized": True,
                    "bbox_format": "xywh",
                }
            )
        x["hash"] = get_hash([self.json_file, str(self.img_path)])
        save_dataset_cache_file(self.prefix, path, x, DATASET_CACHE_VERSION)
        return x

    def get_labels(self):
        """Load labels from .cache file if available, otherwise parse JSON and create the cache."""
        cache_path = Path(self.json_file).with_suffix(".cache")
        try:
            cache = load_dataset_cache_file(cache_path)
            assert cache["version"] == DATASET_CACHE_VERSION
            assert cache["hash"] == get_hash([self.json_file, str(self.img_path)])
            self.im_files = [lb["im_file"] for lb in cache["labels"]]
        except (FileNotFoundError, AssertionError, AttributeError, KeyError, ModuleNotFoundError):
            cache = self.cache_labels(cache_path)
        cache.pop("hash", None)
        cache.pop("version", None)
        return cache["labels"]

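Parsed labels are saved to a .cache file next to the JSON (for example, instances_train.cache). Subsequent training runs load the cache directly and skip JSON parsing. If the JSON file changes, the hash check fails and the cache is regenerated automatically.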
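If a stale cache is ever suspected, deleting the .cache file forces the JSON to be parsed again on the next run. The sketch below derives the cache path the same way get_labels() does, with Path.with_suffix(".cache"); the helper name clear_label_cache is hypothetical.

```python
from pathlib import Path


def clear_label_cache(json_file):
    """Delete the .cache file that sits next to a COCO JSON file, if it exists.

    Returns the cache path so the caller can log or inspect it.
    """
    cache_path = Path(json_file).with_suffix(".cache")
    cache_path.unlink(missing_ok=True)  # no error if the cache was never created
    return cache_path


# Usage (hypothetical path):
# clear_label_cache("/path/to/annotations/instances_train.json")
```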

Wiring the Dataset into the Training Pipeline

The only change needed in the trainer is overriding build_dataset(). The default DetectionTrainer builds a YOLODataset that scans for .txt label files. By replacing it with COCODataset, the trainer reads from the COCO JSON instead.

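The JSON file path comes from the custom train_json / val_json fields in the data configuration (see the dataset.yaml section below). During training, mode="train" resolves to train_json; during validation, mode="val" resolves to val_json. If val_json is not set, it falls back to train_json.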

from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import colorstr

class COCOTrainer(DetectionTrainer):
    """Trainer that uses COCODataset for direct COCO JSON training."""

    def build_dataset(self, img_path, mode="train", batch=None):
        json_file = self.data["train_json"] if mode == "train" else self.data.get("val_json", self.data["train_json"])
        return COCODataset(
            img_path=img_path,
            json_file=json_file,
            imgsz=self.args.imgsz,
            batch_size=batch,
            augment=mode == "train",
            hyp=self.args,
            rect=self.args.rect or mode == "val",
            cache=self.args.cache or None,
            single_cls=self.args.single_cls or False,
            stride=int(self.model.stride.max()) if hasattr(self, "model") and self.model else 32,
            pad=0.0 if mode == "train" else 0.5,
            prefix=colorstr(f"{mode}: "),
            task=self.args.task,
            classes=self.args.classes,
            fraction=self.args.fraction if mode == "train" else 1.0,
        )

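Configuring dataset.yaml for COCO JSON

The dataset.yaml uses the standard path, train, and val fields to locate the image directories. Two additional fields, train_json and val_json, point to the COCO annotation files that COCOTrainer reads. The nc and names fields define the number of classes and their names, in the same order as the categories in the JSON sorted by id.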

path: /path/to/images # root directory with train/ and val/ subfolders
train: train
val: val

# COCO JSON annotation files
train_json: /path/to/annotations/instances_train.json
val_json: /path/to/annotations/instances_val.json

nc: 80
names:
    0: person
    1: bicycle
    # ... remaining class names
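Since the names entries must follow the sorted category order, they can be generated from the JSON rather than typed by hand. The following is a minimal sketch; names_from_coco and format_names_yaml are hypothetical helpers, and the output is plain text meant to be pasted into dataset.yaml. Class names containing YAML special characters (such as a colon) would need quoting, which this sketch does not handle.

```python
import json


def names_from_coco(json_file):
    """Return {class_index: name}, ordered by ascending COCO category ID."""
    with open(json_file) as f:
        categories = json.load(f)["categories"]
    ordered = sorted(categories, key=lambda c: c["id"])
    return {idx: cat["name"] for idx, cat in enumerate(ordered)}


def format_names_yaml(names):
    """Format the nc and names fields as text in the layout used by dataset.yaml."""
    lines = [f"nc: {len(names)}", "names:"]
    lines += [f"    {idx}: {name}" for idx, name in names.items()]
    return "\n".join(lines)


# Usage (hypothetical path):
# print(format_names_yaml(names_from_coco("/path/to/annotations/instances_train.json")))
```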

Expected directory structure:

my_dataset/
  images/
    train/
      img_001.jpg
      ...
    val/
      img_100.jpg
      ...
  annotations/
    instances_train.json
    instances_val.json
  dataset.yaml
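The dataset class skips any image whose file is not found on disk, so a mismatch between the JSON and the image folder reduces the training set without raising an error. A quick check before training can surface this. The sketch below is an assumption-laden helper (find_missing_images is a hypothetical name) that resolves file_name against the image directory the same way cache_labels() does.

```python
import json
from pathlib import Path


def find_missing_images(json_file, img_dir):
    """Return the file_name entries in a COCO JSON that do not exist under img_dir."""
    with open(json_file) as f:
        coco = json.load(f)
    img_dir = Path(img_dir)
    return [img["file_name"] for img in coco["images"] if not (img_dir / img["file_name"]).exists()]


# Usage (hypothetical paths):
# missing = find_missing_images("my_dataset/annotations/instances_train.json", "my_dataset/images/train")
# print(f"{len(missing)} images listed in the JSON were not found on disk")
```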

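Running Training on COCO JSON

With the dataset class, the trainer class, and the YAML configuration in place, training runs through the standard model.train() call. The only difference from a regular training run is the trainer=COCOTrainer argument, which tells Ultralytics to use the custom dataset loader instead of the default one.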

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
model.train(data="dataset.yaml", epochs=100, imgsz=640, trainer=COCOTrainer)

The full training pipeline runs as expected, including validation, checkpoint saving, and metric logging.

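Full Implementation

For convenience, the full implementation is provided below as a single copy-paste script. It contains the custom dataset, the custom trainer, and the training call. Save it alongside dataset.yaml and run it directly.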

import json
from collections import defaultdict
from pathlib import Path

import numpy as np

from ultralytics import YOLO
from ultralytics.data.dataset import DATASET_CACHE_VERSION, YOLODataset
from ultralytics.data.utils import get_hash, load_dataset_cache_file, save_dataset_cache_file
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import TQDM, colorstr

class COCODataset(YOLODataset):
    """Dataset that reads COCO JSON annotations directly without conversion to .txt files."""

    def __init__(self, *args, json_file="", **kwargs):
        self.json_file = json_file
        super().__init__(*args, data={"channels": 3}, **kwargs)

    def get_img_files(self, img_path):
        return []

    def cache_labels(self, path=Path("./labels.cache")):
        x = {"labels": []}
        with open(self.json_file) as f:
            coco = json.load(f)

        categories = {cat["id"]: i for i, cat in enumerate(sorted(coco["categories"], key=lambda c: c["id"]))}

        img_to_anns = defaultdict(list)
        for ann in coco["annotations"]:
            img_to_anns[ann["image_id"]].append(ann)

        for img_info in TQDM(coco["images"], desc="reading annotations"):
            h, w = img_info["height"], img_info["width"]
            im_file = Path(self.img_path) / img_info["file_name"]
            if not im_file.exists():
                continue

            self.im_files.append(str(im_file))
            bboxes = []
            for ann in img_to_anns.get(img_info["id"], []):
                if ann.get("iscrowd", False):
                    continue
                box = np.array(ann["bbox"], dtype=np.float32)
                box[:2] += box[2:] / 2
                box[[0, 2]] /= w
                box[[1, 3]] /= h
                if box[2] <= 0 or box[3] <= 0:
                    continue
                cls = categories[ann["category_id"]]
                bboxes.append([cls, *box.tolist()])

            lb = np.array(bboxes, dtype=np.float32) if bboxes else np.zeros((0, 5), dtype=np.float32)
            x["labels"].append(
                {
                    "im_file": str(im_file),
                    "shape": (h, w),
                    "cls": lb[:, 0:1],
                    "bboxes": lb[:, 1:],
                    "segments": [],
                    "normalized": True,
                    "bbox_format": "xywh",
                }
            )
        x["hash"] = get_hash([self.json_file, str(self.img_path)])
        save_dataset_cache_file(self.prefix, path, x, DATASET_CACHE_VERSION)
        return x

    def get_labels(self):
        cache_path = Path(self.json_file).with_suffix(".cache")
        try:
            cache = load_dataset_cache_file(cache_path)
            assert cache["version"] == DATASET_CACHE_VERSION
            assert cache["hash"] == get_hash([self.json_file, str(self.img_path)])
            self.im_files = [lb["im_file"] for lb in cache["labels"]]
        except (FileNotFoundError, AssertionError, AttributeError, KeyError, ModuleNotFoundError):
            cache = self.cache_labels(cache_path)
        cache.pop("hash", None)
        cache.pop("version", None)
        return cache["labels"]

class COCOTrainer(DetectionTrainer):
    """Trainer that uses COCODataset for direct COCO JSON training."""

    def build_dataset(self, img_path, mode="train", batch=None):
        json_file = self.data["train_json"] if mode == "train" else self.data.get("val_json", self.data["train_json"])
        return COCODataset(
            img_path=img_path,
            json_file=json_file,
            imgsz=self.args.imgsz,
            batch_size=batch,
            augment=mode == "train",
            hyp=self.args,
            rect=self.args.rect or mode == "val",
            cache=self.args.cache or None,
            single_cls=self.args.single_cls or False,
            stride=int(self.model.stride.max()) if hasattr(self, "model") and self.model else 32,
            pad=0.0 if mode == "train" else 0.5,
            prefix=colorstr(f"{mode}: "),
            task=self.args.task,
            classes=self.args.classes,
            fraction=self.args.fraction if mode == "train" else 1.0,
        )

model = YOLO("yolo26n.pt")
model.train(data="dataset.yaml", epochs=100, imgsz=640, trainer=COCOTrainer)

For hyperparameter recommendations, see the Tips for Model Training guide.

FAQ

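How does this differ from convert_coco()?

convert_coco() is a one-time conversion that writes .txt label files to disk. This approach parses the JSON at training time, converts the annotations in memory, and caches the result so that later runs skip the parsing step. Use convert_coco() when permanent YOLO-format labels are preferred, and use this approach to keep the COCO JSON as the single source of truth without generating per-image label files.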

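Can YOLO be trained on COCO JSON without custom code?

Not with the current Ultralytics pipeline, which expects YOLO .txt labels by default. This guide provides the minimal custom code required: one dataset class and one trainer class. Once they are defined, training needs only the standard model.train() call.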

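Does this support segmentation and pose estimation?

This guide covers object detection. To add instance segmentation support, include the segmentation polygon data from the COCO annotations in the segments field of each label dictionary. For pose estimation, include the keypoints. The GroundingDataset source code provides a reference implementation for handling segments.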
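As a starting point for the segmentation case, the sketch below normalizes COCO polygon coordinates by image size. It is not taken from the Ultralytics source: the helper name coco_polygons_to_segments is hypothetical, and it assumes each instance is stored as a list of normalized (x, y) points. The exact array layout expected in the segments field should be checked against GroundingDataset before use.

```python
def coco_polygons_to_segments(segmentation, img_w, img_h):
    """Convert COCO polygon lists [x1, y1, x2, y2, ...] to normalized [(x, y), ...] point lists.

    RLE masks (stored as dicts) are not polygons and are skipped here.
    """
    if not isinstance(segmentation, list):
        return []  # RLE-encoded mask, commonly used for crowd annotations
    segments = []
    for poly in segmentation:
        if len(poly) < 6:  # fewer than 3 points cannot form a polygon
            continue
        # Pair up consecutive values as (x, y) and scale them to the range [0, 1]
        points = [(poly[i] / img_w, poly[i + 1] / img_h) for i in range(0, len(poly) - 1, 2)]
        segments.append(points)
    return segments


# A triangle covering the upper-right half of a 640x480 image
print(coco_polygons_to_segments([[0, 0, 640, 0, 640, 480]], img_w=640, img_h=480))
# [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]]
```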

Does augmentation work with this custom dataset?

Yes. Because COCODataset extends YOLODataset, all built-in data augmentations (mosaic, mixup, copy-paste, and others) run unchanged.

How are category IDs mapped to class indices?

Categories are sorted by id and mapped to sequential indices starting from 0. This handles 1-based IDs (standard COCO), 0-based IDs, and non-contiguous IDs. The names dictionary in dataset.yaml should follow the same sorted order as the COCO categories array.

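Is there a performance penalty compared to pre-converted labels?

The COCO JSON is parsed only once, on the first training run. The parsed labels are saved to a .cache file, so subsequent runs load them immediately without re-parsing. Because the annotations are held in memory, training speed is the same as with standard YOLO training. If the JSON file changes, the cache is regenerated automatically.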

Contributors

raimbekovm, glenn-jocher

Created 2 months ago, updated 2 months ago