PASCAL VOC Dataset

The PASCAL VOC (Visual Object Classes) dataset is a classic object detection benchmark covering 20 everyday object classes. The Ultralytics VOC.yaml configuration combines the VOC2007 and VOC2012 trainval splits into a 16,551-image training set and validates on the 4,952 publicly available VOC2007 test images. Everything (2.8 GB) is downloaded automatically on first use.

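The PASCAL VOC challenge ran from 2005 to 2012 and shaped how object detection models are evaluated. The benchmark spans image classification, detection, and segmentation tasks, and it popularized mean average precision (mAP) as the standard detection metric. The Ultralytics VOC.yaml configuration uses the detection annotations and converts the original XML bounding boxes to YOLO format during download.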
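
As a rough illustration of that conversion, the sketch below turns one VOC pixel box (xmin, ymin, xmax, ymax) into the normalized (x_center, y_center, width, height) form used by YOLO labels. The function name `voc_to_yolo` and the sample numbers are hypothetical. The converter shipped in VOC.yaml additionally subtracts 1 from the center coordinates before normalizing, which this simplified version omits.

```python
def voc_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert a pixel-space corner box to normalized center/size form."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


# Hypothetical 500x400 image with a box from (50, 100) to (250, 300).
box = voc_to_yolo(500, 400, 50, 100, 250, 300)
print(box)  # (0.3, 0.5, 0.4, 0.5)
```

Every value lands in the range 0 to 1, so the label stays valid when the image is resized.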

Key Features

20 everyday object classes: person, six animals (bird, cat, cow, dog, horse, sheep), seven vehicles (aeroplane, bicycle, boat, bus, car, motorbike, train), and six indoor objects (bottle, chair, diningtable, pottedplant, sofa, tvmonitor).
Two challenge generations combined: training merges VOC2007 trainval (5,011 images) with VOC2012 trainval (11,540 images).
Standardized evaluation: decades of published VOC baselines provide a convenient reference point for comparing detection models.
YOLO-ready: the download script fetches the archives and converts the annotations automatically, so no manual preparation is needed.
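
The class groups listed above can be checked against the class indices in VOC.yaml with a few lines of Python. This is only a sketch: the `groups` dictionary and the variable names are illustrative. It relies on the observation that the `names` mapping in VOC.yaml is in alphabetical order, so sorting the class names reproduces the same indices.

```python
# Class groups from the feature list (dictionary layout is illustrative).
groups = {
    "person": ["person"],
    "animals": ["bird", "cat", "cow", "dog", "horse", "sheep"],
    "vehicles": ["aeroplane", "bicycle", "boat", "bus", "car", "motorbike", "train"],
    "indoor": ["bottle", "chair", "diningtable", "pottedplant", "sofa", "tvmonitor"],
}

# Flatten and sort alphabetically; this matches the order of `names` in VOC.yaml.
class_names = sorted(name for members in groups.values() for name in members)
class_ids = {name: idx for idx, name in enumerate(class_names)}

print(len(class_names))     # 20
print(class_ids["person"])  # 14
```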

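Dataset Structure

The Ultralytics VOC.yaml configuration defines the following splits.

Split	Images	Source
Train	16,551	VOC2007 trainval (5,011) + VOC2012 trainval (11,540)
Validation	4,952	VOC2007 test, used for evaluation during training
Test	4,952	Same VOC2007 test images (the configuration does not define a separate holdout split)

The VOC2007 test annotations were released after that year's challenge ended, so this split can serve as a labeled validation set. The VOC2012 test annotations remain private, and results on them can only be scored on the official PASCAL evaluation server, so that split is not included in this configuration.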
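
The split sizes in the table above add up as follows. The snippet is plain arithmetic on the published counts; the dictionary keys are just labels chosen for this example.

```python
splits = {
    "VOC2007 trainval": 5011,
    "VOC2012 trainval": 11540,
    "VOC2007 test": 4952,
}

train_images = splits["VOC2007 trainval"] + splits["VOC2012 trainval"]
val_images = splits["VOC2007 test"]  # the test split reuses these same images
total_unique = train_images + val_images

print(train_images, val_images, total_unique)  # 16551 4952 21503
```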

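"Difficult" objects are excluded

The automatic conversion script skips objects flagged as difficult in the original VOC XML annotations, so per-class instance counts differ slightly from the official VOC statistics.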
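
A minimal sketch of that filter, using the same `difficult != 1` test as the converter in VOC.yaml. The XML below is a hypothetical, heavily trimmed annotation written for this example, and `kept_objects` is not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed VOC-style annotation.
SAMPLE_XML = """
<annotation>
  <size><width>500</width><height>375</height></size>
  <object><name>dog</name><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object><name>person</name><difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>370</ymax></bndbox>
  </object>
</annotation>
"""


def kept_objects(xml_text):
    """Return the class names of objects that are not flagged as difficult."""
    root = ET.fromstring(xml_text.strip())
    kept = []
    for obj in root.iter("object"):
        if int(obj.find("difficult").text) != 1:
            kept.append(obj.find("name").text)
    return kept


print(kept_objects(SAMPLE_XML))  # ['dog']
```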

Explore VOC on Ultralytics Platform to browse images with annotation overlays, inspect the class distribution and bounding box heatmaps in the Charts tab, or clone the dataset to train your own cloud model.

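Applications

PASCAL VOC was the leading benchmark in object detection research before the larger COCO dataset arrived. Detectors such as Faster R-CNN and SSD reported early results on it, and Ultralytics YOLO models can be trained on it out of the box. Today it remains popular for:

Benchmarking new detection architectures against baselines published over many years
Rapid experimentation and coursework: with 16,551 training images, training is much faster than on COCO
Transfer learning studies on a compact, well-understood set of everyday classes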

Dataset YAML

The VOC.yaml file defines the dataset configuration: the dataset paths, the 20 class names, and the automatic download and conversion script. The file is maintained in the Ultralytics repository (https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VOC.yaml).

ultralytics/cfg/datasets/VOC.yaml

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# PASCAL VOC dataset http://host.robots.ox.ac.uk/pascal/VOC by University of Oxford
# Documentation: https://docs.ultralytics.com/datasets/detect/voc
# Example usage: yolo train data=VOC.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── VOC ← downloads here (2.8 GB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: VOC
train: # train images (relative to 'path') 16551 images
  - images/train2012
  - images/train2007
  - images/val2012
  - images/val2007
val: # val images (relative to 'path') 4952 images
  - images/test2007
test: # test images (optional)
  - images/test2007

# Classes
names:
  0: aeroplane
  1: bicycle
  2: bird
  3: boat
  4: bottle
  5: bus
  6: car
  7: cat
  8: chair
  9: cow
  10: diningtable
  11: dog
  12: horse
  13: motorbike
  14: person
  15: pottedplant
  16: sheep
  17: sofa
  18: train
  19: tvmonitor

# Download script/URL (optional) ---------------------------------------------------------------------------------------
download: |
  import xml.etree.ElementTree as ET
  from pathlib import Path

  from ultralytics.utils.downloads import download
  from ultralytics.utils import ASSETS_URL, TQDM

  def convert_label(path, lb_path, year, image_id):
      """Converts XML annotations from VOC format to YOLO format by extracting bounding boxes and class IDs."""

      def convert_box(size, box):
          dw, dh = 1.0 / size[0], 1.0 / size[1]
          x, y, w, h = (box[0] + box[1]) / 2.0 - 1, (box[2] + box[3]) / 2.0 - 1, box[1] - box[0], box[3] - box[2]
          return x * dw, y * dh, w * dw, h * dh

      with open(path / f"VOC{year}/Annotations/{image_id}.xml") as in_file, open(lb_path, "w", encoding="utf-8") as out_file:
          tree = ET.parse(in_file)
          root = tree.getroot()
          size = root.find("size")
          w = int(size.find("width").text)
          h = int(size.find("height").text)

          names = list(yaml["names"].values())  # names list
          for obj in root.iter("object"):
              cls = obj.find("name").text
              if cls in names and int(obj.find("difficult").text) != 1:
                  xmlbox = obj.find("bndbox")
                  bb = convert_box((w, h), [float(xmlbox.find(x).text) for x in ("xmin", "xmax", "ymin", "ymax")])
                  cls_id = names.index(cls)  # class id
                  out_file.write(" ".join(str(a) for a in (cls_id, *bb)) + "\n")

  # Download
  dir = Path(yaml["path"])  # dataset root dir
  urls = [
      f"{ASSETS_URL}/VOCtrainval_06-Nov-2007.zip",  # 446MB, 5011 images
      f"{ASSETS_URL}/VOCtest_06-Nov-2007.zip",  # 438MB, 4952 images
      f"{ASSETS_URL}/VOCtrainval_11-May-2012.zip",  # 1.95GB, 17125 images
  ]
  download(urls, dir=dir / "images", threads=3, exist_ok=True)  # download and unzip over existing (required)

  # Convert
  path = dir / "images/VOCdevkit"
  for year, image_set in ("2012", "train"), ("2012", "val"), ("2007", "train"), ("2007", "val"), ("2007", "test"):
      imgs_path = dir / "images" / f"{image_set}{year}"
      lbs_path = dir / "labels" / f"{image_set}{year}"
      imgs_path.mkdir(exist_ok=True, parents=True)
      lbs_path.mkdir(exist_ok=True, parents=True)

      with open(path / f"VOC{year}/ImageSets/Main/{image_set}.txt") as f:
          image_ids = f.read().strip().split()
      for id in TQDM(image_ids, desc=f"{image_set}{year}"):
          f = path / f"VOC{year}/JPEGImages/{id}.jpg"  # old img path
          lb_path = (lbs_path / f.name).with_suffix(".txt")  # new label path
          f.rename(imgs_path / f.name)  # move image
          convert_label(path, lb_path, year, id)  # convert labels to YOLO format
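
The conversion loop above writes each label file into a labels/ directory that mirrors the images/ directory, keeping the file stem and switching the suffix to .txt. A small sketch of that mapping follows; `label_path_for` is a hypothetical helper, and the example file name is illustrative.

```python
from pathlib import PurePosixPath


def label_path_for(image_path):
    """Mirror an image path under 'images/' to its label file under 'labels/'."""
    parts = list(PurePosixPath(image_path).parts)
    idx = parts.index("images")  # first 'images' path component
    parts[idx] = "labels"
    return PurePosixPath(*parts).with_suffix(".txt")


print(label_path_for("VOC/images/train2012/2008_000008.jpg"))
# VOC/labels/train2012/2008_000008.txt
```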

Usage

2.8 GB download

VOC is downloaded automatically on the first training run (three archives totaling 2.8 GB). Extraction and conversion require about 6 GB of free disk space.
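
Before starting a first run, it can be worth confirming that enough disk space is available. The sketch below sums the archive sizes given in the VOC.yaml comments and checks free space with `shutil.disk_usage`. The helper `has_enough_space`, the 6 GB constant taken from the note above, and the choice to treat 1 GB as 10^9 bytes are assumptions of this example.

```python
import shutil

# Archive sizes from the comments in VOC.yaml, in GB.
archives_gb = {
    "VOCtrainval_06-Nov-2007.zip": 0.446,
    "VOCtest_06-Nov-2007.zip": 0.438,
    "VOCtrainval_11-May-2012.zip": 1.95,
}
download_gb = sum(archives_gb.values())
print(f"{download_gb:.1f} GB to download")  # 2.8 GB to download

REQUIRED_FREE_GB = 6  # approximate space needed for extraction and conversion


def has_enough_space(path=".", required_gb=REQUIRED_FREE_GB):
    """Return True if the filesystem holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


print(has_enough_space())
```

Pass the intended dataset directory as `path` if it lives on a different drive than the working directory.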

To train a YOLO26n model on the VOC dataset for 100 epochs at an image size of 640, you can use the following code snippet. For a comprehensive list of available arguments, see the model Training page.

Train Example

from ultralytics import YOLO

# Load a model
model = YOLO("yolo26n.pt")  # load a pretrained model (recommended for training)

# Train the model - dataset will auto-download on first run
results = model.train(data="VOC.yaml", epochs=100, imgsz=640)

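Sample Images and Annotations

The image below shows a mosaiced training batch from the VOC dataset. Mosaicing combines multiple images into a single training sample, increasing the variety of objects, scales, and scene contexts the model sees in each batch. See the YOLO data augmentation guide for details.

[Figure: Mosaic training batch from the Pascal VOC dataset]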
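
To make the idea concrete, here is a deliberately simplified sketch of how YOLO-format labels are remapped when four equally sized images are tiled into a fixed 2x2 grid. It is not the Ultralytics augmentation: real mosaic implementations typically also randomize the layout and crop or rescale the images. The function `mosaic_labels` and the sample labels are hypothetical.

```python
def mosaic_labels(tiles):
    """Merge YOLO labels from four equally sized images tiled in a 2x2 grid.

    `tiles` holds four label lists ordered top-left, top-right, bottom-left,
    bottom-right. Each label is (class_id, xc, yc, w, h), normalized to its
    own image.
    """
    offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
    merged = []
    for labels, (ox, oy) in zip(tiles, offsets):
        for cls, xc, yc, w, h in labels:
            # Each tile covers half of the canvas in both directions.
            merged.append((cls, ox + xc / 2, oy + yc / 2, w / 2, h / 2))
    return merged


tiles = [
    [(14, 0.5, 0.5, 0.4, 0.8)],   # person
    [(11, 0.5, 0.5, 0.5, 0.5)],   # dog
    [],                           # image with no kept objects
    [(6, 0.25, 0.75, 0.5, 0.5)],  # car
]
for label in mosaic_labels(tiles):
    print(label)
```

Each source image contributes its objects at half scale, so one mosaic sample exposes the model to objects from four scenes at once.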

Citations and Acknowledgments

If you use the VOC dataset in your research or development work, please cite the following paper.

Citation

@article{everingham2010pascal,
  author={Everingham, Mark and Van Gool, Luc and Williams, Christopher K. I. and Winn, John and Zisserman, Andrew},
  journal={International Journal of Computer Vision},
  title={The Pascal Visual Object Classes (VOC) Challenge},
  year={2010},
  volume={88},
  number={2},
  pages={303-338},
  doi={10.1007/s11263-009-0275-4}}

We would like to thank the PASCAL VOC Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the VOC dataset and its creators, visit the PASCAL VOC dataset website.

Frequently Asked Questions (FAQ)

What is the PASCAL VOC dataset used for?

PASCAL VOC is used to train and benchmark object detection models on 20 everyday object classes such as person, car, dog, and chair. Because it is compact, fully labeled, and backed by baselines published over many years, it is a common choice for validating new architectures, coursework experiments, and quick transfer learning studies.

How many images does the PASCAL VOC dataset contain?

The Ultralytics VOC configuration contains 21,503 images: 16,551 for training (VOC2007 trainval + VOC2012 trainval) and 4,952 for validation (the VOC2007 test set). All splits share the same 20 classes. See Dataset Structure for the detailed breakdown.

How do I download the PASCAL VOC dataset?

VOC downloads automatically the first time you train with data="VOC.yaml"; no manual steps are required. The script fetches three archives (2.8 GB) from the Ultralytics GitHub release assets and converts the XML annotations to YOLO format.

How do I train a YOLO26 model on the VOC dataset?

To train a YOLO26n model on VOC for 100 epochs at an image size of 640:

Train Example

from ultralytics import YOLO

# Load a model
model = YOLO("yolo26n.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="VOC.yaml", epochs=100, imgsz=640)

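For detailed configuration options, see the Training page and the model training tips.

What is the difference between VOC2007 and VOC2012?

Both challenges share the same 20 classes but provide different images. VOC2007 provides 5,011 trainval images and a 4,952-image test set with public annotations. VOC2012 provides 11,540 trainval images, but its test annotations are private and are scored only on the official evaluation server. The Ultralytics VOC.yaml merges both trainval sets for training and validates on the VOC2007 test set.

How does PASCAL VOC compare to the COCO dataset?

VOC is smaller and simpler: 20 classes and 21,503 images, versus 80 classes and 330K images for COCO. VOC results are traditionally reported as mAP at IoU 0.5, whereas COCO uses mAP averaged over IoU thresholds from 0.5 to 0.95. VOC trains much faster and suits rapid experimentation, while the COCO dataset is the standard for production-scale benchmarks.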
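
The difference between the two conventions comes down to which IoU thresholds a predicted box must clear to count as a match. The sketch below computes IoU for one pair of boxes and compares it against the single VOC threshold and the ten COCO thresholds (0.50 to 0.95 in steps of 0.05). It does not compute mAP itself, which also requires ranking predictions by confidence and integrating a precision-recall curve; the box coordinates are made up for this example.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


VOC_THRESHOLD = 0.5
COCO_THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95

pred, truth = (10, 10, 110, 110), (20, 20, 120, 120)
score = iou(pred, truth)
matches = sum(score >= t for t in COCO_THRESHOLDS)

print(round(score, 3))                        # 0.681
print(score >= VOC_THRESHOLD)                 # True
print(matches, "of", len(COCO_THRESHOLDS))    # 4 of 10
```

A prediction that counts fully under the VOC convention here only matches at four of the ten COCO thresholds, which is why COCO-style mAP rewards tighter localization.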

Can I train a segmentation model with VOC.yaml?

No. VOC.yaml is a detection-only configuration: its converter extracts bounding boxes from the VOC XML annotations, and the segmentation masks included in the original benchmark are not converted. To train an instance segmentation model, use a dataset with polygon labels, such as COCO-Seg, together with a yolo26n-seg.pt model.

Contributors

glenn-jocher, raimbekovm, RizwanMunawar, xusuyong, MatthewNoyce

Created: November 12, 2023. Updated: 3 days ago.