COCO8 Dataset

Introduction

The Ultralytics COCO8 dataset is a compact object detection dataset consisting of the first 8 images from the COCO train 2017 set: 4 for training and 4 for validation. It is designed for rapid testing, debugging, and experimentation with YOLO models and training pipelines. Its small size keeps it easy to manage, and its images are varied enough to serve as a sanity check before scaling up to larger datasets.

COCO8 is fully compatible with Ultralytics HUB and YOLO11, enabling seamless integration into your computer vision workflows.

Dataset YAML

The COCO8 dataset configuration is defined in a YAML (YAML Ain't Markup Language) file, which specifies the dataset paths, class names, and an optional download URL. You can review the official coco8.yaml file in the Ultralytics GitHub repository.

ultralytics/cfg/datasets/coco8.yaml

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# COCO8 dataset (first 8 images from COCO train2017) by Ultralytics
# Documentation: https://docs.ultralytics.com/datasets/detect/coco8/
# Example usage: yolo train data=coco8.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── coco8  ← downloads here (1 MB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco8 # dataset root dir
train: images/train # train images (relative to 'path') 4 images
val: images/val # val images (relative to 'path') 4 images
test: # test images (optional)

# Classes
names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  11: stop sign
  12: parking meter
  13: bench
  14: bird
  15: cat
  16: dog
  17: horse
  18: sheep
  19: cow
  20: elephant
  21: bear
  22: zebra
  23: giraffe
  24: backpack
  25: umbrella
  26: handbag
  27: tie
  28: suitcase
  29: frisbee
  30: skis
  31: snowboard
  32: sports ball
  33: kite
  34: baseball bat
  35: baseball glove
  36: skateboard
  37: surfboard
  38: tennis racket
  39: bottle
  40: wine glass
  41: cup
  42: fork
  43: knife
  44: spoon
  45: bowl
  46: banana
  47: apple
  48: sandwich
  49: orange
  50: broccoli
  51: carrot
  52: hot dog
  53: pizza
  54: donut
  55: cake
  56: chair
  57: couch
  58: potted plant
  59: bed
  60: dining table
  61: toilet
  62: tv
  63: laptop
  64: mouse
  65: remote
  66: keyboard
  67: cell phone
  68: microwave
  69: oven
  70: toaster
  71: sink
  72: refrigerator
  73: book
  74: clock
  75: vase
  76: scissors
  77: teddy bear
  78: hair drier
  79: toothbrush

# Download script/URL (optional)
download: https://github.com/ultralytics/assets/releases/download/v0.0.0/coco8.zip
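
As the comments in the configuration note, the train and val entries are interpreted relative to path. The following is a minimal sketch of that joining step, using only the Python standard library. The config dictionary is copied by hand from the YAML above, and resolve_splits is a hypothetical helper for illustration, not part of the Ultralytics API. How the library locates the dataset root itself is not covered here.

```python
from pathlib import Path

# Fields copied by hand from coco8.yaml (assumption: no YAML parser is used here).
config = {
    "path": "../datasets/coco8",  # dataset root dir
    "train": "images/train",      # relative to 'path'
    "val": "images/val",          # relative to 'path'
    "test": None,                 # optional, left empty in coco8.yaml
}


def resolve_splits(cfg: dict) -> dict:
    """Join each non-empty split entry onto the dataset root (hypothetical helper)."""
    root = Path(cfg["path"])
    return {
        split: root / cfg[split]
        for split in ("train", "val", "test")
        if cfg.get(split)  # skip splits that are missing or empty
    }


splits = resolve_splits(config)
for name, directory in splits.items():
    print(f"{name}: {directory.as_posix()}")
```

Because test is empty in coco8.yaml, the sketch returns only the train and val directories.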

Usage

To train a YOLO11n model on the COCO8 dataset for 100 epochs with an image size of 640, use the following examples. For a full list of training options, see the YOLO Training documentation.

Train Example

Python

from ultralytics import YOLO

# Load a pretrained YOLO11n model
model = YOLO("yolo11n.pt")

# Train the model on COCO8
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)

CLI

# Train YOLO11n on COCO8 using the command line
yolo detect train data=coco8.yaml model=yolo11n.pt epochs=100 imgsz=640

Sample Images and Annotations

Below is an example of a mosaiced training batch from the COCO8 dataset:

Dataset sample image

Mosaiced Image: This image illustrates a training batch where multiple dataset images are combined using mosaic augmentation. Mosaic augmentation increases the diversity of objects and scenes within each batch, helping the model generalize better to various object sizes, aspect ratios, and backgrounds.

This technique is especially useful for small datasets like COCO8, as it maximizes the value of each image during training.
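
To make the idea concrete, here is a simplified sketch of what happens to the annotations when four images are tiled into one. It assumes a fixed 2x2 grid of equally sized tiles and boxes in normalized (class, x_center, y_center, width, height) form. The augmentation used during real training may choose the tile layout and crops differently, and mosaic_2x2_labels is a hypothetical name used only for illustration.

```python
from typing import List, Tuple

# (class_id, x_center, y_center, width, height), all normalized to [0, 1]
Box = Tuple[int, float, float, float, float]


def mosaic_2x2_labels(tiles: List[List[Box]]) -> List[Box]:
    """Remap the boxes of four equally sized tiles onto one 2x2 mosaic canvas."""
    if len(tiles) != 4:
        raise ValueError("a 2x2 mosaic needs exactly four tiles")
    merged: List[Box] = []
    for index, boxes in enumerate(tiles):
        # Tile order: top-left, top-right, bottom-left, bottom-right.
        row, col = divmod(index, 2)
        for cls, x, y, w, h in boxes:
            # Each tile spans half the canvas per axis, so box sizes halve
            # and centers shift by the tile's row/column offset.
            merged.append((cls, (x + col) / 2, (y + row) / 2, w / 2, h / 2))
    return merged


tiles = [
    [(0, 0.5, 0.5, 0.4, 0.4)],     # person centered in the top-left tile
    [(16, 0.5, 0.5, 1.0, 1.0)],    # dog filling the top-right tile
    [],                            # bottom-left tile has no objects
    [(2, 0.25, 0.75, 0.5, 0.5)],   # car in the bottom-right tile
]
mosaic_boxes = mosaic_2x2_labels(tiles)
for box in mosaic_boxes:
    print(box)
```

The composite sample carries objects from all four source images, each at half its original relative size, which is how a single mosaic exposes the model to more objects and scales per training step.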

Citations and Acknowledgments

If you use the COCO dataset in your research or development, please cite the following paper:

BibTeX

@misc{lin2015microsoft,
      title={Microsoft COCO: Common Objects in Context},
      author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár},
      year={2015},
      eprint={1405.0312},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Special thanks to the COCO Consortium for their ongoing contributions to the computer vision community.

FAQ

What Is the Ultralytics COCO8 Dataset Used For?

The Ultralytics COCO8 dataset is designed for rapid testing and debugging of object detection models. With only 8 images (4 for training, 4 for validation), it is ideal for verifying your YOLO training pipelines and ensuring everything works as expected before scaling to larger datasets. Explore the COCO8 YAML configuration for more details.

How Do I Train a YOLO11 Model Using the COCO8 Dataset?

You can train a YOLO11 model on COCO8 using either Python or the CLI:

Train Example

Python

from ultralytics import YOLO

# Load a pretrained YOLO11n model
model = YOLO("yolo11n.pt")

# Train the model on COCO8
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)

CLI

yolo detect train data=coco8.yaml model=yolo11n.pt epochs=100 imgsz=640

For additional training options, refer to the YOLO Training documentation.

Why Should I Use Ultralytics HUB for Managing My COCO8 Training?

Ultralytics HUB streamlines dataset management, training, and deployment for YOLO models, including training runs on COCO8. With cloud training, real-time monitoring, and built-in dataset handling, HUB lets you launch experiments with a single click and without manual setup. Learn more about Ultralytics HUB and how it can accelerate your computer vision projects.

What Are the Benefits of Using Mosaic Augmentation in Training With the COCO8 Dataset?

Mosaic augmentation, as used in COCO8 training, combines several training images into a single composite image when a batch is built. This increases the diversity of objects and backgrounds the model sees, helping your YOLO model generalize better to new scenarios. Mosaic augmentation is especially valuable for small datasets, as it maximizes the information available in each training step. For more on this, see the training guide.

How Can I Validate My YOLO11 Model Trained on the COCO8 Dataset?

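To validate a YOLO11 model trained on COCO8, call model.val() in Python or run yolo detect val model=path/to/best.pt data=coco8.yaml from the CLI, where path/to/best.pt is a placeholder for your trained weights. Validation reports standard detection metrics, including precision, recall, mAP50, and mAP50-95. For step-by-step instructions, visit the YOLO Validation documentation.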
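
Detection metrics such as mean average precision (mAP) count a prediction as correct only when its box overlaps a ground-truth box by at least a chosen intersection-over-union (IoU) threshold, for example 0.5 for mAP50. The sketch below shows that overlap measure for two boxes in (x1, y1, x2, y2) pixel coordinates. It is a plain-Python illustration of the concept, not the validation code Ultralytics runs, and box_iou and the sample boxes are assumptions made for the example.

```python
from typing import Tuple

BoxXYXY = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def box_iou(a: BoxXYXY, b: BoxXYXY) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap region, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


prediction = (10.0, 10.0, 50.0, 50.0)    # hypothetical predicted box
ground_truth = (30.0, 10.0, 70.0, 50.0)  # hypothetical annotated box
iou = box_iou(prediction, ground_truth)
is_match_at_50 = iou >= 0.5              # would this prediction count at the 0.5 threshold?
print(f"IoU = {iou:.3f}, counts as a match at 0.5: {is_match_at_50}")
```

In this example the boxes share half their width, giving an IoU of one third, so the prediction would not count as a match at the 0.5 threshold.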