COCO-Pose Dataset

Name: COCO-Pose Estimation Dataset
Creator: COCO Consortium
License: https://cocodataset.org/#termsofuse
Keywords: COCO-Pose, pose estimation, dataset, keypoints, COCO Keypoints 2017, YOLO, deep learning, computer vision

The COCO-Pose dataset adapts COCO (Common Objects in Context) for pose estimation: 58,945 images from COCO Keypoints 2017, annotated with 156,165 people using a 17-keypoint schema. It is the standard set for training and benchmarking keypoint models such as Ultralytics YOLO26, and the 8-image COCO8-Pose subset mirrors its format for quick sanity checks.

[Figure: COCO pose estimation with human keypoints]

COCO-Pose Pretrained Models

Model	size (pixels)	mAP pose 50-95 (e2e)	mAP pose 50 (e2e)	Speed CPU ONNX (ms)	Speed T4 TensorRT10 (ms)	params (M)	FLOPs (B)
YOLO26n-pose	640	57.2	83.3	40.3 ± 0.5	1.8 ± 0.0	2.9	7.5
YOLO26s-pose	640	63.0	86.6	85.3 ± 0.9	2.7 ± 0.0	10.4	23.9
YOLO26m-pose	640	68.8	89.6	218.0 ± 1.5	5.0 ± 0.1	21.5	73.1
YOLO26l-pose	640	70.4	90.5	275.4 ± 2.4	6.5 ± 0.1	25.9	91.3
YOLO26x-pose	640	71.6	91.6	565.4 ± 3.0	12.2 ± 0.2	57.6	201.7

Key Features

COCO-Pose builds upon the COCO Keypoints 2017 challenge, which labels 1,710,498 individual keypoints across 156,165 annotated people.
Each person annotation uses 17 keypoint types (nose, plus left and right eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles), stored as (x, y, visibility) triplets.
Like COCO, it provides standardized evaluation metrics, including Object Keypoint Similarity (OKS) for pose estimation tasks, making it suitable for comparing model performance.
Download size: ~20.2 GB on first use (train2017.zip + val2017.zip + labels). The 7 GB test2017.zip is not fetched automatically, since those images have withheld ground truth and are only needed for a test-dev2017 submission.
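The (x, y, visibility) layout can be illustrated with a minimal sketch that parses one label row. It assumes the Ultralytics YOLO pose label layout (class index, normalized box center and size, then 17 triplets) and COCO's visibility convention (0 = not labeled, 1 = labeled but occluded, 2 = labeled and visible). The function name `parse_pose_label` and the sample row are hypothetical and not part of any library.

```python
# Keypoint order as listed under kpt_names in coco-pose.yaml.
KPT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def parse_pose_label(line, num_kpts=17, dims=3):
    """Split one label row into (class_id, box, keypoints).

    Assumed row layout: class cx cy w h, then num_kpts * dims numbers.
    Hypothetical helper for illustration only.
    """
    values = line.split()
    expected = 5 + num_kpts * dims
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    class_id = int(values[0])
    box = tuple(float(v) for v in values[1:5])  # normalized cx, cy, w, h
    flat = [float(v) for v in values[5:]]
    keypoints = [tuple(flat[i:i + dims]) for i in range(0, len(flat), dims)]
    return class_id, box, keypoints


# Synthetic row (not taken from the dataset): one person, all keypoints visible.
sample = "0 0.5 0.5 0.4 0.8 " + " ".join(
    f"{0.30 + 0.01 * i:.2f} {0.20 + 0.03 * i:.2f} 2" for i in range(17)
)
class_id, box, kpts = parse_pose_label(sample)
labeled = [name for name, (x, y, v) in zip(KPT_NAMES, kpts) if v > 0]
print(class_id, box, len(kpts), labeled[:3])
```

Filtering on `v > 0` keeps only keypoints that were actually annotated, which is the same subset the OKS metric averages over.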

Dataset Structure

For training and validation, COCO-Pose includes only COCO 2017 images with keypoint-annotated people, so its labeled splits are smaller than full COCO's. Its YAML defines three subsets:

Train2017: This subset contains 56,599 images from the COCO dataset, annotated for training pose estimation models.
Val2017: This subset has 2,346 images used for validation purposes during model training.
Test-dev2017: A 20,288-image subset of the full 40,670-image test2017 set with withheld ground truth. The dataset YAML links this split to the COCO test-dev keypoints evaluation server.
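As a quick sanity check after download, the split sizes can be counted from the image list files. The sketch below assumes each split is a plain text file with one image path per line, named as in coco-pose.yaml. The helper `count_split_images` is hypothetical, and the demo runs on a throwaway directory rather than the real dataset.

```python
import tempfile
from pathlib import Path

# Split list files as named in coco-pose.yaml.
SPLIT_FILES = {
    "train": "train2017.txt",
    "val": "val2017.txt",
    "test": "test-dev2017.txt",
}


def count_split_images(root):
    """Count non-empty lines in each split list under `root`.

    Returns None for a split whose list file is missing.
    Hypothetical helper for illustration only.
    """
    counts = {}
    for split, filename in SPLIT_FILES.items():
        list_file = Path(root) / filename
        if not list_file.exists():
            counts[split] = None
            continue
        with list_file.open() as f:
            counts[split] = sum(1 for line in f if line.strip())
    return counts


# Demo on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train2017.txt").write_text(
        "./images/train2017/a.jpg\n./images/train2017/b.jpg\n"
    )
    (Path(tmp) / "val2017.txt").write_text("./images/val2017/c.jpg\n")
    demo_counts = count_split_images(tmp)

print(demo_counts)
```

Pointed at a real download (for example a `datasets/coco-pose` directory, if that is where your copy lives), the train and val counts should match the 56,599 and 2,346 figures above, provided the list files follow this layout.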

Training at this scale is where Ultralytics Platform helps most — it manages the compute so you can launch and monitor runs without provisioning your own GPUs.

Applications

The COCO-Pose dataset is specifically used for training and evaluating deep learning models on keypoint detection and pose estimation. The dataset's large number of annotated images and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners working on human pose.

Dataset YAML

A YAML file defines the dataset configuration: the dataset root and split paths, the class names, and keypoint settings such as kpt_shape and flip_idx. For the COCO-Pose dataset, the coco-pose.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco-pose.yaml.

ultralytics/cfg/datasets/coco-pose.yaml

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# COCO 2017 Keypoints dataset https://cocodataset.org by Microsoft
# Documentation: https://docs.ultralytics.com/datasets/pose/coco
# Example usage: yolo train data=coco-pose.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── coco-pose ← downloads here (20.2 GB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: coco-pose # dataset root dir
train: train2017.txt # train images (relative to 'path') 56599 images
val: val2017.txt # val images (relative to 'path') 2346 images
test: test-dev2017.txt # 20288 of 40670 images, submit to https://codalab.lisn.upsaclay.fr/competitions/7403

# Keypoints
kpt_shape: [17, 3] # number of keypoints, number of dims (2 for x,y or 3 for x,y,visible)
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]

# Classes
names:
  0: person

# Keypoint names per class
kpt_names:
  0:
    - nose
    - left_eye
    - right_eye
    - left_ear
    - right_ear
    - left_shoulder
    - right_shoulder
    - left_elbow
    - right_elbow
    - left_wrist
    - right_wrist
    - left_hip
    - right_hip
    - left_knee
    - right_knee
    - left_ankle
    - right_ankle

# Download script/URL (optional)
download: |
  from pathlib import Path

  from ultralytics.utils import ASSETS_URL
  from ultralytics.utils.downloads import download

  # Download labels
  dir = Path(yaml["path"])  # dataset root dir

  urls = [f"{ASSETS_URL}/coco2017labels-pose.zip"]
  download(urls, dir=dir.parent)

  # Download data (test2017.zip excluded: ground truth is withheld, only used for the CodaLab test-dev split)
  urls = [
      "http://images.cocodataset.org/zips/train2017.zip",  # 19G, 118k images
      "http://images.cocodataset.org/zips/val2017.zip",  # 1G, 5k images
  ]
  download(urls, dir=dir / "images", threads=3)
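The `flip_idx` entry in the YAML pairs each left keypoint with its right counterpart, so that labels stay anatomically correct when an image is mirrored horizontally. The following is a minimal sketch of how such an index can be applied to normalized keypoints. The helpers `mirror_name` and `hflip_keypoints` are hypothetical, and this is not Ultralytics' actual augmentation code.

```python
# Values copied from coco-pose.yaml.
KPT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
FLIP_IDX = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]


def mirror_name(name):
    """Swap the left_/right_ prefix of a keypoint name (nose is unchanged)."""
    if name.startswith("left_"):
        return "right_" + name[len("left_"):]
    if name.startswith("right_"):
        return "left_" + name[len("right_"):]
    return name


def hflip_keypoints(kpts, flip_idx=FLIP_IDX):
    """Mirror normalized (x, y, v) keypoints and swap left/right slots."""
    mirrored = [(1.0 - x, y, v) for x, y, v in kpts]
    # Slot i now holds the mirrored point of its left/right partner.
    return [mirrored[j] for j in flip_idx]


# flip_idx maps every name to its mirrored name, and applying it twice is a no-op.
names_consistent = all(
    KPT_NAMES[j] == mirror_name(KPT_NAMES[i]) for i, j in enumerate(FLIP_IDX)
)
is_involution = all(FLIP_IDX[FLIP_IDX[i]] == i for i in range(len(FLIP_IDX)))

# Synthetic keypoints: x encodes the original index for easy inspection.
kpts = [(i / 100, 0.5, 2) for i in range(17)]
flipped = hflip_keypoints(kpts)
print(names_consistent, is_involution, flipped[1])
```

After the flip, the `left_eye` slot (index 1) holds the mirrored coordinates of what was the `right_eye` (index 2) in the original image.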

Usage

To train a YOLO26n-pose model on the COCO-Pose dataset for 100 epochs with an image size of 640, use the following code snippet. For a comprehensive list of available arguments, refer to the model Training page.

Train Example

from ultralytics import YOLO

# Load a model
model = YOLO("yolo26n-pose.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="coco-pose.yaml", epochs=100, imgsz=640)

Sample Images and Annotations

The COCO-Pose dataset contains a diverse set of images with human figures annotated with keypoints. Here are some examples of images from the dataset, along with their corresponding annotations:

[Figure: COCO pose estimation dataset mosaic training batch]

Mosaiced Image: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts.

The example showcases the variety and complexity of the images in the COCO-Pose dataset and the benefits of using mosaicing during the training process.
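To show how keypoint labels follow the pixels when images are mosaiced, here is a deliberately simplified sketch. It assumes four equally sized tiles placed in a fixed 2x2 grid. Real mosaic augmentation typically uses a random center and crops the tiles, so this is an illustration of the coordinate remapping only. The function `mosaic_keypoints` is hypothetical.

```python
def mosaic_keypoints(tiles):
    """Remap per-tile normalized keypoints into a 2x2 mosaic canvas.

    tiles: four lists of (x, y, v) keypoints, ordered top-left, top-right,
    bottom-left, bottom-right. Coordinates are normalized to each tile.
    Returns the same structure, normalized to the mosaic canvas.
    Simplified illustration: fixed grid, no random center, no cropping.
    """
    if len(tiles) != 4:
        raise ValueError("a 2x2 mosaic needs exactly four tiles")
    offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
    merged = []
    for (ox, oy), kpts in zip(offsets, tiles):
        # Each tile covers half the canvas width and height.
        merged.append([(ox + 0.5 * x, oy + 0.5 * y, v) for x, y, v in kpts])
    return merged


# Synthetic example: one keypoint at the center of each tile.
center = [(0.5, 0.5, 2)]
mosaic = mosaic_keypoints([center, center, center, center])
print([tile[0][:2] for tile in mosaic])
```

Each tile's center lands at the center of its quadrant, which is why a single mosaiced training image can present people at several scales and positions at once.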

Citations and Acknowledgments

If you use the COCO-Pose dataset in your research or development work, please cite the following paper:


@misc{lin2015microsoft,
      title={Microsoft COCO: Common Objects in Context},
      author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár},
      year={2015},
      eprint={1405.0312},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO-Pose dataset and its creators, visit the COCO dataset website.

FAQ

What is the COCO-Pose dataset and how is it used with Ultralytics YOLO for pose estimation?

COCO-Pose supplies the COCO Keypoints 2017 images and annotations converted to YOLO keypoint format, using a 17-keypoint schema across 58,945 images. Point any Ultralytics YOLO pose model at it with data=coco-pose.yaml, and the Training page documents every argument you can tune from there.

How can I train a YOLO26 model on the COCO-Pose dataset?

Load yolo26n-pose.pt and call model.train(data="coco-pose.yaml", epochs=100, imgsz=640). See the Train Example above for the full Python snippet, and the Training page for a comprehensive list of arguments.

What are the different metrics provided by the COCO-Pose dataset for evaluating model performance?

The COCO-Pose dataset uses the same standardized evaluation protocol as the original COCO keypoints task. The core measure is Object Keypoint Similarity (OKS), which scores predicted keypoints against ground truth annotations and plays the role that IoU plays in box detection. Mean average precision is then reported over OKS thresholds: the COCO-Pose Pretrained Models table lists mAP pose 50-95 and mAP pose 50 for YOLO26n-pose, YOLO26s-pose, and the other model sizes, which allows direct comparison between models.
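For intuition, OKS for one person can be written as the mean of exp(-d_i^2 / (2 s^2 k_i^2)) over labeled keypoints, where d_i is the pixel distance between prediction and ground truth, s^2 is the object area, and k_i is a per-keypoint constant. The sketch below follows that formula. The sigma values are the ones used, to my understanding, by the COCO keypoint evaluation code (with k_i = 2 * sigma_i), so treat them as an assumption and check them against your evaluator. The function name `oks` and the sample data are hypothetical.

```python
import math

# Per-keypoint sigmas in kpt_names order. Assumed to match the COCO keypoint
# evaluation defaults; verify against the evaluator you actually use.
COCO_SIGMAS = [
    0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
    0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089,
]


def oks(pred, gt, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity for a single person.

    pred: list of (x, y) in pixels.
    gt: list of (x, y, v) in pixels, v > 0 meaning the keypoint is labeled.
    area: ground-truth object area in square pixels.
    Returns 0.0 when no keypoint is labeled.
    """
    total, labeled = 0.0, 0
    for (px, py), (gx, gy, v), sigma in zip(pred, gt, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not count
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        labeled += 1
    return total / labeled if labeled else 0.0


# Synthetic ground truth and two predictions: exact, and shifted by 10 px in x.
gt = [(100.0 + 5 * i, 200.0 + 5 * i, 2) for i in range(17)]
perfect = [(x, y) for x, y, _ in gt]
shifted = [(x + 10.0, y) for x, y, _ in gt]

oks_perfect = oks(perfect, gt, area=10000.0)
oks_shifted = oks(shifted, gt, area=10000.0)
print(oks_perfect, round(oks_shifted, 3))
```

Because the sigmas differ per keypoint, the same 10-pixel error costs more on the eyes than on the hips, and the area term makes the score scale-aware. A prediction counts as a match at threshold t when its OKS is at least t, which is how mAP pose 50 and mAP pose 50-95 are built up.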

How is the dataset structured and split for the COCO-Pose dataset?

COCO-Pose ships two labeled splits: 56,599 train2017 images and 2,346 val2017 images. A third split, test-dev2017 (20,288 of the full 40,670 test2017 images), keeps its ground truth private; the dataset YAML links it to the COCO test-dev keypoints evaluation server. See the Dataset Structure section, or the coco-pose.yaml file on GitHub for the exact split paths.

What are the key features and applications of the COCO-Pose dataset?

COCO-Pose uses 17 human keypoint types and inherits COCO's standardized metrics, including Object Keypoint Similarity (OKS), for comparing models. That combination suits human pose applications such as sports analytics, healthcare, and human-computer interaction. Pretrained YOLO26-pose weights are listed under COCO-Pose Pretrained Models.

For more on keypoint models, see the Pose Estimation task docs.

Contributors

glenn-jocher (15), raimbekovm (3), RizwanMunawar (3), jk4e (2), Y-T-G (1), ambitious-octopus (1), MatthewNoyce (1), lunarifish (1)

Created Nov 12, 2023. Updated 6 days ago.