COCO-Pose Dataset

The COCO-Pose dataset is a specialized version of the COCO (Common Objects in Context) dataset, designed for pose estimation tasks. It leverages the COCO Keypoints 2017 images and labels to enable training of models like YOLO for pose estimation tasks.

Pose sample image

COCO-Pose Pretrained Models

| Model        | size (pixels) | mAP^pose 50-95 | mAP^pose 50 | Speed CPU ONNX (ms) | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) |
| ------------ | ------------- | -------------- | ----------- | ------------------- | ------------------------ | ---------- | --------- |
| YOLO11n-pose | 640           | 50.0           | 81.0        | 52.4 ± 0.5          | 1.7 ± 0.0                | 2.9        | 7.6       |
| YOLO11s-pose | 640           | 58.9           | 86.3        | 90.5 ± 0.6          | 2.6 ± 0.0                | 9.9        | 23.2      |
| YOLO11m-pose | 640           | 64.9           | 89.4        | 187.3 ± 0.8         | 4.9 ± 0.1                | 20.9       | 71.7      |
| YOLO11l-pose | 640           | 66.1           | 89.9        | 247.7 ± 1.1         | 6.4 ± 0.1                | 26.2       | 90.7      |
| YOLO11x-pose | 640           | 69.5           | 91.1        | 488.0 ± 13.9        | 12.1 ± 0.2               | 58.8       | 203.3     |

Key Features

  • COCO-Pose builds on the COCO Keypoints 2017 dataset, which contains 200K images labeled with keypoints for pose estimation tasks.
  • The dataset supports 17 keypoints for human figures, facilitating detailed pose estimation.
  • Like COCO, it provides standardized evaluation metrics, including Object Keypoint Similarity (OKS) for pose estimation tasks, making it suitable for comparing model performance.
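
As a rough illustration of how OKS is computed, the sketch below follows the formulation used by the official COCO evaluation code: each keypoint's squared distance to the ground truth is normalized by the object's area and a per-keypoint constant (sigma), and the resulting similarities are averaged over labeled keypoints. The sigmas are COCO's published constants for the 17 person keypoints; the function itself is our own minimal version, not the pycocotools API.

import numpy as np

# COCO's per-keypoint sigmas for the 17 person keypoints
KPT_SIGMAS = np.array([
    0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
    0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089,
])

def oks(pred_xy, gt_xy, gt_vis, area):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred_xy, gt_xy: (17, 2) keypoint coordinates; gt_vis: (17,) visibility
    flags; area: ground-truth object area in pixels squared.
    """
    d2 = ((pred_xy - gt_xy) ** 2).sum(axis=1)  # squared distance per keypoint
    k2 = (2 * KPT_SIGMAS) ** 2                 # per-keypoint normalization
    e = d2 / (2 * area * k2 + np.finfo(float).eps)
    labeled = gt_vis > 0                       # only score labeled keypoints
    return float(np.exp(-e)[labeled].mean()) if labeled.any() else 0.0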

Dataset Structure

The COCO-Pose dataset is split into three subsets:

  1. Train2017: This subset contains a portion of the 118K images in the COCO dataset, annotated for training pose estimation models.
  2. Val2017: This subset contains a selection of images used for validation during model training.
  3. Test2017: This subset contains images used for testing and benchmarking trained models. Ground-truth annotations for this subset are not publicly available; results are submitted to the COCO evaluation server for performance evaluation.
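
The snippet below sketches how these splits are typically exercised with Ultralytics YOLO. The split and save_json arguments are standard validation options; the exact submission format and workflow are defined by the COCO evaluation server.

from ultralytics import YOLO

model = YOLO("yolo11n-pose.pt")  # any pose checkpoint

# Validate on val2017, where ground-truth annotations are available locally
model.val(data="coco-pose.yaml", split="val")

# For the test split the annotations are withheld: save COCO-format JSON
# predictions and submit them to the COCO evaluation server
model.val(data="coco-pose.yaml", split="test", save_json=True)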

Applications

The COCO-Pose dataset is specifically used for training and evaluating deep learning models in keypoint detection and pose estimation tasks, such as OpenPose. The dataset's large number of annotated images and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners focused on pose estimation.

Dataset YAML

A YAML (Yet Another Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant details. In the case of the COCO-Pose dataset, the coco-pose.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco-pose.yaml.

ultralytics/cfg/datasets/coco-pose.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# COCO 2017 dataset https://cocodataset.org by Microsoft
# Documentation: https://docs.ultralytics.com/datasets/pose/coco/
# Example usage: yolo train data=coco-pose.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── coco-pose  ← downloads here (20.1 GB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco-pose # dataset root dir
train: train2017.txt # train images (relative to 'path') 118287 images
val: val2017.txt # val images (relative to 'path') 5000 images
test: test-dev2017.txt # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794

# Keypoints
kpt_shape: [17, 3] # number of keypoints, number of dims (2 for x,y or 3 for x,y,visible)
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]

# Classes
names:
  0: person

# Download script/URL (optional)
download: |
  from ultralytics.utils.downloads import download
  from pathlib import Path

  # Download labels
  dir = Path(yaml['path'])  # dataset root dir
  url = 'https://github.com/ultralytics/assets/releases/download/v0.0.0/'
  urls = [url + 'coco2017labels-pose.zip']  # labels
  download(urls, dir=dir.parent)
  # Download data
  urls = ['http://images.cocodataset.org/zips/train2017.zip',  # 19G, 118k images
          'http://images.cocodataset.org/zips/val2017.zip',  # 1G, 5k images
          'http://images.cocodataset.org/zips/test2017.zip']  # 7G, 41k images (optional)
  download(urls, dir=dir / 'images', threads=3)
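
To see how the keypoint settings fit together, the short sketch below (our own, assuming PyYAML and a local copy of the file) loads the configuration and unpacks kpt_shape and flip_idx. flip_idx maps each keypoint to its left/right counterpart so that labels stay consistent under horizontal-flip augmentation.

import yaml  # PyYAML

# Assumes coco-pose.yaml has been saved locally
with open("coco-pose.yaml") as f:
    cfg = yaml.safe_load(f)  # the 'download' key is loaded as a plain string

num_kpts, kpt_dims = cfg["kpt_shape"]  # 17 keypoints, 3 dims (x, y, visible)
flip_idx = cfg["flip_idx"]             # e.g. index 1 (left eye) swaps with 2 (right eye)

print(num_kpts, kpt_dims)
print(flip_idx)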

Usage

To train a YOLO11n-pose model on the COCO-Pose dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

Train Example

Python

from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n-pose.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="coco-pose.yaml", epochs=100, imgsz=640)

CLI

# Start training from a pretrained *.pt model
yolo pose train data=coco-pose.yaml model=yolo11n-pose.pt epochs=100 imgsz=640

Sample Images and Annotations

The COCO-Pose dataset contains a diverse set of images with human figures annotated with keypoints. Here are some examples of images from the dataset, along with their corresponding annotations:

Dataset sample image

  • Mosaiced Image: This image shows a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. It helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts.

The example showcases the variety and complexity of the images in the COCO-Pose dataset and the benefits of using mosaicing during the training process; a simplified sketch of the tiling idea follows.
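
As a rough illustration of the mosaic idea only (this is not Ultralytics' actual augmentation pipeline, which also rescales images and remaps keypoint and box coordinates onto the combined canvas), the sketch below tiles four images into a single 2x2 canvas:

import numpy as np

def simple_mosaic(imgs, size=640):
    """Tile four HxWx3 uint8 images into one (2*size, 2*size, 3) mosaic."""
    assert len(imgs) == 4
    canvas = np.zeros((2 * size, 2 * size, 3), dtype=np.uint8)
    for i, img in enumerate(imgs):
        h, w = min(img.shape[0], size), min(img.shape[1], size)
        r, c = (i // 2) * size, (i % 2) * size  # top-left corner of this tile
        canvas[r : r + h, c : c + w] = img[:h, :w]  # crop for brevity; real pipelines resize
    return canvas

A real implementation would also shift each image's keypoint annotations by its tile offset (r, c) so the labels remain aligned with the mosaic.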

Citations and Acknowledgments

If you use the COCO-Pose dataset in your research or development work, please cite the following paper:

@misc{lin2015microsoft,
      title={Microsoft COCO: Common Objects in Context},
      author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár},
      year={2015},
      eprint={1405.0312},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO-Pose dataset and its creators, visit the COCO dataset website.

FAQ

What is the COCO-Pose dataset, and how is it used with Ultralytics YOLO for pose estimation?

The COCO-Pose dataset is a specialized version of the COCO (Common Objects in Context) dataset designed for pose estimation tasks. It builds upon the COCO Keypoints 2017 images and annotations, allowing for the training of models like Ultralytics YOLO for detailed pose estimation. For instance, you can use the COCO-Pose dataset to train a YOLO11n-pose model by loading a pretrained model and training it with a YAML configuration. For training examples, refer to the Training documentation.

How can I train a YOLO11 model on the COCO-Pose dataset?

Training a YOLO11 model on the COCO-Pose dataset can be accomplished using either Python or CLI commands. For example, to train a YOLO11n-pose model for 100 epochs with an image size of 640, you can follow the steps below:

Train Example

Python

from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n-pose.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="coco-pose.yaml", epochs=100, imgsz=640)

CLI

# Start training from a pretrained *.pt model
yolo pose train data=coco-pose.yaml model=yolo11n-pose.pt epochs=100 imgsz=640

For more details on the training process and available arguments, check the Training page.

What are the different metrics provided by the COCO-Pose dataset for evaluating model performance?

The COCO-Pose dataset provides several standardized evaluation metrics for pose estimation tasks, similar to the original COCO dataset. Key metrics include the Object Keypoint Similarity (OKS), which evaluates the accuracy of predicted keypoints against ground truth annotations. These metrics allow for thorough performance comparisons between different models. For instance, the COCO-Pose pretrained models such as YOLO11n-pose, YOLO11s-pose, and others have specific performance metrics listed in the documentation, like mAP^pose 50-95 and mAP^pose 50.
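
These metrics can also be read programmatically after validation. The attribute names below (metrics.pose.map, metrics.pose.map50) reflect the Ultralytics pose-metrics API as we understand it; treat them as an assumption and check the current documentation.

from ultralytics import YOLO

model = YOLO("yolo11n-pose.pt")
metrics = model.val(data="coco-pose.yaml")  # evaluates on the val split by default

print(metrics.pose.map)    # mAP^pose 50-95
print(metrics.pose.map50)  # mAP^pose 50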

How is the COCO-Pose dataset structured and split?

The COCO-Pose dataset is split into three subsets:

  1. Train2017: Contains a portion of the 118K COCO images, annotated for training pose estimation models.
  2. Val2017: A selection of images used for validation during model training.
  3. Test2017: Images used for testing and benchmarking trained models. Ground-truth annotations for this subset are not publicly available; results are submitted to the COCO evaluation server for performance evaluation.

These subsets help organize the training, validation, and testing phases effectively. For configuration details, see the coco-pose.yaml file on GitHub.

What are the key features and applications of the COCO-Pose dataset?

The COCO-Pose dataset extends the COCO Keypoints 2017 annotations to include 17 keypoints for human figures, enabling detailed pose estimation. Standardized evaluation metrics (e.g., OKS) facilitate comparisons across different models. Applications of the COCO-Pose dataset span various domains, such as sports analytics, healthcare, and human-computer interaction, wherever detailed pose estimation of human figures is required. For practical use, leveraging pretrained models like those provided in the documentation (e.g., YOLO11n-pose) can significantly streamline the process (Key Features).
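
For applications like these, a pretrained checkpoint can be applied directly to images. The sketch below runs inference and reads back per-person keypoints; "path/to/image.jpg" is a placeholder path.

from ultralytics import YOLO

model = YOLO("yolo11n-pose.pt")       # pretrained COCO-Pose checkpoint
results = model("path/to/image.jpg")  # placeholder image path

for r in results:
    kpts = r.keypoints  # keypoints for each detected person
    print(kpts.xy)      # tensor of shape (num_people, 17, 2), pixel coordinates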

If you use the COCO-Pose dataset in your research or development work, please cite the paper using the BibTeX entry provided in the Citations and Acknowledgments section above.
