ImageNet Dataset

ImageNet is a large-scale database of annotated images designed for use in visual object recognition research. It contains over 14 million images, with each image annotated using WordNet synsets, making it one of the most extensive resources available for training deep learning models in computer vision tasks.

ImageNet Pretrained Models

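| Model       | size (pixels) | acc top1 | acc top5 | Speed CPU ONNX (ms) | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) at 640 |
|-------------|---------------|----------|----------|---------------------|--------------------------|------------|------------------|
| YOLO11n-cls | 224           | 70.0     | 89.4     | 5.0 ± 0.3           | 1.1 ± 0.0                | 1.6        | 3.3              |
| YOLO11s-cls | 224           | 75.4     | 92.7     | 7.9 ± 0.2           | 1.3 ± 0.0                | 5.5        | 12.1             |
| YOLO11m-cls | 224           | 77.3     | 93.9     | 17.2 ± 0.4          | 2.0 ± 0.0                | 10.4       | 39.3             |
| YOLO11l-cls | 224           | 78.3     | 94.3     | 23.2 ± 0.3          | 2.8 ± 0.0                | 12.9       | 49.4             |
| YOLO11x-cls | 224           | 79.5     | 94.9     | 41.4 ± 0.9          | 3.8 ± 0.0                | 28.4       | 110.4            |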
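The acc top1 and acc top5 columns report top-k accuracy: a prediction counts as correct when the true class is among the k highest-scoring classes. The following is a minimal sketch of that metric in plain Python. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not part of the Ultralytics API or real model output.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score
        ranked = sorted(range(len(sample_scores)), key=lambda i: sample_scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 3 samples, 4 classes (made-up scores, not model output)
scores = [
    [0.1, 0.6, 0.2, 0.1],  # best class is 1, true class is 1: hit at k=1
    [0.5, 0.1, 0.3, 0.1],  # best class is 0, true class is 2: hit only from k=2
    [0.3, 0.1, 0.1, 0.5],  # best class is 3, true class is 0: hit only from k=2
]
labels = [1, 2, 0]

top1 = top_k_accuracy(scores, labels, k=1)  # 1 of 3 samples correct
top2 = top_k_accuracy(scores, labels, k=2)  # 3 of 3 samples correct
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

With 1000 ImageNet classes, top-5 accuracy is always at least as high as top-1, which is why the acc top5 column exceeds acc top1 for every model in the table.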

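Key Features

  • ImageNet contains over 14 million high-resolution images spanning thousands of object categories.
  • The dataset is organized according to the WordNet hierarchy, with each synset representing a category.
  • ImageNet is widely used for training and benchmarking in the field of computer vision, particularly for image classification and object detection tasks.
  • The annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been instrumental in advancing computer vision research.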

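Dataset Structure

The ImageNet dataset is organized using the WordNet hierarchy. Each node in the hierarchy represents a category, and each category is described by a synset (a set of synonymous terms). Images in ImageNet are annotated with one or more synsets, providing a rich resource for training models to recognize various objects and their relationships.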
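To make the hierarchy concrete, here is a small sketch of a WordNet-style tree in plain Python. The categories, the synonym lists, and the helper names `ancestors` and `is_a` are hand-written assumptions for illustration; they are not taken from the actual WordNet or ImageNet files.

```python
# Hypothetical slice of a WordNet-style hierarchy.
# Each key is a category (synset); its value is the parent category.
PARENT = {
    "chihuahua": "dog",
    "beagle": "dog",
    "dog": "mammal",
    "tabby cat": "cat",
    "cat": "mammal",
    "mammal": "animal",
    "animal": None,  # root of this toy tree
}

# A synset groups terms that name the same concept (illustrative entries only)
SYNONYMS = {
    "dog": ["dog", "domestic dog"],
    "cat": ["cat", "house cat"],
}


def ancestors(synset):
    """Return the chain of parent categories from the synset up to the root."""
    chain = []
    parent = PARENT[synset]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(synset, category):
    """True if the synset is the category itself or one of its descendants."""
    return synset == category or category in ancestors(synset)


print(ancestors("chihuahua"))
print(is_a("beagle", "mammal"))
```

Walking up the tree like this is one way to relate fine-grained labels to broader categories, for example treating every dog breed as "dog" when a coarser label set is enough.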

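ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

The annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been an important event in the field of computer vision. It has given researchers and developers a platform to evaluate their algorithms and models on a large-scale dataset with standardized evaluation metrics. ILSVRC has driven significant advances in deep learning models for image classification, object detection, and other computer vision tasks.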

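Applications

The ImageNet dataset is widely used for training and evaluating deep learning models in various computer vision tasks, such as image classification, object detection, and object localization. Some popular deep learning architectures, such as AlexNet, VGG, and ResNet, were developed and benchmarked using the ImageNet dataset.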

Usage

To train a deep learning model on the ImageNet dataset for 100 epochs with an image size of 224x224, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

Train Example

Python

from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n-cls.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="imagenet", epochs=100, imgsz=224)

CLI

# Start training from a pretrained *.pt model
yolo classify train data=imagenet model=yolo11n-cls.pt epochs=100 imgsz=224
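Before starting a long training run, it can help to sanity-check the dataset on disk. The sketch below assumes a folder-per-class layout (for example `train/<class_name>/*.JPEG`), which is an assumption about how the classification data is stored rather than something stated on this page. The function name `count_images_per_class` and the example path are hypothetical.

```python
from collections import Counter
from pathlib import Path

# File extensions treated as images (an assumption; adjust to your data)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Count image files in each class subfolder of one dataset split."""
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


# Usage (placeholder path, replace with your own split directory):
# counts = count_images_per_class("datasets/imagenet/train")
# print(len(counts), "classes,", sum(counts.values()), "images")
```

An empty class folder or an unexpected class count is usually easier to catch here than partway through 100 epochs.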

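Sample Images and Annotations

The ImageNet dataset contains high-resolution images spanning thousands of object categories, providing a diverse and extensive dataset for training and evaluating computer vision models. Here are some examples of images from the dataset:

[Image: dataset sample images]

The example showcases the variety and complexity of the images in the ImageNet dataset, highlighting the importance of a diverse dataset for training robust computer vision models.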

Citations and Acknowledgments

If you use the ImageNet dataset in your research or development work, please cite the following paper:

@article{ILSVRC15,
         author = {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
         title={ImageNet Large Scale Visual Recognition Challenge},
         year={2015},
         journal={International Journal of Computer Vision (IJCV)},
         volume={115},
         number={3},
         pages={211-252}
}

We would like to acknowledge the ImageNet team, led by Olga Russakovsky, Jia Deng, and Li Fei-Fei, for creating and maintaining the ImageNet dataset as a valuable resource for the machine learning and computer vision research community. For more information about the ImageNet dataset and its creators, visit the ImageNet website.

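FAQ

What is the ImageNet dataset and how is it used in computer vision?

The ImageNet dataset is a large-scale database of over 14 million high-resolution images categorized using WordNet synsets. It is widely used in visual object recognition research, including image classification and object detection. The dataset's annotations and sheer volume provide a rich resource for training deep learning models. Notably, models such as AlexNet, VGG, and ResNet were trained and benchmarked on ImageNet, which illustrates its role in advancing computer vision.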

How can I use a pretrained YOLO model for image classification on the ImageNet dataset?

To use a pretrained Ultralytics YOLO model for image classification on the ImageNet dataset, follow these steps:

Train Example

Python

from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n-cls.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="imagenet", epochs=100, imgsz=224)

CLI

# Start training from a pretrained *.pt model
yolo classify train data=imagenet model=yolo11n-cls.pt epochs=100 imgsz=224

For more in-depth training guidance, refer to our Training page.

Why should I use the Ultralytics YOLO11 pretrained models for my ImageNet dataset projects?

Ultralytics YOLO11 pretrained models offer state-of-the-art performance in terms of speed and accuracy for various computer vision tasks. For example, the YOLO11n-cls model, with a top-1 accuracy of 70.0% and a top-5 accuracy of 89.4%, is optimized for real-time applications. Starting from pretrained weights requires far less compute than training from scratch and shortens development cycles. Learn more about the performance metrics of YOLO11 models in the ImageNet Pretrained Models section.
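As a rough illustration of using a pretrained classifier without any training, the sketch below runs one image through `yolo11n-cls.pt` and returns its five most likely classes. The helper names `format_top5` and `classify_image` and the image path are hypothetical. The `ultralytics` import sits inside the function so that the formatting helper can be used on its own.

```python
def format_top5(names, top5_indices, top5_confidences):
    """Pair each predicted class name with its confidence, rounded to 3 decimals."""
    return [(names[i], round(float(c), 3)) for i, c in zip(top5_indices, top5_confidences)]


def classify_image(image_path, weights="yolo11n-cls.pt"):
    """Run a pretrained classification model on one image and return its top-5 classes."""
    from ultralytics import YOLO  # imported here so format_top5 has no dependencies

    model = YOLO(weights)  # load pretrained weights
    result = model(image_path)[0]  # one result object per input image
    probs = result.probs  # class probabilities for this image
    return format_top5(result.names, probs.top5, probs.top5conf.tolist())


# Usage (placeholder image path):
# for name, confidence in classify_image("path/to/image.jpg"):
#     print(f"{name}: {confidence}")
```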

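How is the ImageNet dataset structured?

The ImageNet dataset is organized using the WordNet hierarchy, where each node in the hierarchy represents a category described by a synset (a set of synonymous terms). This structure allows for detailed annotations, making it well suited for training models to recognize a wide variety of objects. The diversity and rich annotations of ImageNet make it a valuable dataset for developing robust and generalizable deep learning models. For more on how the dataset is organized, see the Dataset Structure section.

What role does the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) play in computer vision?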

The annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been pivotal in driving advancements in computer vision by providing a competitive platform for evaluating algorithms on a large-scale, standardized dataset. It offers standardized evaluation metrics, fostering innovation and development in areas such as image classification, object detection, and image segmentation. The challenge has continuously pushed the boundaries of what is possible with deep learning and computer vision technologies.


📅 Created 11 months ago ✏️ Updated 7 days ago
