Skip to content

ImageNet Dataset

ImageNet is a large-scale database of annotated images designed for use in visual object recognition research. It contains over 14 million images, with each image annotated using WordNet synsets, making it one of the most extensive resources available for training deep learning models in computer vision tasks.

ImageNet Pretrained Models

Model size
(pixels)
acc
top1
acc
top5
Speed
CPU ONNX
(ms)
Speed
A100 TensorRT
(ms)
params
(M)
FLOPs
(B) at 640
YOLOv8n-cls 224 69.0 88.3 12.9 0.31 2.7 4.3
YOLOv8s-cls 224 73.8 91.7 23.4 0.35 6.4 13.5
YOLOv8m-cls 224 76.8 93.5 85.4 0.62 17.0 42.7
YOLOv8l-cls 224 76.8 93.5 163.0 0.87 37.5 99.7
YOLOv8x-cls 224 79.0 94.6 232.0 1.01 57.4 154.8

Key Features

  • ImageNet contains over 14 million high-resolution images spanning thousands of object categories.
  • The dataset is organized according to the WordNet hierarchy, with each synset representing a category.
  • ImageNet is widely used for training and benchmarking in the field of computer vision, particularly for image classification and object detection tasks.
  • The annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been instrumental in advancing computer vision research.

Dataset Structure

The ImageNet dataset is organized using the WordNet hierarchy. Each node in the hierarchy represents a category, and each category is described by a synset (a collection of synonymous terms). The images in ImageNet are annotated with one or more synsets, providing a rich resource for training models to recognize various objects and their relationships.

ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

The annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been an important event in the field of computer vision. It has provided a platform for researchers and developers to evaluate their algorithms and models on a large-scale dataset with standardized evaluation metrics. The ILSVRC has led to significant advancements in the development of deep learning models for image classification, object detection, and other computer vision tasks.

Applications

The ImageNet dataset is widely used for training and evaluating deep learning models in various computer vision tasks, such as image classification, object detection, and object localization. Some popular deep learning architectures, such as AlexNet, VGG, and ResNet, were developed and benchmarked using the ImageNet dataset.

Usage

To train a deep learning model on the ImageNet dataset for 100 epochs with an image size of 224x224, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

Train Example

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n-cls.pt')  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data='imagenet', epochs=100, imgsz=224)
# Start training from a pretrained *.pt model
yolo train data=imagenet model=yolov8n-cls.pt epochs=100 imgsz=224

Sample Images and Annotations

The ImageNet dataset contains high-resolution images spanning thousands of object categories, providing a diverse and extensive dataset for training and evaluating computer vision models. Here are some examples of images from the dataset:

Dataset sample images

The example showcases the variety and complexity of the images in the ImageNet dataset, highlighting the importance of a diverse dataset for training robust computer vision models.

Citations and Acknowledgments

If you use the ImageNet dataset in your research or development work, please cite the following paper:

@article{ILSVRC15,
         author = {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
         title={ImageNet Large Scale Visual Recognition Challenge},
         year={2015},
         journal={International Journal of Computer Vision (IJCV)},
         volume={115},
         number={3},
         pages={211-252}
}

We would like to acknowledge the ImageNet team, led by Olga Russakovsky, Jia Deng, and Li Fei-Fei, for creating and maintaining the ImageNet dataset as a valuable resource for the machine learning and computer vision research community. For more information about the ImageNet dataset and its creators, visit the ImageNet website.



Created 2023-11-12, Updated 2024-04-17
Authors: glenn-jocher (5), RizwanMunawar (1)

Comments