Caltech-256 Dataset

The Caltech-256 dataset is a classic image classification benchmark of 30,607 images spanning 256 object categories plus one background class. Each category holds at least 80 images of real-world objects — animals, vehicles, household items, and people — making it a larger, more challenging successor to Caltech-101 for object recognition models.

Watch: How to Train Image Classification Model using Caltech-256 Dataset with Ultralytics YOLO26

Automatic Data Splitting

Caltech-256 ships without a predefined train/validation split. The training commands below automatically split it 80% train / 20% validation, so no manual preparation is needed.

Key Features

Caltech-256 contains 30,607 color images across 256 object categories plus one 257.clutter background class (257 class folders in total).
The categories span a wide variety of real-world objects, including animals, vehicles, household items, and people.
Each category holds at least 80 images, with the largest holding up to about 800, so class sizes are imbalanced.
Images are of variable sizes and resolutions.
Caltech-256 is widely used to benchmark image classification and object recognition algorithms.
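
Because class sizes are imbalanced, it can help to inspect them before training. The following is a minimal sketch, not part of the Ultralytics API: it assumes the dataset has been extracted so that each class is a folder directly under one root directory, and the path `datasets/caltech256`, the helper names, and the list of image extensions are all hypothetical choices made for illustration.

```python
from collections import Counter
from pathlib import Path

# File extensions treated as images (an assumption for this sketch).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(root: Path) -> Counter:
    """Map each class folder name under `root` to its number of image files."""
    counts = Counter()
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def summarize(counts: Counter) -> dict:
    """Return the class count, total images, and the largest and smallest classes."""
    if not counts:
        raise ValueError("no class folders found")
    ordered = counts.most_common()  # largest class first
    return {
        "classes": len(counts),
        "images": sum(counts.values()),
        "largest": ordered[0],
        "smallest": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical location of the extracted dataset; adjust to your setup.
    dataset_root = Path("datasets/caltech256")
    if dataset_root.is_dir():
        print(summarize(count_images_per_class(dataset_root)))
```

If the folder has already been split into train/ and val/ subfolders, point `dataset_root` at one of those subfolders instead, since the sketch only looks one level deep.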

Dataset Structure

Caltech-256 is distributed as 257 folders — one per class, covering 256 object categories plus a 257.clutter background class — with no predefined train/validation split. When you launch training, Ultralytics automatically partitions the images so models train across all 257 classes without any manual setup:

Classes: 257 (256 object categories + 1 background)
Total images: 30,607
Train/validation split: automatic 80% / 20% (≈24,385 train, ≈6,222 validation)
Images per class: at least 80 (imbalanced, up to about 800)
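
The approximate counts above sit slightly below an exact 80% of 30,607 (which would be 24,485.6 images). One explanation, assumed here rather than confirmed by this page, is that each class is split independently and the training count is truncated to a whole number per class. The sketch below illustrates that arithmetic; the function name and the class sizes are hypothetical and are not the real Caltech-256 counts.

```python
def split_counts(images_per_class: dict[str, int], train_ratio: float = 0.8) -> tuple[int, int]:
    """Return (train, val) totals when each class is split independently."""
    train = val = 0
    for n in images_per_class.values():
        n_train = int(n * train_ratio)  # truncates, so a class never gets more than its share
        train += n_train
        val += n - n_train  # the remainder of each class goes to validation
    return train, val


# Hypothetical class sizes for illustration only, not the real Caltech-256 counts.
example = {"class_a": 80, "class_b": 83, "class_c": 127, "class_d": 801}
train, val = split_counts(example)
print(f"train={train} val={val} train_share={train / (train + val):.4f}")
```

Under this assumption every class with a size that is not a multiple of five gives up a fraction of an image to validation, so across 257 classes the training total ends up a little under 80%.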

Applications

The Caltech-256 dataset is widely used to train and evaluate image classification and object recognition models, including Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs). Its large category count and high-quality images make it a popular benchmark for machine learning and computer vision research and prototyping.

Usage

Train a YOLO model on Caltech-256 for 100 epochs at an image size of 416. For the full list of available arguments, see the Training page and the image classification task guide.

Train Example

from ultralytics import YOLO

# Load a model
model = YOLO("yolo26n-cls.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="caltech256", epochs=100, imgsz=416)

Sample Images and Annotations

The Caltech-256 dataset contains high-quality color images of various objects, providing a well-structured dataset for image classification tasks. Here are some examples of images from the dataset (credit):

Caltech-256 image classification dataset samples

The samples show the diversity and complexity of the objects in the Caltech-256 dataset, underlining the value of a varied dataset for training robust object recognition models.

Citations and Acknowledgments

If you use the Caltech-256 dataset in your research or development work, please cite the following paper:

Quote

@article{griffin2007caltech,
         title={Caltech-256 object category dataset},
         author={Griffin, Gregory and Holub, Alex and Perona, Pietro},
         year={2007}
}

We would like to acknowledge Gregory Griffin, Alex Holub, and Pietro Perona for creating and maintaining the Caltech-256 dataset as a valuable resource for the machine learning and computer vision research community. For more information about the Caltech-256 dataset and its creators, visit the Caltech-256 dataset website.

FAQ

What is the Caltech-256 dataset used for in machine learning?

The Caltech-256 dataset is widely used to train and benchmark image classification and object recognition models. It contains 30,607 images across 256 object categories plus a background class, providing a larger and more challenging benchmark than Caltech-101 for algorithms such as Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs).

How can I train an Ultralytics YOLO model on the Caltech-256 dataset?

To train an Ultralytics YOLO model on Caltech-256, use the code snippet below. The dataset downloads automatically on first use. For the full list of arguments, see the model Training page.

Train Example

from ultralytics import YOLO

# Load a model
model = YOLO("yolo26n-cls.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="caltech256", epochs=100, imgsz=416)

How many classes does the Caltech-256 dataset have?

Caltech-256 contains 256 object categories plus one 257.clutter background class, for 257 class folders and 30,607 images in total. When you train with Ultralytics, the model learns all 257 classes. Each category holds at least 80 images, but class sizes are imbalanced, with the largest holding up to about 800 images.

How is the Caltech-256 dataset split into training and validation sets?

Caltech-256 has no predefined split. The first time you train, Ultralytics automatically divides it 80% training / 20% validation — about 24,385 training and 6,222 validation images — so you do not need to create splits manually. To control the split yourself, organize the images into train/ and val/ folders before training.
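
As a rough sketch of organizing the images yourself, the script below copies a folder of class directories into train/ and val/ subfolders using a chosen ratio and a fixed random seed. It uses only the Python standard library; the function name `make_split`, the paths, and the 90/10 ratio in the usage block are hypothetical, and it assumes the source folder contains one subfolder per class.

```python
import random
import shutil
from pathlib import Path


def make_split(source: Path, dest: Path, train_ratio: float = 0.8, seed: int = 0) -> dict[str, int]:
    """Copy each class folder in `source` into dest/train and dest/val."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    totals = {"train": 0, "val": 0}
    for class_dir in sorted(p for p in source.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        cut = int(len(files) * train_ratio)  # number of training files for this class
        for split, subset in (("train", files[:cut]), ("val", files[cut:])):
            target = dest / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in subset:
                shutil.copy2(f, target / f.name)  # copy, so the source stays untouched
            totals[split] += len(subset)
    return totals


if __name__ == "__main__":
    # Hypothetical paths; point them at your own copy of the dataset.
    src = Path("datasets/caltech256_raw")
    if src.is_dir():
        print(make_split(src, Path("datasets/caltech256_custom"), train_ratio=0.9))
```

The resulting folder can then be passed as the `data` argument in place of `"caltech256"`, for example `data="datasets/caltech256_custom"`. Splitting per class keeps every class represented in both subsets, which matters here because the smallest classes hold only about 80 images.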

Can I use Ultralytics Platform for training models on the Caltech-256 dataset?

Yes. Ultralytics Platform lets you manage datasets, train image classification models, and deploy them without extensive coding. It is a convenient way to run Caltech-256 experiments in the cloud, and you can explore more options in our classification datasets overview.

Contributors

glenn-jocher, RizwanMunawar, raimbekovm, MatthewNoyce, jk4e

Created Nov 12, 2023. Updated 5 days ago.