Train Custom Data
📚 This guide explains how to train your own custom dataset with YOLOv5 🚀.
Before You Start
Train On Custom Data
Creating a custom model to detect your objects is an iterative process of collecting and organizing images, labeling your objects of interest, training a model, deploying it into the wild to make predictions, and then using that deployed model to collect examples of edge cases to repeat and improve.
Ultralytics offers two licensing options:
- The AGPL-3.0 License, an OSI-approved open-source license ideal for students and enthusiasts.
- The Enterprise License for businesses seeking to incorporate our AI models into their products and services.
For more details see Ultralytics Licensing.
YOLOv5 models must be trained on labelled data in order to learn classes of objects in that data. There are two options for creating your dataset before you start training:
Option 1: Create a Roboflow Dataset
1.1 Collect Images
Your model will learn by example. Training on images similar to the ones it will see in the wild is of the utmost importance. Ideally, you will collect a wide variety of images from the same configuration (camera, angle, lighting, etc.) in which you will ultimately deploy your project.
1.2 Create Labels
Once you have collected images, you will need to annotate the objects of interest to create a ground truth for your model to learn from.
1.3 Prepare Dataset for YOLOv5
Whether you label your images with Roboflow or not, you can use it to convert your dataset into YOLO format, create a YOLOv5 YAML configuration file, and host it for importing into your training script.
Create a free Roboflow account and upload your dataset to a Public workspace, label any unannotated images, then generate and export a version of your dataset in YOLOv5 PyTorch format.
Note: YOLOv5 does online augmentation during training, so we do not recommend applying any augmentation steps in Roboflow for training with YOLOv5. But we recommend applying the following preprocessing steps:
- Auto-Orient - to strip EXIF orientation from your images.
- Resize (Stretch) - to the square input size of your model (640x640 is the YOLOv5 default).
Generating a version will give you a snapshot of your dataset, so you can always go back and compare your future model training runs against it, even if you add more images or change its configuration later.
Export in YOLOv5 PyTorch format, then copy the snippet into your training script or notebook to download your dataset.
Option 2: Create a Manual Dataset
COCO128 is a small example tutorial dataset composed of the first 128 images in COCO train2017. These same 128 images are used for both training and validation to verify that our training pipeline is capable of overfitting. data/coco128.yaml, shown below, is the dataset config file that defines 1) the dataset root directory path and relative paths to the train / val / test image directories (or *.txt files with image paths) and 2) a class names dictionary:
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128 # dataset root dir
train: images/train2017 # train images (relative to 'path') 128 images
val: images/train2017 # val images (relative to 'path') 128 images
test: # test images (optional)
# Classes (80 COCO classes)
names:
  # ...
  77: teddy bear
  78: hair drier
  # ...
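To make the "relative to 'path'" convention concrete, here is a minimal sketch of how the split entries in a dataset config could be resolved against the dataset root. The function name `resolve_split` and the plain dictionary standing in for the parsed YAML are illustrative assumptions, not part of YOLOv5 itself.

```python
from pathlib import PurePosixPath


def resolve_split(root, entry):
    """Join a split entry (a string or a list of strings) onto the dataset root.

    Returns a list of resolved paths, or None when the split is unset,
    as with the optional 'test' key.
    """
    if entry is None:
        return None
    entries = entry if isinstance(entry, list) else [entry]
    return [str(PurePosixPath(root) / e) for e in entries]


# Hypothetical dictionary mirroring the keys of data/coco128.yaml
config = {
    "path": "../datasets/coco128",
    "train": "images/train2017",
    "val": "images/train2017",
    "test": None,
}

for split in ("train", "val", "test"):
    print(split, resolve_split(config["path"], config[split]))
```

Here train and val resolve to the same directory, which matches the note above that the same 128 images are used for both.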
2.2 Create Labels
After using an annotation tool to label your images, export your labels to YOLO format, with one *.txt file per image (if no objects in image, no *.txt file is required). The *.txt file specifications are:
- One row per object
- Each row is in class x_center y_center width height format.
- Box coordinates must be in normalized xywh format (from 0 to 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
- Class numbers are zero-indexed (start from 0).
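The normalization rule above can be sketched in a few lines of Python. This is an illustrative helper, not a YOLOv5 function: the name `pixel_box_to_yolo` and the assumption that the input box is given as pixel corner coordinates (x_min, y_min, x_max, y_max) are ours.

```python
def pixel_box_to_yolo(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to one normalized YOLO label row."""
    # Centre and size in pixels, then divided by the image dimensions
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x200 pixel box in a 640x480 image, labeled as class 0
print(pixel_box_to_yolo(0, 100, 200, 300, 400, img_w=640, img_h=480))
# -> 0 0.312500 0.625000 0.312500 0.416667
```

Writing one such row per object into a *.txt file named after the image yields a label file that follows the specification above.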
The label file corresponding to the above image contains 2 persons (class 0) and a tie (class 27).
2.3 Organize Directories
Organize your train and val images and labels as follows. YOLOv5 assumes /coco128 is inside a /datasets directory next to the /yolov5 directory. YOLOv5 locates labels automatically for each image by replacing the last instance of /images/ in each image path with /labels/. For example, an image under ../datasets/coco128/images/train2017/ is paired with a *.txt label file of the same name under ../datasets/coco128/labels/train2017/.
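The path substitution described above can be sketched as follows. This is a simplified illustration rather than YOLOv5's own implementation: the function name `image_to_label_path` is hypothetical, the file name is made up, and forward-slash separators are assumed.

```python
def image_to_label_path(image_path):
    """Swap the last '/images/' for '/labels/' and the file extension for '.txt'."""
    head, sep, tail = image_path.rpartition("/images/")
    if not sep:
        raise ValueError(f"no '/images/' directory in {image_path!r}")
    stem = tail.rsplit(".", 1)[0]  # drop the image extension
    return f"{head}/labels/{stem}.txt"


# Hypothetical image file inside the COCO128 layout
print(image_to_label_path("../datasets/coco128/images/train2017/example.jpg"))
# -> ../datasets/coco128/labels/train2017/example.txt
```

Because only the last /images/ is replaced, a parent directory that happens to be called images is left untouched.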
3. Select a Model
Train a YOLOv5s model on COCO128 by specifying dataset, batch-size, image size and either pretrained --weights yolov5s.pt (recommended), or randomly initialized --weights '' --cfg yolov5s.yaml (not recommended). Pretrained weights are auto-downloaded from the latest YOLOv5 release.
Add --cache ram or --cache disk to speed up training (requires significant RAM/disk resources).
💡 Always train from a local dataset. Mounted or network drives like Google Drive will be very slow.
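As a rough sketch of how the options above combine, the helper below assembles a train.py command line without running it. The helper name `build_train_command` is hypothetical, and the default batch size of 16 and 3 epochs are illustrative assumptions rather than recommendations from this guide; the flags themselves are train.py options.

```python
import shlex


def build_train_command(data="coco128.yaml", weights="yolov5s.pt", img=640,
                        batch_size=16, epochs=3, cfg=None, cache=None):
    """Assemble an argument list for YOLOv5's train.py (nothing is executed)."""
    cmd = [
        "python", "train.py",
        "--data", data,
        "--weights", weights,
        "--img", str(img),
        "--batch-size", str(batch_size),
        "--epochs", str(epochs),
    ]
    if cfg:  # only needed when training from randomly initialized weights
        cmd += ["--cfg", cfg]
    if cache:  # "ram" or "disk"
        cmd += ["--cache", cache]
    return cmd


# Pretrained weights (recommended)
print(shlex.join(build_train_command()))

# Randomly initialized weights with RAM caching (not recommended)
print(shlex.join(build_train_command(weights="", cfg="yolov5s.yaml", cache="ram")))
```

The resulting list can be passed to subprocess.run from inside the yolov5 directory, or the printed string can be pasted into a shell.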
Comet Logging and Visualization 🌟 NEW
Comet is now fully integrated with YOLOv5. Track and visualize model metrics in real time, save your hyperparameters, datasets, and model checkpoints, and visualize your model predictions with Comet Custom Panels! Comet makes sure you never lose track of your work and makes it easy to share results and collaborate across teams of all sizes!
Getting started is easy:
To learn more about all the supported Comet features for this integration, check out the Comet Tutorial. If you'd like to learn more about Comet, head over to our documentation. Get started by trying out the Comet Colab Notebook:
ClearML Logging and Automation 🌟 NEW
ClearML is completely integrated into YOLOv5 to track your experimentation, manage dataset versions and even remotely execute training runs. To enable ClearML:
- pip install clearml
- Run clearml-init to connect to a ClearML server
You'll get all the features expected from an experiment manager: live updates, model upload, experiment comparison, and so on. ClearML also tracks uncommitted changes and installed packages, which makes ClearML Tasks (which is what we call experiments) reproducible on different machines. With only 1 extra line, we can schedule a YOLOv5 training task on a queue to be executed by any number of ClearML Agents (workers).
You can use ClearML Data to version your dataset and then pass it to YOLOv5 simply using its unique ID. This will help you keep track of your data without adding extra hassle. Explore the ClearML Tutorial for details!
Training results are saved to a run directory (runs/train/exp by default in YOLOv5). This directory contains train and val statistics, mosaics, labels, predictions and augmented mosaics, as well as metrics and charts including precision-recall (PR) curves and confusion matrices.
results.csv is updated after each epoch, and then plotted as results.png (below) after training completes. You can also plot any results.csv file manually:
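As a library-independent starting point, the sketch below reads a results.csv file into per-column lists that can be passed to any plotting tool. It is not YOLOv5's own plotting utility; the function name `load_results` is hypothetical, and it assumes a header row followed by one numeric row per epoch, with column names that may be padded with spaces.

```python
import csv


def load_results(csv_path):
    """Read a results.csv file into {column_name: [float, ...]}."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        # Header names may carry padding spaces, so strip them
        header = [name.strip() for name in next(reader)]
        columns = {name: [] for name in header}
        for row in reader:
            if not row:  # skip blank lines
                continue
            for name, value in zip(header, row):
                columns[name].append(float(value))
    return columns


# Usage (path is a placeholder):
# results = load_results("path/to/results.csv")
# print(sorted(results))  # list the available metric columns
```

Each value in the returned dictionary is a per-epoch series, so plotting a metric against the epoch column is a one-liner in most plotting libraries.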
Once your model is trained, you can use your best checkpoint to:
- Run CLI or Python inference on new images and videos
- Validate accuracy on train, val and test splits
- Export to TensorFlow, Keras, ONNX, TFLite, TF.js, CoreML and TensorRT formats
- Evolve hyperparameters to improve performance
- Improve your model by sampling real-world images and adding them to your dataset
- Free GPU Notebooks:
- Google Cloud: GCP Quickstart Guide
- Amazon: AWS Quickstart Guide
- Azure: AzureML Quickstart Guide
- Docker: Docker Quickstart Guide
This badge indicates that all YOLOv5 GitHub Actions Continuous Integration (CI) tests are successfully passing. These CI tests rigorously check the functionality and performance of YOLOv5 across various key aspects: training, validation, inference, export, and benchmarks. They ensure consistent and reliable operation on macOS, Windows, and Ubuntu, with tests conducted every 24 hours and upon each new commit.
Created 2023-11-12, Updated 2024-01-21
Authors: glenn-jocher (11)