Reference for ultralytics/data/split.py
Note
This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/data/split.py. If you spot a problem please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!
ultralytics.data.split.split_classify_dataset
split_classify_dataset(source_dir, train_ratio=0.8)
Split dataset into train and val directories in a new directory.
Creates a new directory '{source_dir}_split' with train/val subdirectories, preserving the original class structure with an 80/20 split by default.
Directory structure
Before:
    caltech/
    ├── class1/
    │   ├── img1.jpg
    │   ├── img2.jpg
    │   └── ...
    ├── class2/
    │   ├── img1.jpg
    │   └── ...
    └── ...

After:
    caltech_split/
    ├── train/
    │   ├── class1/
    │   │   ├── img1.jpg
    │   │   └── ...
    │   ├── class2/
    │   │   ├── img1.jpg
    │   │   └── ...
    │   └── ...
    └── val/
        ├── class1/
        │   ├── img2.jpg
        │   └── ...
        ├── class2/
        │   └── ...
        └── ...
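To sanity-check a directory laid out like the "After" tree, one option is to count the images under each split and class. The helper below is a minimal sketch and not part of ultralytics: the name `count_split_images` and the `IMAGE_SUFFIXES` set are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_split_images(split_dir):
    """Count images per (split, class) pair under a '{source_dir}_split' style directory."""
    counts = Counter()
    for split in ("train", "val"):
        split_path = Path(split_dir) / split
        if not split_path.is_dir():
            continue  # tolerate a missing split directory
        for class_dir in sorted(p for p in split_path.iterdir() if p.is_dir()):
            # Count only files whose suffix looks like an image
            n = sum(1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES)
            counts[(split, class_dir.name)] = n
    return counts
```

Calling `count_split_images("path/to/caltech_split")` returns a `Counter` keyed by `(split, class_name)`, which makes it easy to compare the per-class train and val counts against the ratio you requested.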
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_dir | str \| Path | Path to Caltech dataset root directory. | required |
train_ratio | float | Ratio for train split, between 0 and 1. | 0.8 |
Examples:
>>> from ultralytics.data.split import split_classify_dataset
>>> # Split dataset with default 80/20 ratio
>>> split_classify_dataset("path/to/caltech")
>>> # Split with custom ratio
>>> split_classify_dataset("path/to/caltech", 0.75)
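For readers who want to see the idea behind a per-class train/val split, the following is a minimal standard-library sketch. It is not the code in ultralytics/data/split.py: the function name `split_by_class`, the `seed` parameter, the choice to copy rather than move files, and the rounding rule `int(len(images) * train_ratio)` are all assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path


def split_by_class(source_dir, train_ratio=0.8, seed=0):
    """Hypothetical per-class train/val split; not the library's implementation."""
    source = Path(source_dir)
    target = source.parent / f"{source.name}_split"  # mirrors the '{source_dir}_split' naming
    rng = random.Random(seed)  # local RNG so the split is reproducible
    for class_dir in sorted(p for p in source.iterdir() if p.is_dir()):
        images = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(images)
        n_train = int(len(images) * train_ratio)  # assumed rounding: truncate
        for split, files in (("train", images[:n_train]), ("val", images[n_train:])):
            out_dir = target / split / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, out_dir / f.name)  # copy, leaving the source untouched
    return target
```

Splitting inside each class directory, rather than across the whole dataset at once, keeps every class represented in both train and val in roughly the requested proportion.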
ultralytics.data.split.autosplit
autosplit(
path=DATASETS_DIR / "coco8/images",
weights=(0.9, 0.1, 0.0),
annotated_only=False,
)
Automatically split a dataset into train/val/test splits and save the resulting splits into autosplit_*.txt files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | Path to images directory. | DATASETS_DIR / 'coco8/images' |
weights | list \| tuple | Train, validation, and test split fractions. | (0.9, 0.1, 0.0) |
annotated_only | bool | If True, only images with an associated txt file are used. | False |
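The `annotated_only` option can be pictured as a filter that keeps an image only when a matching txt label file exists. The sketch below assumes the common YOLO layout in which `.../images/x.jpg` pairs with `.../labels/x.txt`; the helper names `label_path_for` and `filter_annotated` are hypothetical, and the library may locate label files differently.

```python
from pathlib import Path


def label_path_for(image_path):
    """Map .../images/x.jpg to .../labels/x.txt (assumed directory layout)."""
    parts = list(Path(image_path).parts)
    # Replace the last 'images' directory component with 'labels'
    for i in range(len(parts) - 1, -1, -1):
        if parts[i] == "images":
            parts[i] = "labels"
            break
    return Path(*parts).with_suffix(".txt")


def filter_annotated(image_paths):
    """Keep only images whose assumed label file exists on disk."""
    return [p for p in image_paths if label_path_for(p).exists()]
```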
Examples:
>>> from ultralytics.data.split import autosplit
>>> autosplit()
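The weighted assignment behind a train/val/test split can be sketched with the standard library alone. This is an illustrative outline under stated assumptions, not the library's code: the function names `assign_splits` and `write_split_files`, the `seed` parameter, and the expansion of `autosplit_*.txt` into `autosplit_train.txt`, `autosplit_val.txt`, and `autosplit_test.txt` are assumptions.

```python
import random
from pathlib import Path


def assign_splits(image_paths, weights=(0.9, 0.1, 0.0), seed=0):
    """Randomly assign each image to train, val, or test according to weights."""
    rng = random.Random(seed)  # local RNG for reproducible assignments
    names = ("train", "val", "test")
    splits = {name: [] for name in names}
    for path in image_paths:
        # Draw one split name per image with probability proportional to its weight
        name = rng.choices(names, weights=weights, k=1)[0]
        splits[name].append(str(path))
    return splits


def write_split_files(splits, out_dir):
    """Write one 'autosplit_<name>.txt' file per split, one image path per line."""
    out_dir = Path(out_dir)
    written = []
    for name, paths in splits.items():
        txt = out_dir / f"autosplit_{name}.txt"
        txt.write_text("".join(f"{p}\n" for p in paths))
        written.append(txt)
    return written
```

Because each image is drawn independently, the resulting split sizes only approximate the requested fractions, and a split with weight 0.0 receives no images.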