Reference for ultralytics/data/utils.py
Note
This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/data/utils.py. If you spot a problem, please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!
ultralytics.data.utils.HUBDatasetStats
A class for generating HUB dataset JSON and the -hub dataset directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to data.yaml or data.zip (with data.yaml inside data.zip). Default is 'coco8.yaml'. | 'coco8.yaml' |
task | str | Dataset task. Options are 'detect', 'segment', 'pose', 'obb', 'classify'. Default is 'detect'. | 'detect' |
autodownload | bool | Attempt to download dataset if not found locally. Default is False. | False |
Example
Download *.zip files from https://github.com/ultralytics/hub/tree/main/example_datasets, e.g. https://github.com/ultralytics/hub/raw/main/example_datasets/coco8.zip for coco8.zip.
from ultralytics.data.utils import HUBDatasetStats
stats = HUBDatasetStats("path/to/coco8.zip", task="detect") # detect dataset
stats = HUBDatasetStats("path/to/coco8-seg.zip", task="segment") # segment dataset
stats = HUBDatasetStats("path/to/coco8-pose.zip", task="pose") # pose dataset
stats = HUBDatasetStats("path/to/dota8.zip", task="obb") # OBB dataset
stats = HUBDatasetStats("path/to/imagenet10.zip", task="classify") # classification dataset
stats.get_json(save=True)
stats.process_images()
Source code in ultralytics/data/utils.py
get_json
Return dataset JSON for Ultralytics HUB.
Source code in ultralytics/data/utils.py
process_images
Compress images for Ultralytics HUB.
Source code in ultralytics/data/utils.py
ultralytics.data.utils.img2label_paths
Define label paths as a function of image paths.
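As a minimal sketch of this mapping, assuming the common layout in which an images directory sits beside a labels directory and each label is a .txt file sharing the image's stem. The helper name img2label_paths_sketch is hypothetical and is not the library function itself.

```python
import os


def img2label_paths_sketch(img_paths):
    """Map image paths to label paths (hypothetical helper, assumed layout)."""
    # Assumption: the last '/images/' path component is swapped for '/labels/'.
    sa = f"{os.sep}images{os.sep}"
    sb = f"{os.sep}labels{os.sep}"
    # Replace the last occurrence only, then swap the file extension for '.txt'.
    return [sb.join(p.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt" for p in img_paths]


example = os.path.join("datasets", "coco8", "images", "train", "000000000009.jpg")
label_paths = img2label_paths_sketch([example])
```

Only the last images component is replaced, so a parent directory that happens to be named images is left untouched.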
Source code in ultralytics/data/utils.py
ultralytics.data.utils.get_hash
Returns a single hash value of a list of paths (files or dirs).
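One way such a hash could be computed is sketched below. It assumes the digest combines the total size of the paths that exist with the path strings themselves; the helper name get_hash_sketch and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
import os


def get_hash_sketch(paths):
    """Return one hex digest for a list of paths (hypothetical helper)."""
    # Assumption: the digest mixes the total size of existing paths with the path strings.
    # Directories contribute whatever os.path.getsize reports for them.
    size = sum(os.path.getsize(p) for p in paths if os.path.exists(p))
    h = hashlib.sha256(str(size).encode())
    h.update("".join(paths).encode())
    return h.hexdigest()


digest_a = get_hash_sketch(["images/a.jpg", "images/b.jpg"])
digest_b = get_hash_sketch(["images/a.jpg", "images/c.jpg"])
```

A digest built this way changes when a file is added, removed, renamed, or changes size, which is what makes it useful for detecting a stale dataset cache.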
Source code in ultralytics/data/utils.py
ultralytics.data.utils.exif_size
Returns exif-corrected PIL size.
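The correction can be illustrated without an image library. The sketch below assumes the size is swapped when the EXIF Orientation tag (ID 274) holds 6 or 8, the values that denote a 90-degree rotation. The helper exif_corrected_size is hypothetical, and a plain dict stands in for the EXIF mapping that PIL would return.

```python
ORIENTATION_TAG = 274  # standard EXIF tag ID for image orientation


def exif_corrected_size(size, exif):
    """Return (width, height) after accounting for EXIF rotation (hypothetical helper)."""
    width, height = size
    # Orientation values 6 and 8 describe 90-degree rotations, so the sides swap.
    if exif.get(ORIENTATION_TAG) in {6, 8}:
        return height, width
    return width, height


rotated = exif_corrected_size((4000, 3000), {ORIENTATION_TAG: 6})
upright = exif_corrected_size((4000, 3000), {ORIENTATION_TAG: 1})
```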
Source code in ultralytics/data/utils.py
ultralytics.data.utils.verify_image
Verify one image.
Source code in ultralytics/data/utils.py
ultralytics.data.utils.verify_image_label
Verify one image-label pair.
Source code in ultralytics/data/utils.py
ultralytics.data.utils.polygon2mask
Convert a list of polygons to a binary mask of the specified image size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
imgsz | tuple | The size of the image as (height, width). | required |
polygons | list[ndarray] | A list of polygons. Each polygon is an array with shape [N, M], where N is the number of polygons and M is the number of coordinate values (x, y pairs), so M % 2 = 0. | required |
color | int | The color value to fill in the polygons on the mask. Defaults to 1. | 1 |
downsample_ratio | int | Factor by which to downsample the mask. Defaults to 1. | 1 |
Returns:
Type | Description |
---|---|
ndarray | A binary mask of the specified image size with the polygons filled in. |
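To make the input and output conventions concrete, the pure-Python sketch below groups a flat coordinate sequence into points and computes the mask shape after downsampling. It assumes the flat layout is x1, y1, x2, y2, ... and that downsampling uses integer division of each side; both helpers are hypothetical and no polygon filling is performed here.

```python
def flat_to_points(polygon):
    """Group a flat [x1, y1, x2, y2, ...] sequence into (x, y) pairs."""
    if len(polygon) % 2 != 0:
        raise ValueError("polygon needs an even number of coordinate values")
    return [(polygon[i], polygon[i + 1]) for i in range(0, len(polygon), 2)]


def downsampled_shape(imgsz, downsample_ratio=1):
    """Return the (height, width) of a mask after integer downsampling."""
    height, width = imgsz
    return height // downsample_ratio, width // downsample_ratio


points = flat_to_points([10, 10, 100, 10, 100, 80, 10, 80])
mask_shape = downsampled_shape((640, 640), downsample_ratio=4)
```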
Source code in ultralytics/data/utils.py
ultralytics.data.utils.polygons2masks
Convert a list of polygons to a set of binary masks of the specified image size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
imgsz | tuple | The size of the image as (height, width). | required |
polygons | list[ndarray] | A list of polygons. Each polygon is an array with shape [N, M], where N is the number of polygons and M is the number of coordinate values (x, y pairs), so M % 2 = 0. | required |
color | int | The color value to fill in the polygons on the masks. | required |
downsample_ratio | int | Factor by which to downsample each mask. Defaults to 1. | 1 |
Returns:
Type | Description |
---|---|
ndarray | A set of binary masks of the specified image size with the polygons filled in. |
Source code in ultralytics/data/utils.py
ultralytics.data.utils.polygons2masks_overlap
Return a (640, 640) overlap mask.
Source code in ultralytics/data/utils.py
ultralytics.data.utils.find_dataset_yaml
Find and return the YAML file associated with a Detect, Segment or Pose dataset.
This function searches for a YAML file at the root level of the provided directory first, and if not found, it performs a recursive search. It prefers YAML files that have the same stem as the provided path. An AssertionError is raised if no YAML file is found or if multiple YAML files are found.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | The directory path to search for the YAML file. | required |
Returns:
Type | Description |
---|---|
Path | The path of the found YAML file. |
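The search order described above can be sketched with pathlib alone. This is an illustrative reading of the description, not the library source; the helper name find_yaml_sketch and the temporary directory contents are assumptions.

```python
import tempfile
from pathlib import Path


def find_yaml_sketch(path: Path) -> Path:
    """Locate a single dataset YAML under `path` (hypothetical helper)."""
    # Root level first; fall back to a recursive search only if nothing is found.
    files = list(path.glob("*.yaml")) or list(path.rglob("*.yaml"))
    assert files, f"No YAML file found in '{path.resolve()}'"
    if len(files) > 1:
        # Prefer files whose stem matches the directory name.
        files = [f for f in files if f.stem == path.stem]
    assert len(files) == 1, f"Expected 1 YAML file in '{path.resolve()}', found {len(files)}"
    return files[0]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "coco8"
    root.mkdir()
    (root / "coco8.yaml").write_text("names: {0: person}\n")
    (root / "other.yaml").write_text("names: {0: car}\n")
    found_name = find_yaml_sketch(root).name
```

Here two YAML files sit at the root, and the one whose stem matches the directory name (coco8) is selected.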
Source code in ultralytics/data/utils.py
ultralytics.data.utils.check_det_dataset
Download, verify, and/or unzip a dataset if not found locally.
This function checks whether the specified dataset is available locally and, if it is not, can optionally download and unzip it. It then reads and parses the accompanying YAML data, verifies that key requirements are met, and resolves the dataset paths.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | str | Path to the dataset or dataset descriptor (like a YAML file). | required |
autodownload | bool | Whether to automatically download the dataset if not found. Defaults to True. | True |
Returns:
Type | Description |
---|---|
dict | Parsed dataset information and paths. |
Source code in ultralytics/data/utils.py
ultralytics.data.utils.check_cls_dataset
Checks a classification dataset such as ImageNet.
This function accepts a dataset name and attempts to retrieve the corresponding dataset information. If the dataset is not found locally, it attempts to download the dataset from the internet and save it locally.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | str or Path | The name of the dataset. | required |
split | str | The split of the dataset. Either 'val', 'test', or ''. Defaults to ''. | '' |
Returns:
Type | Description |
---|---|
dict | A dictionary containing the following keys: 'train' (Path): the directory containing the training set; 'val' (Path): the directory containing the validation set; 'test' (Path): the directory containing the test set; 'nc' (int): the number of classes in the dataset; 'names' (dict): a dictionary of class names in the dataset. |
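The shape of the returned dictionary can be illustrated with a small sketch. It assumes a folder-per-class layout under train, reports missing optional splits as None, and skips the download step entirely; the helper describe_cls_dataset is hypothetical.

```python
import tempfile
from pathlib import Path


def describe_cls_dataset(data_dir: Path) -> dict:
    """Build a summary dict for a folder-per-class dataset (hypothetical helper)."""
    train = data_dir / "train"
    # Assumption: missing optional splits are reported as None.
    val = data_dir / "val" if (data_dir / "val").exists() else None
    test = data_dir / "test" if (data_dir / "test").exists() else None
    # Each subdirectory of train/ is treated as one class.
    class_dirs = sorted(d.name for d in train.iterdir() if d.is_dir())
    return {
        "train": train,
        "val": val,
        "test": test,
        "nc": len(class_dirs),
        "names": dict(enumerate(class_dirs)),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for split in ("train", "val"):
        for cls in ("cat", "dog"):
            (root / split / cls).mkdir(parents=True)
    info = describe_cls_dataset(root)
```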
Source code in ultralytics/data/utils.py
ultralytics.data.utils.compress_one_image
Compresses a single image file to a reduced size while preserving its aspect ratio, using either the Python Imaging Library (PIL) or the OpenCV library. If the input image is smaller than the maximum dimension, it is not resized.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
f | str | The path to the input image file. | required |
f_new | str | The path to the output image file. If not specified, the input file will be overwritten. | None |
max_dim | int | The maximum dimension (width or height) of the output image. Default is 1920 pixels. | 1920 |
quality | int | The image compression quality as a percentage. Default is 50%. | 50 |
Example
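The resize rule can be sketched in pure Python, without reading or writing any image. The sketch assumes the scale factor is max_dim divided by the longer side and is applied only when it is below 1; the helper target_size is hypothetical and only computes the output dimensions.

```python
def target_size(width, height, max_dim=1920):
    """Return the (width, height) an image would be resized to (hypothetical helper)."""
    ratio = max_dim / max(width, height)
    if ratio < 1.0:
        # Scale both sides by the same ratio so the aspect ratio is preserved.
        return int(width * ratio), int(height * ratio)
    # Already within max_dim: leave the size unchanged.
    return width, height


large = target_size(3840, 2160)
small = target_size(640, 480)
```

With the default max_dim of 1920, a 3840x2160 image is halved on both sides, while a 640x480 image keeps its original size.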
Source code in ultralytics/data/utils.py
ultralytics.data.utils.autosplit
Automatically split a dataset into train/val/test splits and save the resulting splits into autosplit_*.txt files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | Path to images directory. Defaults to DATASETS_DIR / 'coco8/images'. | DATASETS_DIR / 'coco8/images' |
weights | list or tuple | Train, validation, and test split fractions. Defaults to (0.9, 0.1, 0.0). | (0.9, 0.1, 0.0) |
annotated_only | bool | If True, only images with an associated txt file are used. Defaults to False. | False |
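A weighted random assignment of this kind can be sketched with the standard library. The sketch assumes each image is assigned independently using the weights as probabilities and a fixed seed for reproducibility; the helper assign_splits is hypothetical and does not write the autosplit_*.txt files.

```python
import random


def assign_splits(files, weights=(0.9, 0.1, 0.0), seed=0):
    """Randomly assign each file to train/val/test by weight (hypothetical helper)."""
    names = ("train", "val", "test")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    picks = rng.choices(range(3), weights=weights, k=len(files))
    splits = {name: [] for name in names}
    for f, i in zip(files, picks):
        splits[names[i]].append(f)
    return splits


files = [f"img_{i:03d}.jpg" for i in range(100)]
splits = assign_splits(files)
```

Because each file is drawn independently, the split sizes only approximate the requested fractions, especially for small datasets.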
Source code in ultralytics/data/utils.py
ultralytics.data.utils.load_dataset_cache_file
Load an Ultralytics *.cache dictionary from path.
Source code in ultralytics/data/utils.py
ultralytics.data.utils.save_dataset_cache_file
Save an Ultralytics dataset *.cache dictionary x to path.