Reference for ultralytics/data/utils.py
Note
Full source code for this file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/data/utils.py. Help us fix any issues you see by submitting a Pull Request 🛠️. Thank you 🙏!
ultralytics.data.utils.HUBDatasetStats
A class for generating HUB dataset JSON and -hub dataset directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to data.yaml or data.zip (with data.yaml inside data.zip). Default is 'coco128.yaml'. | 'coco128.yaml' |
task | str | Dataset task. Options are 'detect', 'segment', 'pose', 'classify'. Default is 'detect'. | 'detect' |
autodownload | bool | Attempt to download dataset if not found locally. Default is False. | False |
Example
Download *.zip files from https://github.com/ultralytics/hub/tree/main/example_datasets, e.g. https://github.com/ultralytics/hub/raw/main/example_datasets/coco8.zip for coco8.zip.
from ultralytics.data.utils import HUBDatasetStats
stats = HUBDatasetStats('path/to/coco8.zip', task='detect') # detect dataset
stats = HUBDatasetStats('path/to/coco8-seg.zip', task='segment') # segment dataset
stats = HUBDatasetStats('path/to/coco8-pose.zip', task='pose') # pose dataset
stats = HUBDatasetStats('path/to/imagenet10.zip', task='classify') # classification dataset
stats.get_json(save=True)
stats.process_images()
Source code in ultralytics/data/utils.py
__init__(path='coco128.yaml', task='detect', autodownload=False)
Initialize class.
get_json(save=False, verbose=False)
Return dataset JSON for Ultralytics HUB.
process_images()
Compress images for Ultralytics HUB.
ultralytics.data.utils.img2label_paths(img_paths)
Define label paths as a function of image paths.
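The mapping can be sketched in a few lines. This is a minimal illustration, not the library source: it assumes the common YOLO layout in which labels sit in a `labels` directory parallel to `images` and use a `.txt` extension, and `sketch_img2label_paths` is a hypothetical name.

```python
import os


def sketch_img2label_paths(img_paths):
    """Map each image path to a label path (hypothetical helper, not the library function)."""
    images_dir = f"{os.sep}images{os.sep}"
    labels_dir = f"{os.sep}labels{os.sep}"
    label_paths = []
    for p in img_paths:
        # Swap only the last "images" component so parent folders with that name are untouched.
        swapped = labels_dir.join(p.rsplit(images_dir, 1))
        # Drop the image extension and use ".txt" instead (assumed label format).
        label_paths.append(os.path.splitext(swapped)[0] + ".txt")
    return label_paths


example = os.path.join("datasets", "coco8", "images", "train", "000000000009.jpg")
print(sketch_img2label_paths([example]))
```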
ultralytics.data.utils.get_hash(paths)
Returns a single hash value of a list of paths (files or dirs).
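One plausible way to build such a hash is shown below. It is a sketch under stated assumptions: the choice of SHA-256, the use of total file size, and the name `sketch_get_hash` are illustrative and may not match the library's implementation.

```python
import hashlib
import os


def sketch_get_hash(paths):
    """Return one SHA-256 hex digest for a list of paths (illustrative, not the library code)."""
    # Total size of the entries that exist, so adding or resizing files changes the hash.
    size = sum(os.path.getsize(p) for p in paths if os.path.exists(p))
    h = hashlib.sha256(str(size).encode())
    # Mix in the path strings so renaming or reordering entries also changes the hash.
    h.update("".join(paths).encode())
    return h.hexdigest()


digest = sketch_get_hash(["a/images/1.jpg", "a/images/2.jpg"])
print(len(digest))  # 64 hex characters for SHA-256
```

A hash like this is typically used to decide whether a cached dataset index is still valid.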
ultralytics.data.utils.exif_size(img)
Returns exif-corrected PIL size.
ultralytics.data.utils.verify_image(args)
Verify one image.
ultralytics.data.utils.verify_image_label(args)
Verify one image-label pair.
ultralytics.data.utils.polygon2mask(imgsz, polygons, color=1, downsample_ratio=1)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
imgsz | tuple | The image size. | required |
polygons | list[ndarray] | [N, M], where N is the number of polygons and M is the number of points (M must be divisible by 2). | required |
color | int | The color value used to fill the polygons in the mask. | 1 |
downsample_ratio | int | The factor by which the mask is downsampled. | 1 |
ultralytics.data.utils.polygons2masks(imgsz, polygons, color, downsample_ratio=1)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
imgsz | tuple | The image size. | required |
polygons | list[ndarray] | Each polygon is [N, M], where N is the number of polygons and M is the number of points (M % 2 = 0). | required |
color | int | The color value used to fill the polygons in the masks. | required |
downsample_ratio | int | The factor by which the masks are downsampled. | 1 |
ultralytics.data.utils.polygons2masks_overlap(imgsz, segments, downsample_ratio=1)
Return a (640, 640) overlap mask.
ultralytics.data.utils.find_dataset_yaml(path)
Find and return the YAML file associated with a Detect, Segment or Pose dataset.
This function first searches for a YAML file at the root level of the provided directory and, if none is found, performs a recursive search. It prefers YAML files that have the same stem as the provided path. An AssertionError is raised if no YAML file is found or if multiple YAML files are found.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | The directory path to search for the YAML file. | required |
Returns:
Type | Description |
---|---|
Path | The path of the found YAML file. |
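The search order described above (root level first, then recursive, preferring a matching stem) can be sketched with pathlib. The function name `sketch_find_dataset_yaml` and the temporary directory layout are assumptions made for illustration; the library's own error messages and edge-case handling may differ.

```python
import tempfile
from pathlib import Path


def sketch_find_dataset_yaml(path: Path) -> Path:
    """Locate a single dataset YAML under `path` (illustrative sketch)."""
    # Look at the root level first; fall back to a recursive search only if nothing is there.
    files = list(path.glob("*.yaml")) or list(path.rglob("*.yaml"))
    assert files, f"No YAML file found in '{path}'"
    if len(files) > 1:
        # Prefer files whose stem matches the directory name, e.g. coco8/coco8.yaml.
        files = [f for f in files if f.stem == path.stem]
    assert len(files) == 1, f"Expected exactly 1 YAML file in '{path}', found {len(files)}"
    return files[0]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "coco8"
    root.mkdir()
    (root / "coco8.yaml").write_text("names: {0: person}\n")
    (root / "other.yaml").write_text("names: {0: car}\n")
    found_name = sketch_find_dataset_yaml(root).name

print(found_name)  # coco8.yaml
```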
ultralytics.data.utils.check_det_dataset(dataset, autodownload=True)
Download, verify, and/or unzip a dataset if not found locally.
This function checks whether the specified dataset is available locally and, if it is not, can optionally download and unzip it. It then reads and parses the accompanying YAML data, verifies that key requirements are met, and resolves the paths related to the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | str | Path to the dataset or dataset descriptor (like a YAML file). | required |
autodownload | bool | Whether to automatically download the dataset if not found. Defaults to True. | True |
Returns:
Type | Description |
---|---|
dict | Parsed dataset information and paths. |
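To make "key requirements" and "resolves paths" concrete, here is a hedged sketch that operates on an already-parsed descriptor dict. The specific rules shown (requiring `train` and `val`, requiring `names` or `nc`, resolving split paths against an optional `path` key) are assumptions for illustration, and `sketch_check_dataset_dict` is a hypothetical helper, not part of the library.

```python
from pathlib import Path


def sketch_check_dataset_dict(data: dict, yaml_dir: Path) -> dict:
    """Validate and normalize an already-parsed dataset dict (hypothetical helper)."""
    data = dict(data)  # work on a copy so the caller's dict is not modified
    for key in ("train", "val"):
        if key not in data:
            raise KeyError(f"dataset descriptor is missing the '{key}' key")
    if "names" not in data and "nc" not in data:
        raise KeyError("dataset descriptor needs either 'names' or 'nc'")
    if "names" not in data:
        # Assumed fallback: generate placeholder class names from the class count.
        data["names"] = {i: f"class_{i}" for i in range(data["nc"])}
    data["nc"] = len(data["names"])

    # Resolve the dataset root relative to the YAML file's directory.
    root = Path(data.get("path", yaml_dir))
    if not root.is_absolute():
        root = yaml_dir / root
    for key in ("train", "val", "test"):
        if data.get(key):
            p = Path(data[key])
            data[key] = p if p.is_absolute() else root / p
    return data


info = sketch_check_dataset_dict(
    {"path": "coco8", "train": "images/train", "val": "images/val", "nc": 2},
    Path("/datasets"),
)
print(info["train"], info["names"])
```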
ultralytics.data.utils.check_cls_dataset(dataset, split='')
Checks a classification dataset such as ImageNet.
This function accepts a dataset name and attempts to retrieve the corresponding dataset information. If the dataset is not found locally, it attempts to download the dataset from the internet and save it locally.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | str | Path | The name of the dataset. | required |
split | str | The split of the dataset. Either 'val', 'test', or ''. Defaults to ''. | '' |
Returns:
Type | Description |
---|---|
dict | A dictionary containing the following keys: 'train' (Path): the directory path containing the training set of the dataset; 'val' (Path): the directory path containing the validation set of the dataset; 'test' (Path): the directory path containing the test set of the dataset; 'nc' (int): the number of classes in the dataset; 'names' (dict): a dictionary of class names in the dataset. |
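The shape of the returned dictionary can be illustrated with a small sketch. It assumes an ImageFolder-style layout with one subdirectory per class under `train/`; the helper name `sketch_cls_dataset_info` is hypothetical, and the download step is omitted.

```python
import tempfile
from pathlib import Path


def sketch_cls_dataset_info(data_dir: Path) -> dict:
    """Build a train/val/test/nc/names dict from a local directory (illustrative only)."""
    train_set = data_dir / "train"
    # Optional splits are reported as None when their directory is absent (an assumption).
    val_set = data_dir / "val" if (data_dir / "val").exists() else None
    test_set = data_dir / "test" if (data_dir / "test").exists() else None
    # Each subdirectory of train/ is treated as one class.
    class_dirs = sorted(d.name for d in train_set.iterdir() if d.is_dir())
    return {
        "train": train_set,
        "val": val_set,
        "test": test_set,
        "nc": len(class_dirs),
        "names": dict(enumerate(class_dirs)),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for cls in ("cat", "dog"):
        (root / "train" / cls).mkdir(parents=True)
    (root / "val").mkdir()
    info = sketch_cls_dataset_info(root)

print(info["nc"], info["names"])  # 2 {0: 'cat', 1: 'dog'}
```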
ultralytics.data.utils.compress_one_image(f, f_new=None, max_dim=1920, quality=50)
Compresses a single image file to a reduced size while preserving its aspect ratio and quality, using either the Python Imaging Library (PIL) or the OpenCV library. If the input image is smaller than the maximum dimension, it will not be resized.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
f | str | The path to the input image file. | required |
f_new | str | The path to the output image file. If not specified, the input file will be overwritten. | None |
max_dim | int | The maximum dimension (width or height) of the output image. Default is 1920 pixels. | 1920 |
quality | int | The image compression quality as a percentage. Default is 50%. | 50 |
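The resizing rule (shrink only when the longest side exceeds `max_dim`, keeping the aspect ratio) reduces to a short calculation. The sketch below covers only that arithmetic and does no image I/O; `sketch_scaled_size` is a hypothetical helper, and truncating to integers is an assumption about rounding.

```python
def sketch_scaled_size(width: int, height: int, max_dim: int = 1920) -> tuple:
    """Return the output (width, height) after an aspect-preserving downscale."""
    ratio = max_dim / max(width, height)
    if ratio < 1.0:
        # The longest side exceeds max_dim: scale both sides by the same ratio.
        return int(width * ratio), int(height * ratio)
    # The image already fits within max_dim, so it is left at its original size.
    return width, height


print(sketch_scaled_size(3840, 2160))  # (1920, 1080)
print(sketch_scaled_size(800, 600))    # (800, 600)
```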
ultralytics.data.utils.autosplit(path=DATASETS_DIR / 'coco8/images', weights=(0.9, 0.1, 0.0), annotated_only=False)
Automatically split a dataset into train/val/test splits and save the resulting splits into autosplit_*.txt files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | Path to images directory. Defaults to DATASETS_DIR / 'coco8/images'. | DATASETS_DIR / 'coco8/images' |
weights | list | tuple | Train, validation, and test split fractions. Defaults to (0.9, 0.1, 0.0). | (0.9, 0.1, 0.0) |
annotated_only | bool | If True, only images with an associated txt file are used. Defaults to False. | False |
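A weighted random split of this kind can be sketched with the standard library alone. This is an illustration under assumptions, not the library code: `sketch_assign_splits` is a hypothetical helper, the fixed seed is chosen here for reproducibility, and writing the autosplit_*.txt files is left out.

```python
import random


def sketch_assign_splits(filenames, weights=(0.9, 0.1, 0.0), seed=0):
    """Randomly assign each filename to train/val/test according to the weights."""
    rng = random.Random(seed)  # local generator so the global random state is untouched
    split_names = ("train", "val", "test")
    # Draw one split index per file; a weight of 0.0 means that split is never chosen.
    picks = rng.choices(range(3), weights=weights, k=len(filenames))
    splits = {name: [] for name in split_names}
    for filename, idx in zip(filenames, picks):
        splits[split_names[idx]].append(filename)
    return splits


files = [f"img_{i:03d}.jpg" for i in range(100)]
splits = sketch_assign_splits(files)
print({name: len(items) for name, items in splits.items()})
```

Because each file is drawn independently, the resulting split sizes only approximate the requested fractions, especially for small datasets.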