
Data Preprocessing Techniques for Annotated Computer Vision Data


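After you've defined your computer vision project's goals and collected and annotated data, the next step is to preprocess the annotated data and prepare it for model training. Clean, consistent data is vital to creating a model that performs well.

Preprocessing is a step in the computer vision project workflow that includes resizing images, normalizing pixel values, augmenting the dataset, and splitting the data into training, validation, and test sets. Let's explore the essential techniques and best practices for cleaning your data!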



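  • Noise: Irrelevant or random variations in the data.
  • Inconsistency: Variations in image sizes, formats, and quality.
  • Imbalance: Unequal distribution of classes or categories in the dataset.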





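  • Bilinear Interpolation: Smooths pixel values by taking a weighted average of the four nearest pixel values.
  • Nearest Neighbor: Assigns the nearest pixel value without averaging, which produces a blocky image but is faster to compute.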
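To make the difference between the two methods concrete, here is a minimal pure-Python sketch that resizes a tiny grayscale image both ways. The helper names `resize_nearest` and `resize_bilinear` are hypothetical, and the bilinear version assumes a corner-aligned sampling grid, which is only one of several conventions; image libraries may use a different pixel-center convention and give slightly different values.

```python
def resize_nearest(img, new_h, new_w):
    """Resize a 2D grayscale image (list of rows) by copying the nearest source pixel."""
    h, w = len(img), len(img[0])
    return [
        [img[min(h - 1, int(r * h / new_h))][min(w - 1, int(c * w / new_w))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def resize_bilinear(img, new_h, new_w):
    """Resize by a weighted average of the four nearest source pixels (corner-aligned)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(new_h):
        # Fractional source row for this output row.
        y = r * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for c in range(new_w):
            # Fractional source column for this output column.
            x = c * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


tiny = [[0, 100], [100, 200]]
nearest = resize_nearest(tiny, 3, 3)    # repeats source pixels, so edges stay hard
bilinear = resize_bilinear(tiny, 3, 3)  # introduces intermediate values such as 50 and 150
```

Upscaling the 2x2 image to 3x3 shows the trade-off: the nearest-neighbor result contains only the original four values, while the bilinear result fills in smooth in-between values.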


  • OpenCV: A popular computer vision library with extensive functions for image processing.
  • PIL (Pillow): A Python imaging library for opening, manipulating, and saving image files.

With respect to YOLO11, the 'imgsz' parameter during model training allows for flexible input sizes. When set to a specific size, such as 640, the model will resize input images so their largest dimension is 640 pixels while maintaining the original aspect ratio.
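The resizing rule described above comes down to simple arithmetic. The sketch below uses a hypothetical helper, `scaled_size`, to compute the dimensions an image would have after its largest side is scaled to `imgsz`. It illustrates the rule only; it is not the library's implementation and does not model any padding a training pipeline might add.

```python
def scaled_size(width, height, imgsz=640):
    """Return (new_width, new_height) with the largest side equal to imgsz.

    Hypothetical helper for illustration; the aspect ratio is preserved
    up to rounding to whole pixels.
    """
    longest = max(width, height)
    return round(width * imgsz / longest), round(height * imgsz / longest)


new_w, new_h = scaled_size(1920, 1080)  # a Full HD frame becomes 640 x 360
```

A portrait image works the same way: the height is the largest side, so it is the one mapped to `imgsz`.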




  • Min-Max Scaling: Scales pixel values to a range of 0 to 1.
  • Z-Score Normalization: Scales pixel values based on their mean and standard deviation.
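Both techniques can be sketched in a few lines of standard-library Python. The function names and the sample pixel values are made up for illustration, and the min-max version assumes 8-bit pixels in the range 0 to 255.

```python
from statistics import fmean, pstdev


def min_max_scale(pixels, lo=0, hi=255):
    """Map pixel values from [lo, hi] to [0, 1]. Assumes 8-bit input by default."""
    return [(p - lo) / (hi - lo) for p in pixels]


def z_score(pixels):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = fmean(pixels)
    std = pstdev(pixels)
    if std == 0:
        # A constant image has no spread; return zeros rather than divide by zero.
        return [0.0 for _ in pixels]
    return [(p - mean) / std for p in pixels]


pixels = [0, 51, 102, 153, 204, 255]  # illustrative values, not real image data
scaled = min_max_scale(pixels)
standardized = z_score(pixels)
```

Min-max scaling keeps values in a fixed range, while z-score normalization centers them around zero, so the right choice depends on what the model expects as input.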

With respect to YOLO11, normalization is seamlessly handled as part of its preprocessing pipeline during model training. YOLO11 automatically performs several preprocessing steps, including conversion to RGB, scaling pixel values to the range [0, 1], and normalization using predefined mean and standard deviation values.


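Once you've cleaned the data, you are ready to split the dataset. Splitting the data into training, validation, and test sets ensures that the model can be evaluated on unseen data to assess its generalization performance. A common split is 70% for training, 20% for validation, and 10% for testing. You can use various tools and libraries to split your data, such as scikit-learn or TensorFlow.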


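  • Maintaining Data Distribution: Ensure that the distribution of classes is maintained across the training, validation, and test sets.
  • Avoiding Data Leakage: Typically, data augmentation is done after the dataset is split. Apply augmentation only to the training set, and derive any statistics used for preprocessing (such as a normalization mean and standard deviation) from the training set alone, so that information from the validation or test sets does not influence model training.
  • Balancing Classes: For imbalanced datasets, consider techniques such as oversampling the minority class or undersampling the majority class within the training set.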
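As a rough illustration of a 70/20/10 split that preserves class proportions, the sketch below splits each class separately and then combines the results. The function `stratified_split`, the file names, and the class labels are all hypothetical, and the sketch assumes one label per image; detection datasets with several classes per image need a more careful strategy. With scikit-learn, `train_test_split` and its `stratify` argument cover the same idea.

```python
import random
from collections import defaultdict


def stratified_split(samples, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split (item, label) pairs into train/val/test while keeping class proportions.

    Hypothetical helper: each class is shuffled and cut independently,
    so every split sees roughly the same class mix.
    """
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append((item, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Illustrative, imbalanced toy dataset: 80 car images and 20 bus images.
samples = [(f"car_{i}.jpg", "car") for i in range(80)]
samples += [(f"bus_{i}.jpg", "bus") for i in range(20)]
train, val, test = stratified_split(samples)
```

Because the split happens before any augmentation, augmented copies of a training image cannot end up in the validation or test sets.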




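  • Creates a More Robust Dataset: Data augmentation can make the model more robust to variations and distortions in the input data. This includes changes in lighting, orientation, and scale.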
  • Cost-Effective: Data augmentation is a cost-effective way to increase the amount of training data without collecting and labeling new data.
  • Better Use of Data: Every available data point is used to its full potential by creating new variations of it.


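Common augmentation techniques include flipping, rotation, scaling, and color adjustments. Several libraries, such as Albumentations, Imgaug, and TensorFlow's ImageDataGenerator, can generate these augmentations.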
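For annotated data, the annotations must be transformed together with the image. The following minimal sketch shows a horizontal flip applied to a tiny image and to a bounding box. It assumes the box is stored in the normalized YOLO label layout (class, x_center, y_center, width, height); the helper names are hypothetical, and augmentation libraries handle this bookkeeping for you.

```python
def hflip_image(img):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in img]


def hflip_yolo_box(box):
    """Mirror a normalized (class, x_center, y_center, width, height) box.

    Only the horizontal center changes; the size and vertical position stay the same.
    """
    cls, xc, yc, w, h = box
    return (cls, 1.0 - xc, yc, w, h)


image = [[1, 2, 3],
         [4, 5, 6]]
box = (0, 0.25, 0.5, 0.2, 0.4)  # illustrative box in the left half of the image

flipped_image = hflip_image(image)
flipped_box = hflip_yolo_box(box)  # the box moves to the right half
```

Flipping twice returns the original image and box, which is a quick sanity check for any geometric augmentation.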


With respect to YOLO11, you can augment your custom dataset by modifying the dataset configuration file, a .yaml file. In this file, you can add an augmentation section with parameters that specify how you want to augment your data.

The Ultralytics YOLO11 repository supports a wide range of data augmentations. You can apply various transformations such as:

  • Random Crops
  • Flipping: Images can be flipped horizontally or vertically.
  • Rotation: Images can be rotated by specific angles.
  • Distortion



Consider a project aimed at developing a model to detect and classify different types of vehicles in traffic images using YOLO11. We've collected traffic images and annotated them with bounding boxes and labels.


  • Resizing Images: Since YOLO11 handles flexible input sizes and performs resizing automatically, manual resizing is not required. The model will adjust the image size according to the specified 'imgsz' parameter during training.
  • Normalizing Pixel Values: YOLO11 automatically normalizes pixel values to a range of 0 to 1 during preprocessing, so manual normalization is not required.
  • Splitting the Dataset: Divide the dataset into training (70%), validation (20%), and test (10%) sets using tools like scikit-learn.
  • Data Augmentation: Modify the dataset configuration file (.yaml) to include data augmentation techniques such as random crops, horizontal flips, and brightness adjustments.

These steps ensure that the dataset is prepared without potential issues and is ready for Exploratory Data Analysis (EDA).


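After preprocessing and augmenting your dataset, the next step is to gain insights through Exploratory Data Analysis. EDA uses statistical techniques and visualization tools to understand the patterns and distributions in your data. You can identify issues like class imbalances or outliers and make informed decisions about further data preprocessing or model training adjustments.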




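Visualization is key in EDA for image datasets. Class imbalance analysis, for example, is a vital aspect of EDA: it helps determine whether certain classes are underrepresented in your dataset, and visualizing the distribution of image classes or categories with bar charts can quickly reveal any imbalance. Similarly, outliers can be identified with visualization tools such as box plots, which highlight anomalies in pixel intensity or feature distributions. Outlier detection prevents unusual data points from skewing your results.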
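A quick class-balance check does not need a plotting library. The sketch below counts class IDs in annotation lines and prints a simple text bar chart. It assumes each line starts with an integer class ID, as in YOLO-style label files; the sample lines and the helper `class_distribution` are made up for illustration.

```python
from collections import Counter


def class_distribution(label_lines):
    """Count annotations per class ID and return {class_id: (count, share)}."""
    counts = Counter(int(line.split()[0]) for line in label_lines if line.strip())
    total = sum(counts.values())
    return {cls: (n, n / total) for cls, n in sorted(counts.items())}


# Illustrative annotation lines: class_id x_center y_center width height
label_lines = (
    ["0 0.50 0.50 0.20 0.20"] * 6
    + ["1 0.30 0.40 0.10 0.15"] * 3
    + ["2 0.70 0.60 0.25 0.30"] * 1
)

dist = class_distribution(label_lines)
for cls, (count, share) in dist.items():
    # One '#' per annotation gives a rough text-mode bar chart.
    print(f"class {cls}: {'#' * count} {count} ({share:.0%})")
```

In this toy example, class 2 has far fewer annotations than class 0, which is the kind of imbalance a bar chart makes obvious at a glance.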


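  • Histograms and Box Plots: Useful for understanding the distribution of pixel values and identifying outliers.
  • Scatter Plots: Helpful for exploring relationships between image features or annotations.
  • Heatmaps: Effective for visualizing the distribution of pixel intensities or the spatial distribution of annotated features within images.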
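The outliers a box plot draws beyond its whiskers can also be found numerically. This sketch applies the common 1.5 x IQR rule to a list of per-image mean brightness values. The values are invented for illustration, and the quartiles come from `statistics.quantiles` with its default method, so plotting libraries that compute quartiles differently may flag slightly different points.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Illustrative mean brightness per image (0-255); two images are far from the rest.
brightness = [118, 121, 125, 119, 122, 124, 120, 123, 12, 250]
outliers = iqr_outliers(brightness)
```

Images flagged this way, such as a nearly black frame or an overexposed one, are worth inspecting by hand before deciding whether to keep, fix, or drop them.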

Using Ultralytics Explorer for EDA

As of ultralytics>=8.3.10, Ultralytics Explorer support has been deprecated. But don't worry! You can now access similar and even enhanced functionality through Ultralytics HUB, our intuitive no-code platform designed to streamline your workflow. With Ultralytics HUB, you can continue exploring, visualizing, and managing your data effortlessly, all without writing a single line of code. Make sure to check it out and take advantage of its powerful features! 🚀

For a more advanced approach to EDA, you can use the Ultralytics Explorer tool. It offers robust capabilities for exploring computer vision datasets. By supporting semantic search, SQL queries, and vector similarity search, the tool makes it easy to analyze and understand your data. With Ultralytics Explorer, you can create embeddings for your dataset to find similar images, run SQL queries for detailed analysis, and perform semantic searches, all through a user-friendly graphical interface.

Overview of Ultralytics Explorer




How do I use Ultralytics YOLO for data augmentation?

For data augmentation with Ultralytics YOLO11, you need to modify the dataset configuration file (.yaml). In this file, you can specify various augmentation techniques such as random crops, horizontal flips, and brightness adjustments, in line with the training configuration settings. Data augmentation helps create a more robust dataset, reduce overfitting, and improve model generalization.



  • Min-Max Scaling: Scales pixel values to a range of 0 to 1.
  • Z-Score Normalization: Scales pixel values based on their mean and standard deviation.

For YOLO11, normalization is handled automatically, including conversion to RGB and pixel value scaling. Learn more about it in the model training section.


To split your dataset, a common practice is to divide it into 70% for training, 20% for validation, and 10% for testing. It is important to maintain the data distribution of classes across these splits and avoid data leakage by performing augmentation only on the training set. Use tools like scikit-learn or TensorFlow for efficient dataset splitting. See the detailed guide on dataset preparation.

Can I handle varying image sizes in YOLO11 without manual resizing?

Yes, Ultralytics YOLO11 can handle varying image sizes through the 'imgsz' parameter during model training. This parameter ensures that images are resized so their largest dimension matches the specified size (e.g., 640 pixels), while maintaining the aspect ratio. For more flexible input handling and automatic adjustments, check the model training section.

