K-Fold Cross Validation with Ultralytics
Introduction
This comprehensive guide explains how to implement K-Fold Cross Validation for object detection datasets within the Ultralytics ecosystem. Using the YOLO detection format and key Python libraries such as sklearn, pandas, and PyYaml, it walks through the necessary setup, the generation of feature vectors, and the execution of a K-Fold dataset split.
Whether your project uses the Fruit Detection dataset or a custom data source, this tutorial aims to help you understand and apply K-Fold Cross Validation to strengthen the reliability and robustness of your machine learning models. We use k=5 folds in this tutorial, but keep in mind that the optimal number of folds depends on your dataset and the specifics of your project.
Without further ado, let's dive in!
Setup
- Your annotations must be in the YOLO detection format.
- This guide assumes that the annotation files are available locally.
- This demonstration uses the Fruit Detection dataset.
  - The dataset contains a total of 8479 images.
  - It includes 6 class labels; the total instance count for each is listed below.
Class Label | Instance Count |
---|---|
Apple | 7049 |
Grapes | 7202 |
Pineapple | 1613 |
Orange | 15549 |
Banana | 3536 |
Watermelon | 1976 |
- The required Python packages are:
  - ultralytics
  - sklearn (installed under the name scikit-learn)
  - pandas
  - pyyaml
- This tutorial uses k=5 folds. However, you should determine the best number of folds for your specific dataset.
- Create a new Python virtual environment (venv) for your project and activate it. Use pip (or your preferred package manager) to install:
  - The Ultralytics library: pip install -U ultralytics. Alternatively, use the official repo.
  - Scikit-learn, pandas, and PyYAML: pip install -U scikit-learn pandas pyyaml.
- Verify that your annotations are in the YOLO detection format.
  - For this tutorial, all annotation files are in the Fruit-Detection/labels directory.
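One way to carry out the format check is to parse each label line. The sketch below is a minimal example, not an official validator. It assumes the usual YOLO detection layout of five whitespace-separated fields per line: the class index, then the box center x, center y, width, and height, all normalized to the range 0 to 1. The function name is_valid_yolo_line is hypothetical and not part of Ultralytics.

```python
def is_valid_yolo_line(line: str, num_classes: int) -> bool:
    """Return True if `line` looks like a YOLO detection annotation."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        cls = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return False
    # Class index must be in range and all box values normalized to [0, 1]
    return 0 <= cls < num_classes and all(0.0 <= c <= 1.0 for c in coords)


# Hypothetical lines for a 6-class dataset such as the one used here
print(is_valid_yolo_line("3 0.51 0.42 0.20 0.35", num_classes=6))  # well formed
print(is_valid_yolo_line("7 0.51 0.42 0.20 0.35", num_classes=6))  # class index out of range
print(is_valid_yolo_line("3 120 80 40 60", num_classes=6))  # pixel values, not normalized
```

Running a check like this over every file in the labels directory before splitting can surface malformed annotations early.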
Generating Feature Vectors for Object Detection Dataset
- Start by creating a new example.py Python file for the steps below.
- Retrieve all label files for your dataset.
- Read the contents of the dataset YAML file and extract the indices of the class labels.
- Initialize an empty pandas DataFrame.
- Count the instances of each class label present in the annotation files.

    from collections import Counter

    # `labels` (list of label file paths) and `labels_df` come from the previous steps
    for label in labels:
        lbl_counter = Counter()

        with open(label, "r") as lf:
            lines = lf.readlines()

        for line in lines:
            # the class index is the integer in the first position of each YOLO label line
            lbl_counter[int(line.split(" ")[0])] += 1

        labels_df.loc[label.stem] = lbl_counter

    labels_df = labels_df.fillna(0.0)  # replace `nan` values with `0.0`

- The following is a sample view of the populated DataFrame:

                                                            0    1    2    3    4    5
    '0000a16e4b057580_jpg.rf.00ab48988370f64f5ca8ea4...'  0.0  0.0  0.0  0.0  0.0  7.0
    '0000a16e4b057580_jpg.rf.7e6dce029fb67f01eb19aa7...'  0.0  0.0  0.0  0.0  0.0  7.0
    '0000a16e4b057580_jpg.rf.bc4d31cdcbe229dd022957a...'  0.0  0.0  0.0  0.0  0.0  7.0
    '00020ebf74c4881c_jpg.rf.508192a0a97aa6c4a3b6882...'  0.0  0.0  0.0  1.0  0.0  0.0
    '00020ebf74c4881c_jpg.rf.5af192a2254c8ecc4188a25...'  0.0  0.0  0.0  1.0  0.0  0.0
    ...                                                   ...  ...  ...  ...  ...  ...
    'ff4cd45896de38be_jpg.rf.c4b5e967ca10c7ced3b9e97...'  0.0  0.0  0.0  0.0  0.0  2.0
    'ff4cd45896de38be_jpg.rf.ea4c1d37d2884b3e3cbce08...'  0.0  0.0  0.0  0.0  0.0  2.0
    'ff5fd9c3c624b7dc_jpg.rf.bb519feaa36fc4bf630a033...'  1.0  0.0  0.0  0.0  0.0  0.0
    'ff5fd9c3c624b7dc_jpg.rf.f0751c9c3aa4519ea3c9d6a...'  1.0  0.0  0.0  0.0  0.0  0.0
    'fffe28b31f2a70d4_jpg.rf.7ea16bd637ba0711c53b540...'  0.0  6.0  0.0  0.0  0.0  0.0

The rows are indexed by label file, each corresponding to an image in your dataset, and the columns correspond to your class-label indices. Each row represents a pseudo feature vector holding the count of each class label present in that image. This data structure makes it possible to apply K-Fold Cross Validation to an object detection dataset.
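The steps for retrieving the label files, reading the dataset YAML, and initializing the DataFrame are described above without code. The following is a minimal sketch of those three steps, which would run before the counting loop. It builds a tiny throwaway dataset so that it runs on its own. The file name data.yaml, the toy label files, and the assumption that the YAML stores class names as a mapping under a names key are illustrative; adjust them to your dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd
import yaml

# Build a tiny throwaway dataset so the sketch runs on its own;
# in a real project, point dataset_path at your dataset instead.
dataset_path = Path(tempfile.mkdtemp()) / "Fruit-Detection"
(dataset_path / "labels").mkdir(parents=True)
(dataset_path / "labels" / "img_a.txt").write_text("0 0.5 0.5 0.2 0.2\n")
(dataset_path / "labels" / "img_b.txt").write_text("1 0.4 0.4 0.1 0.1\n1 0.6 0.6 0.1 0.1\n")
yaml_file = dataset_path / "data.yaml"  # assumed file name
yaml_file.write_text(yaml.safe_dump({"names": {0: "Apple", 1: "Grapes"}}))

# Retrieve all label files
labels = sorted((dataset_path / "labels").rglob("*.txt"))

# Read the dataset YAML and extract the class-label indices
# (assumes `names` maps integer index -> class name)
with open(yaml_file, encoding="utf8") as y:
    classes = yaml.safe_load(y)["names"]
cls_idx = sorted(classes.keys())

# Initialize an empty DataFrame: one row per label file, one column per class
index = [label.stem for label in labels]
labels_df = pd.DataFrame(index=index, columns=cls_idx)

print(labels_df.shape)
```

With labels, cls_idx, and labels_df defined this way, the counting loop fills one row per label file.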
K-Fold Dataset Split
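- Now use the KFold class from sklearn.model_selection to generate k splits of the dataset.
  - Important:
    - Setting shuffle=True shuffles the samples before splitting, so classes are distributed randomly across your splits. It does not by itself guarantee a balanced distribution.
    - Setting random_state=M, where M is a chosen integer, gives repeatable results. It only has an effect when shuffle=True.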
- The dataset has now been split into k folds, each with a list of train and val indices. Build a DataFrame to display these results more clearly.
- Now calculate the distribution of class labels for each fold as the ratio of the classes present in val to those present in train.

    fold_lbl_distrb = pd.DataFrame(index=folds, columns=cls_idx)

    for n, (train_indices, val_indices) in enumerate(kfolds, start=1):
        train_totals = labels_df.iloc[train_indices].sum()
        val_totals = labels_df.iloc[val_indices].sum()

        # To avoid division by zero, we add a small value (1E-7) to the denominator
        ratio = val_totals / (train_totals + 1e-7)
        fold_lbl_distrb.loc[f"split_{n}"] = ratio

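The ideal scenario is for all class ratios to be reasonably similar for each split and across classes. With k=5, each val set holds about one fifth of the images and each train set about four fifths, so a well-balanced class has a ratio close to 0.25. How closely this holds depends on the specifics of your dataset.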
- Next, create the directories and dataset YAML files for each split.

    import datetime

    # `dataset_path`, `ksplit`, `folds_df`, and `classes` come from the earlier steps
    supported_extensions = [".jpg", ".jpeg", ".png"]

    # Initialize an empty list to store image file paths
    images = []

    # Loop through supported extensions and gather image files
    for ext in supported_extensions:
        images.extend(sorted((dataset_path / "images").rglob(f"*{ext}")))

    # Create the necessary directories and dataset YAML files
    save_path = Path(dataset_path / f"{datetime.date.today().isoformat()}_{ksplit}-Fold_Cross-val")
    save_path.mkdir(parents=True, exist_ok=True)
    ds_yamls = []

    for split in folds_df.columns:
        # Create directories
        split_dir = save_path / split
        split_dir.mkdir(parents=True, exist_ok=True)
        (split_dir / "train" / "images").mkdir(parents=True, exist_ok=True)
        (split_dir / "train" / "labels").mkdir(parents=True, exist_ok=True)
        (split_dir / "val" / "images").mkdir(parents=True, exist_ok=True)
        (split_dir / "val" / "labels").mkdir(parents=True, exist_ok=True)

        # Create dataset YAML files
        dataset_yaml = split_dir / f"{split}_dataset.yaml"
        ds_yamls.append(dataset_yaml)

        with open(dataset_yaml, "w") as ds_y:
            yaml.safe_dump(
                {
                    "path": split_dir.as_posix(),
                    "train": "train",
                    "val": "val",
                    "names": classes,
                },
                ds_y,
            )

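- Lastly, copy the images and labels into the respective directory ('train' or 'val') for each split.
  - NOTE: The time required for this portion of the code varies with the size of your dataset and your system hardware.

    import shutil

    # Pair each image with its label by file stem rather than by list position:
    # `images` is grouped by extension, so its order may not match `labels`
    labels_by_stem = {label.stem: label for label in labels}

    for image in images:
        label = labels_by_stem[image.stem]
        for split, k_split in folds_df.loc[image.stem].items():
            # Destination directory
            img_to_path = save_path / split / k_split / "images"
            lbl_to_path = save_path / split / k_split / "labels"

            # Copy image and label files to the new directory
            # (shutil.copy overwrites a destination file that already exists)
            shutil.copy(image, img_to_path / image.name)
            shutil.copy(label, lbl_to_path / label.name)
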
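The first two steps of this section, generating the splits and building the DataFrame of train and val assignments, are described without code. The following is a minimal sketch of both, which would run before the label distribution is calculated. It uses a toy stand-in for labels_df so that it runs on its own, and random_state=20 is an arbitrary choice. The names ksplit, kfolds, folds, and folds_df match those used by the code in this section.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Toy stand-in for the labels_df built earlier: 10 label files, 2 classes
index = [f"img_{i}" for i in range(10)]
labels_df = pd.DataFrame({0: [1.0] * 10, 1: [0.0, 2.0] * 5}, index=index)

ksplit = 5
kf = KFold(n_splits=ksplit, shuffle=True, random_state=20)  # fixed seed for repeatable results
kfolds = list(kf.split(labels_df))  # list of (train_indices, val_indices) pairs

# One column per split, one row per label file, each cell marked "train" or "val"
folds = [f"split_{n}" for n in range(1, ksplit + 1)]
folds_df = pd.DataFrame(index=index, columns=folds)

for i, (train_idx, val_idx) in enumerate(kfolds, start=1):
    folds_df.loc[labels_df.iloc[train_idx].index, f"split_{i}"] = "train"
    folds_df.loc[labels_df.iloc[val_idx].index, f"split_{i}"] = "val"

print(folds_df.head())
```

Each label file ends up in val for exactly one split and in train for the other four, which is what the copy step relies on when it looks up folds_df.loc[image.stem].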
Save Records (Optional)
Optionally, you can save the records of the K-Fold split and label distribution DataFrames as CSV files for future reference.
folds_df.to_csv(save_path / "kfold_datasplit.csv")
fold_lbl_distrb.to_csv(save_path / "kfold_label_distribution.csv")
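To use the saved records later, the CSV files can be read back with pandas. The sketch below is a minimal round trip on a toy stand-in for folds_df written to a temporary directory; in practice you would read from your own save_path.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for folds_df so the sketch runs on its own
folds_df = pd.DataFrame(
    {"split_1": ["train", "val"], "split_2": ["val", "train"]},
    index=["img_a", "img_b"],
)

save_path = Path(tempfile.mkdtemp())
folds_df.to_csv(save_path / "kfold_datasplit.csv")

# index_col=0 restores the file stems as the row index
reloaded = pd.read_csv(save_path / "kfold_datasplit.csv", index_col=0)
print(reloaded.loc["img_a", "split_1"])
```

Passing index_col=0 matters here because the first CSV column holds the label file stems that the rest of the workflow uses for lookups.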
Train YOLO using K-Fold Data Splits
- First, load the YOLO model.
- Next, iterate over the dataset YAML files to run training. The results are saved to a directory specified by the project and name arguments. When neither is set, Ultralytics saves detection training runs under 'runs/detect/train#', where # is an integer index.

    from ultralytics import YOLO

    weights_path = "path/to/weights.pt"  # placeholder: path to your model weights

    results = {}

    # Define your additional arguments here
    batch = 16
    project = "kfold_demo"
    epochs = 100

    for k in range(ksplit):
        dataset_yaml = ds_yamls[k]
        model = YOLO(weights_path, task="detect")  # start each fold from the same weights
        model.train(data=dataset_yaml, epochs=epochs, batch=batch, project=project)  # include any train arguments
        results[k] = model.metrics  # save output metrics for further analysis

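Once every fold has been trained, the per-fold metrics are usually summarized with a mean and a spread. The sketch below shows one way to do that with the standard library. The scores are placeholder numbers, not real results, and how you extract a single number per fold from the saved metrics is left as an assumption about your setup.

```python
from statistics import mean, stdev

# Placeholder scores (for example, one mAP value per fold); these are NOT real results.
# In practice you would collect one number per fold from your saved metrics.
fold_scores = {0: 0.61, 1: 0.58, 2: 0.63, 3: 0.60, 4: 0.59}

scores = list(fold_scores.values())
summary = {
    "mean": mean(scores),
    "std": stdev(scores),  # sample standard deviation across folds
    "min": min(scores),
    "max": max(scores),
}
print(summary)
```

A small standard deviation relative to the mean suggests that performance is consistent across folds, while a large one points to a split that is unusually easy or hard.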
Conclusion
In this guide, we explored the process of using K-Fold cross-validation to train a YOLO object detection model. We learned how to split a dataset into K partitions and how to check that the class distribution stays balanced across the folds.
We also went through the steps for creating report DataFrames that show the data splits and the label distribution across those splits.
This is particularly useful in large-scale projects and when troubleshooting model performance.
Finally, we ran the actual model training on each split in a loop and saved the training results for further analysis and comparison.
K-Fold cross-validation is a robust way to make the most of your available data, and it helps confirm that model performance is reliable and consistent across different data subsets. This gives you more confidence that the resulting model generalizes well and is less likely to have overfit to specific data patterns.
Remember that although this guide uses YOLO, these steps are mostly transferable to other machine learning models. Understanding them lets you apply cross-validation effectively in your own machine learning projects. Happy coding!
FAQ
What is K-Fold Cross Validation and why is it useful in object detection?
K-Fold Cross Validation is a technique in which the dataset is divided into k subsets (folds) in order to evaluate model performance more reliably. Each fold serves once as validation data while the remaining k-1 folds are used for training. In object detection, K-Fold Cross Validation helps confirm that the performance of your Ultralytics YOLO model is robust and generalizes across different data splits, which makes the evaluation more trustworthy. For detailed instructions on setting up K-Fold Cross Validation with Ultralytics YOLO, refer to K-Fold Cross Validation with Ultralytics.
How do I implement K-Fold Cross Validation using Ultralytics YOLO?
To implement K-Fold Cross Validation with Ultralytics YOLO, follow these steps:
- Verify that your annotations are in the YOLO detection format.
- Use Python libraries such as sklearn, pandas, and pyyaml.
- Create feature vectors from your dataset.
- Split your dataset using KFold from sklearn.model_selection.
- Train the YOLO model on each split.
For a comprehensive guide, see the K-Fold Dataset Split section of this documentation.
Why should I use Ultralytics YOLO for object detection?
Ultralytics YOLO offers state-of-the-art, real-time object detection with high accuracy and efficiency. It is versatile, supporting multiple computer vision tasks such as detection, segmentation, and classification. It also integrates seamlessly with tools like Ultralytics HUB for no-code model training and deployment. For more details, see the benefits and features on the Ultralytics YOLO page.
泚éãUltralytics YOLO ã®æ£ãããã©ãŒãããã§ããããšã確èªããã«ã¯ã©ãããã°ããã§ããïŒ
泚éã¯ãYOLO ã®æ€åºåœ¢åŒã«åŸã£ãŠãã ãããåã¢ãããŒã·ã§ã³ãã¡ã€ã«ã«ã¯ãç»åå ã®ããŠã³ãã£ã³ã°ããã¯ã¹åº§æšãšãšãã«ãªããžã§ã¯ãã¯ã©ã¹ãèšèŒããå¿ èŠããããŸããYOLO ãã©ãŒãããã¯ããªããžã§ã¯ãæ€åºã¢ãã«ããã¬ãŒãã³ã°ããããã®åççã§æšæºåãããããŒã¿åŠçãä¿èšŒããŸããé©åãªã¢ãããŒã·ã§ã³ãã©ãŒãããã®è©³çŽ°ã«ã€ããŠã¯ãYOLO æ€åºãã©ãŒãããã¬ã€ããã芧ãã ããã
Can I use K-Fold Cross Validation with custom datasets other than Fruit Detection?
Yes, you can use K-Fold Cross Validation with any custom dataset as long as the annotations are in the YOLO detection format. Replace the dataset paths and class labels with those specific to your custom dataset. This flexibility means any object detection project can benefit from robust model evaluation using K-Fold Cross Validation. For a practical example, see the Generating Feature Vectors section.