Split
The split command randomly divides a dataset into training, validation, and test sets. By default, it uses a split ratio of 85% training, 10% validation, and 5% test.
Prerequisites
The input directory must be organized with images/ and labels/ subfolders:
dataset_root/
├── images/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── labels/
    ├── image1.txt
    ├── image2.txt
    └── ...
The labels must be in YOLO format (.txt files), and each label file should share the base name of its image (for example, image1.txt for image1.jpg).
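Before running the split, it can help to confirm that images and labels actually pair up. The following is a minimal sketch, not part of aidata: the function name find_unpaired and the set of image extensions are assumptions chosen for illustration.

```python
from pathlib import Path

# Assumption: these are the image extensions of interest; adjust for your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unpaired(dataset_root):
    """Return (image stems without a label, label stems without an image)."""
    root = Path(dataset_root)
    image_stems = {
        p.stem
        for p in (root / "images").iterdir()
        if p.suffix.lower() in IMAGE_EXTENSIONS
    }
    label_stems = {p.stem for p in (root / "labels").glob("*.txt")}
    # Set differences give the files that have no counterpart.
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Calling find_unpaired("./my_dataset") returns two sorted lists; both should be empty when every image has a matching label file and vice versa.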
Usage
To split your dataset, use the split command with the --input (or -i) and --output (or -o) options.
aidata split --input ./my_dataset --output ./my_dataset_split
Output
The command generates two compressed tarballs in the output directory:
- images.tar.gz: Contains the split images organized into train/, val/, and test/ subfolders.
- labels.tar.gz: Contains the split labels organized into train/, val/, and test/ subfolders.
The resulting structure inside the tarballs (when extracted) will look like this:
images/
├── train/
├── val/
└── test/
labels/
├── train/
├── val/
└── test/
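To check the contents of a tarball without extracting it, the archive can be inspected with Python's standard tarfile module. This is a sketch under the assumption that member paths contain a train, val, or test folder as shown in the structure above; count_split_members is a hypothetical helper, not an aidata function.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_split_members(tarball_path):
    """Count regular files under each split folder inside a .tar.gz archive."""
    counts = Counter()
    with tarfile.open(tarball_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories and other non-file entries
            # Look only at the folder components, not the file name itself.
            folders = PurePosixPath(member.name).parts[:-1]
            for split in ("train", "val", "test"):
                if split in folders:
                    counts[split] += 1
                    break
    return dict(counts)
```

For example, count_split_members("./my_dataset_split/images.tar.gz") returns a dictionary of file counts per split, which can be compared against the same call on labels.tar.gz.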
Additionally, the command creates three text files in the input directory listing the files assigned to each split:
- autosplit_train.txt
- autosplit_val.txt
- autosplit_test.txt
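These list files make it easy to audit the assignment afterwards. The sketch below assumes each file holds one entry per line, which this page does not confirm; read_split_lists and repeated_entries are hypothetical helper names.

```python
from pathlib import Path


def read_split_lists(input_dir):
    """Map each split name to the non-empty lines of its autosplit_*.txt file."""
    splits = {}
    for name in ("train", "val", "test"):
        list_file = Path(input_dir) / f"autosplit_{name}.txt"
        # Assumption: one file entry per line.
        lines = list_file.read_text().splitlines()
        splits[name] = [line.strip() for line in lines if line.strip()]
    return splits


def repeated_entries(splits):
    """Return entries listed more than once across the split files."""
    seen, repeated = set(), set()
    for entries in splits.values():
        for entry in entries:
            if entry in seen:
                repeated.add(entry)
            seen.add(entry)
    return repeated
```

An empty result from repeated_entries(read_split_lists("./my_dataset")) indicates that no file was assigned to more than one split.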
Note
The split is reproducible as it uses a fixed random seed (0).
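To illustrate why a fixed seed makes the result repeatable, here is a minimal sketch of a seeded 85/10/5 split. It is not the tool's actual implementation: the shuffle-then-slice strategy, the rounding of split sizes, and the name split_names are all assumptions.

```python
import random


def split_names(names, ratios=(0.85, 0.10, 0.05), seed=0):
    """Shuffle names with a fixed seed and cut them into train/val/test lists."""
    shuffled = sorted(names)  # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)  # private generator, seeded explicitly
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # whatever remains goes to test
    }
```

Because the generator is seeded with the same value on every call, two runs over the same set of file names produce identical lists. With 100 names, this sketch yields 85, 10, and 5 entries respectively.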
last updated: 2026-02-08