remap-voc: Remap Concept Names in Pascal VOC Annotations
The remap-voc command performs bulk renaming of concept (class) names in Pascal VOC annotation XML files according to a provided mapping file. This is useful for harmonizing taxonomic labels, merging similar classes, or simplifying the class hierarchy.
Usage
Parameters
- MAP_FILE: File (CSV or JSON) containing the concept remapping definitions
- INPUT_DIR: Directory containing Pascal VOC XML files to process
- --output-dir: (Optional) Output directory for remapped annotations; if omitted, original files are overwritten
Data Loss Risk
If --output-dir is not specified, original annotation files will be overwritten without confirmation.
Always use --output-dir when testing a new remapping.
Mapping File Formats
The MAP_FILE can be provided in two formats:
CSV Format
Simple two-column format with original concept names in the first column and target concept names in the second column:
LRJ complex,Benthocodon
Benthocodon pedunculata,Benthocodon
Peniagone sp. A,Peniagone
Peniagone sp. 2,Peniagone
Peniagone sp. 1,Peniagone
Peniagone vitrea,Peniagone
Peniagone vitrea- sp. 1 complex,Peniagone
Peniagone papillata,Peniagone
Scotoplanes sp. A,Scotoplanes
Scotoplanes clarki,Scotoplanes
Scotoplanes globosa,Scotoplanes
JSON Format
A JSON object with original concept names as keys and target concept names as values:
{
"LRJ complex": "Benthocodon",
"Benthocodon pedunculata": "Benthocodon",
"Peniagone sp. A": "Peniagone",
"Peniagone sp. 2": "Peniagone",
"Peniagone sp. 1": "Peniagone",
"Peniagone vitrea": "Peniagone",
"Peniagone vitrea- sp. 1 complex": "Peniagone",
"Peniagone papillata": "Peniagone",
"Scotoplanes sp. A": "Scotoplanes",
"Scotoplanes clarki": "Scotoplanes",
"Scotoplanes globosa": "Scotoplanes"
}
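Both formats reduce to the same lookup table of original name to target name. The following is a minimal Python sketch of how such a file could be loaded, not the command's actual code: the function name load_mapping is hypothetical, and it assumes UTF-8 files, a CSV with no header row, and format selection by file extension.

```python
import csv
import json
from pathlib import Path


def load_mapping(path):
    """Return {original_name: target_name} read from a CSV or JSON file.

    Hypothetical helper for illustration; assumes UTF-8 and a header-less CSV.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            mapping = json.load(fh)
        if not isinstance(mapping, dict):
            raise ValueError("JSON mapping must be an object of name pairs")
        return {str(k): str(v) for k, v in mapping.items()}
    if suffix == ".csv":
        mapping = {}
        with path.open(newline="", encoding="utf-8") as fh:
            for row in csv.reader(fh):
                if len(row) < 2:
                    continue  # skip blank or incomplete rows
                mapping[row[0].strip()] = row[1].strip()
        return mapping
    raise ValueError(f"Unsupported mapping file extension: {suffix!r}")
```

Loading the CSV and JSON examples above through this sketch would yield identical dictionaries, which is why the two formats are interchangeable.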
How It Works
- The command detects the mapping file format from the file extension
- For each XML file in the input directory, it:
  - Parses the XML structure
  - Checks each <object> element for concept names that need remapping
  - Replaces the <name> element content if a match is found
  - Writes the modified XML either in place or to the output directory
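The per-file steps above can be sketched in Python with the standard library. This is an illustration under stated assumptions rather than the command's implementation: the names remap_tree and remap_directory are hypothetical, only files ending in .xml directly inside the input directory are considered, and unchanged files are still copied when an output directory is given.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def remap_tree(root, mapping):
    """Rename matching <object>/<name> values in place; return how many changed."""
    changed = 0
    for obj in root.iter("object"):
        name = obj.find("name")  # direct child only, so nested <part> names are untouched
        if name is None or name.text is None:
            continue
        target = mapping.get(name.text.strip())
        if target is not None and target != name.text:
            name.text = target
            changed += 1
    return changed


def remap_directory(input_dir, mapping, output_dir=None):
    """Remap every *.xml file in input_dir; return how many files were modified.

    With output_dir=None the originals are overwritten (assumed to mirror the
    in-place behavior described above).
    """
    input_dir = Path(input_dir)
    out_dir = Path(output_dir) if output_dir is not None else input_dir
    out_dir.mkdir(parents=True, exist_ok=True)
    modified = 0
    for xml_path in sorted(input_dir.glob("*.xml")):
        tree = ET.parse(xml_path)
        if remap_tree(tree.getroot(), mapping):
            modified += 1
        elif output_dir is None:
            continue  # nothing changed and writing in place: leave the file alone
        tree.write(out_dir / xml_path.name, encoding="utf-8", xml_declaration=True)
    return modified
```

Calling remap_directory with an output directory leaves the source files untouched, which matches the advice to use --output-dir when testing a new remapping.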
Processing Details
- Only concept names that match entries in the mapping file are modified
- XML formatting is preserved using minidom.parseString and toprettyxml
- The command reports how many annotation files were modified
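As a rough illustration of the formatting step, the sketch below re-indents an XML string with minidom.parseString and toprettyxml. The helper name pretty_xml, the two-space indent, and the blank-line filter are assumptions made for this example; the command's own settings may differ.

```python
from xml.dom import minidom


def pretty_xml(xml_text, indent="  "):
    """Re-indent an XML string (hypothetical helper for illustration).

    toprettyxml emits whitespace-only lines when the input is already
    indented, so those lines are dropped before returning.
    """
    pretty = minidom.parseString(xml_text).toprettyxml(indent=indent)
    return "\n".join(line for line in pretty.splitlines() if line.strip())
```

Note that toprettyxml prepends an XML declaration, so the first line of the result is the declaration rather than the root element.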
Examples
Using a CSV mapping file with an output directory:
Using a JSON mapping file and overwriting existing files: