remap-voc: Remap Concept Names in Pascal VOC Annotations

The remap-voc command performs bulk renaming of concept (class) names in Pascal VOC annotation XML files according to a provided mapping file. This is useful for harmonizing taxonomic labels, merging similar classes, or simplifying the class hierarchy.

Usage

m3-download remap-voc MAP_FILE INPUT_DIR [--output-dir OUTPUT_DIR]

Parameters

  • MAP_FILE: File (CSV or JSON) containing the concept remapping definitions
  • INPUT_DIR: Directory containing Pascal VOC XML files to process
  • --output-dir: (Optional) Output directory for remapped annotations; if omitted, original files are overwritten

Data Loss Risk

If --output-dir is not specified, original annotation files will be overwritten without confirmation. Always use --output-dir when testing a new remapping.

Mapping File Formats

The MAP_FILE can be provided in two formats:

CSV Format

A simple two-column format: original concept names in the first column, target concept names in the second:

LRJ complex,Benthocodon
Benthocodon pedunculata,Benthocodon
Peniagone sp. A,Peniagone
Peniagone sp. 2,Peniagone
Peniagone sp. 1,Peniagone
Peniagone vitrea,Peniagone
Peniagone vitrea- sp. 1 complex,Peniagone
Peniagone papillata,Peniagone
Scotoplanes sp. A,Scotoplanes
Scotoplanes clarki,Scotoplanes
Scotoplanes globosa,Scotoplanes

JSON Format

A JSON object with original concept names as keys and target concept names as values:

{
  "LRJ complex": "Benthocodon",
  "Benthocodon pedunculata": "Benthocodon",
  "Peniagone sp. A": "Peniagone",
  "Peniagone sp. 2": "Peniagone",
  "Peniagone sp. 1": "Peniagone",
  "Peniagone vitrea": "Peniagone",
  "Peniagone vitrea- sp. 1 complex": "Peniagone",
  "Peniagone papillata": "Peniagone",
  "Scotoplanes sp. A": "Scotoplanes",
  "Scotoplanes clarki": "Scotoplanes",
  "Scotoplanes globosa": "Scotoplanes"
}
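
To illustrate how the two formats carry the same information, here is a minimal Python sketch that reads either one into a plain dictionary. It is not the command's actual implementation: the function name load_mapping is hypothetical, and the choices to strip surrounding whitespace and skip incomplete CSV rows are assumptions of this sketch.

```python
import csv
import json
from pathlib import Path


def load_mapping(map_file):
    """Return {original_name: target_name} read from a CSV or JSON file.

    Hypothetical helper for illustration; the format is chosen by extension.
    """
    path = Path(map_file)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            return dict(json.load(fh))
    if suffix == ".csv":
        mapping = {}
        with path.open(newline="", encoding="utf-8") as fh:
            for row in csv.reader(fh):
                # Assumption: rows with fewer than two columns are ignored.
                if len(row) < 2:
                    continue
                mapping[row[0].strip()] = row[1].strip()
        return mapping
    raise ValueError(f"Unsupported mapping file extension: {suffix!r}")
```

Given equivalent CSV and JSON files, both calls should return the same dictionary, which is why either format can be passed as MAP_FILE.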

How It Works

  1. The command detects the mapping file format based on file extension
  2. For each XML file in the input directory, it:
    • Parses the XML structure
    • Checks each <object> element for concept names that need remapping
    • Replaces the <name> element content if a match is found
    • Writes the modified XML either in-place or to the output directory
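
The per-file step can be sketched roughly as follows. This is an illustration under stated assumptions, not the command's source: remap_voc_xml is a hypothetical name, the sketch uses xml.etree.ElementTree for brevity, and it only looks at the <name> element that is a direct child of each <object>.

```python
import xml.etree.ElementTree as ET


def remap_voc_xml(xml_text, mapping):
    """Return (new_xml_text, changed_count) after renaming object names.

    Hypothetical helper; `mapping` is {original_name: target_name}.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for obj in root.iter("object"):
        # find() only searches direct children, so nested names are untouched.
        name = obj.find("name")
        if name is None or name.text is None:
            continue
        target = mapping.get(name.text.strip())
        if target is not None and target != name.text:
            name.text = target
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

A caller could write the returned text back to the same path or to a file of the same name under the output directory, and treat a nonzero count as "this file was modified".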

Processing Details

  • Only concept names that match entries in the mapping file are modified
  • Output XML is pretty-printed using minidom.parseString and toprettyxml, so whitespace and indentation may differ from the input
  • The command reports how many annotation files were modified
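
As a rough illustration of the minidom step, the sketch below re-indents an XML string with parseString and toprettyxml. The function name pretty_xml and the two-space indent are assumptions; the blank-line filter is a common workaround, not something the command is known to do, and it is included because toprettyxml emits extra whitespace-only lines when the input is already indented.

```python
from xml.dom import minidom


def pretty_xml(xml_text):
    """Re-indent XML with minidom, dropping whitespace-only lines.

    Hypothetical helper for illustration.
    """
    pretty = minidom.parseString(xml_text).toprettyxml(indent="  ")
    # Already-indented input carries whitespace text nodes, which
    # toprettyxml writes out as extra blank lines; filter those out.
    return "\n".join(line for line in pretty.splitlines() if line.strip())
```

With the filter in place, running the function on its own output should leave the text unchanged, which keeps repeated remapping runs from accumulating blank lines.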

Examples

Using a CSV mapping file with an output directory:

m3-download remap-voc remapping.csv Benthocodon/ --output-dir Benthocodon_remapped/

Using a JSON mapping file and overwriting existing files:

m3-download remap-voc taxonomy_map.json annotations/