add-taxonomy: Add Taxonomic Information to Pascal VOC Annotations

The add-taxonomy command enriches Pascal VOC annotation files by adding taxonomic classification information to each object. This information is retrieved from MBARI's Deep-Sea Guide taxonomic database, providing a hierarchy of taxonomic ranks (e.g., phylum, class, order, family, genus) for each identified organism.

Usage

m3-download add-taxonomy INPUT_DIR [--output-dir OUTPUT_DIR]

Parameters

  • INPUT_DIR: Directory containing Pascal VOC annotation XML files
  • --output-dir: (Optional) Output directory for enriched annotations; if omitted, original files are overwritten

Data Modification

If --output-dir is not specified, the original annotation files are overwritten in place. Pass --output-dir whenever you need to keep the original annotations untouched.

Taxonomic Data Structure

The command adds a <taxonomy> element under each <object> element, containing whichever taxonomic ranks are available for that object's concept:

<object>
  <name>Sebastes</name>
  <!-- ... other object elements ... -->
  <taxonomy>
    <phylum>Chordata</phylum>
    <class>Actinopterygii</class>
    <order>Scorpaeniformes</order>
    <family>Sebastidae</family>
    <genus>Sebastes</genus>
  </taxonomy>
</object>
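To show how downstream code might consume the enriched annotations, here is a minimal sketch that reads the <taxonomy> element back into a dictionary with Python's standard library. The helper name read_taxonomy and the sample document are illustrative and not part of m3-download; the <annotation> root is the usual Pascal VOC wrapper.

```python
import xml.etree.ElementTree as ET

# Sample enriched annotation, mirroring the structure shown above.
SAMPLE = """<annotation>
  <object>
    <name>Sebastes</name>
    <taxonomy>
      <phylum>Chordata</phylum>
      <class>Actinopterygii</class>
      <order>Scorpaeniformes</order>
      <family>Sebastidae</family>
      <genus>Sebastes</genus>
    </taxonomy>
  </object>
</annotation>"""


def read_taxonomy(obj):
    """Return {rank: value} for one <object>, or {} if it has no <taxonomy>."""
    taxonomy = obj.find("taxonomy")
    if taxonomy is None:
        return {}
    # Each child tag is a rank name; its text is the taxon at that rank.
    return {rank.tag: (rank.text or "").strip() for rank in taxonomy}


root = ET.fromstring(SAMPLE)
for obj in root.iter("object"):
    print(obj.findtext("name"), read_taxonomy(obj))
```

Because only the ranks that were available are written, code reading these files should treat every rank as optional, for example with ranks.get("family").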

How It Works

  1. The command scans all Pascal VOC XML files in the input directory
  2. It extracts all unique concept names from the annotations
  3. For each unique concept, it:
    • Queries MBARI's Deep-Sea Guide API to retrieve taxonomic information
    • Parses the JSON response into a hierarchical structure
  4. The command then processes each annotation file, adding the retrieved taxonomy data
  5. Modified files are written either in-place or to the output directory
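The steps above can be sketched roughly as follows. This is not the tool's actual source: the function names collect_concepts and enrich_file are hypothetical, and step 3 (the API lookup) is represented by a plain dictionary mapping concept names to ranks so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_concepts(input_dir):
    """Steps 1-2: gather the unique <object><name> values across all XML files."""
    concepts = set()
    for xml_path in sorted(Path(input_dir).glob("*.xml")):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.iter("object"):
            name = (obj.findtext("name") or "").strip()
            if name:
                concepts.add(name)
    return concepts


def enrich_file(xml_path, taxonomy_by_concept, out_path):
    """Steps 4-5: add a <taxonomy> child to every object whose concept has ranks."""
    tree = ET.parse(str(xml_path))
    for obj in tree.getroot().iter("object"):
        name = (obj.findtext("name") or "").strip()
        ranks = taxonomy_by_concept.get(name)
        if not ranks:
            continue  # nothing retrieved for this concept: leave the object as-is
        taxonomy = ET.SubElement(obj, "taxonomy")
        for rank, value in ranks.items():
            ET.SubElement(taxonomy, rank).text = value
    # Writing to the input path would correspond to in-place modification.
    tree.write(str(out_path), encoding="utf-8", xml_declaration=True)
```

Looking up each unique concept once, rather than once per object, keeps the number of API requests proportional to the number of distinct concepts instead of the number of annotations. Note that this sketch does not check for an existing <taxonomy> element, so running it twice on the same file would add a second one.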

API Details

The taxonomic information is retrieved from the Deep-Sea Guide API at: https://dsg.mbari.org/kb/v1/phylogeny/basic
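For readers who want to query the endpoint directly, the sketch below builds a request URL and fetches the JSON. It rests on an assumption this page does not confirm: that the concept name is appended to the base URL as a percent-encoded path segment. The helper names phylogeny_url and fetch_phylogeny are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns it unparsed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://dsg.mbari.org/kb/v1/phylogeny/basic"


def phylogeny_url(concept):
    # ASSUMPTION: the concept goes at the end of the path, percent-encoded.
    return f"{BASE_URL}/{quote(concept, safe='')}"


def fetch_phylogeny(concept, timeout=10):
    """Fetch and decode the JSON for one concept (requires network access)."""
    with urlopen(phylogeny_url(concept), timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network; spaces in concept names are encoded.
print(phylogeny_url("Sebastes ruberrimus"))
```

Percent-encoding matters because many concept names contain spaces, which are not valid in a URL path.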

Network Dependency

  • Internet connection is required to access the Deep-Sea Guide API
  • If taxonomy information for a concept cannot be retrieved, objects with that concept name are left unchanged
  • Failed API calls are logged but don't interrupt the overall processing
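The log-and-continue behavior described above might look like the following sketch. The function fetch_all and the logger name are hypothetical; fetch stands for whatever callable performs the actual lookup, passed in so the pattern can be shown without a network call.

```python
import logging

logger = logging.getLogger("add_taxonomy_sketch")  # hypothetical logger name


def fetch_all(concepts, fetch):
    """Call fetch(concept) for each concept; log failures and keep going."""
    results = {}
    for concept in sorted(concepts):
        try:
            results[concept] = fetch(concept)
        except Exception as exc:  # broad on purpose: one bad lookup must not stop the run
            logger.warning("No taxonomy for %r: %s", concept, exc)
    return results
```

Concepts that failed are simply absent from the returned dictionary, which is what lets the later enrichment step leave their objects unchanged.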

Processing Details

  • Output XML is kept readable by re-serializing each file with Python's xml.dom.minidom (minidom.parseString and toprettyxml)
  • All taxonomic ranks available in the API response are included in the output
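As an illustration of the minidom approach, here is a minimal sketch of pretty-printing an XML string. The helper pretty_xml is hypothetical, and the blank-line filtering is an addition of this sketch rather than something this page says the command does: when the input is already indented, toprettyxml emits extra whitespace-only lines, and stripping them is a common workaround.

```python
from xml.dom import minidom


def pretty_xml(xml_text, indent="  "):
    """Re-indent an XML string and drop whitespace-only lines."""
    pretty = minidom.parseString(xml_text).toprettyxml(indent=indent)
    # Existing indentation survives parsing as text nodes, which toprettyxml
    # writes out as blank lines; keep only lines with real content.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


print(pretty_xml("<object><name>Sebastes</name></object>"))
```

With this filtering, compact and already-indented inputs produce the same output, so repeated runs do not accumulate blank lines.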

Examples

Add taxonomy and save to a new directory:

m3-download add-taxonomy Sebastes_annotations/ --output-dir Sebastes_with_taxonomy/

Add taxonomy in place, overwriting the original files:

m3-download add-taxonomy Benthocodon/