add-taxonomy: Add Taxonomic Information to Pascal VOC Annotations
The add-taxonomy command enriches Pascal VOC annotation files by adding taxonomic classification information to each object. This information is retrieved from MBARI's Deep-Sea Guide taxonomic database, providing a hierarchy of taxonomic ranks (e.g., phylum, class, order, family, genus) for each identified organism.
Usage
Parameters
- INPUT_DIR: Directory containing Pascal VOC annotation XML files
- --output-dir: (Optional) Output directory for enriched annotations; if omitted, original files are overwritten
Data Modification
If --output-dir is not specified, original annotation files will be overwritten.
Always use --output-dir when you want to preserve the original annotations.
Taxonomic Data Structure
The command adds a <taxonomy> element under each <object> element, containing available taxonomic ranks:
<object>
  <name>Sebastes</name>
  <!-- ... other object elements ... -->
  <taxonomy>
    <phylum>Chordata</phylum>
    <class>Actinopterygii</class>
    <order>Scorpaeniformes</order>
    <family>Sebastidae</family>
    <genus>Sebastes</genus>
  </taxonomy>
</object>
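Downstream code can read the added ranks back with the standard library. The following is a minimal sketch, not part of the command itself; the helper name `read_taxonomy` is hypothetical, and the sample XML simply mirrors the structure shown above.

```python
import xml.etree.ElementTree as ET

# Sample <object> mirroring the enriched structure shown above.
ENRICHED_OBJECT = """
<object>
  <name>Sebastes</name>
  <taxonomy>
    <phylum>Chordata</phylum>
    <class>Actinopterygii</class>
    <order>Scorpaeniformes</order>
    <family>Sebastidae</family>
    <genus>Sebastes</genus>
  </taxonomy>
</object>
"""


def read_taxonomy(obj):
    """Return {rank: value} for one <object>, or {} if it has no <taxonomy>.

    Hypothetical helper for illustration only.
    """
    taxonomy = obj.find("taxonomy")
    if taxonomy is None:
        return {}
    return {rank.tag: (rank.text or "").strip() for rank in taxonomy}


obj = ET.fromstring(ENRICHED_OBJECT.strip())
ranks = read_taxonomy(obj)
print(obj.findtext("name"), ranks["family"])
```

Because only the ranks available for a concept are written, callers should use `ranks.get("family")` rather than indexing when a rank may be absent.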
How It Works
- The command scans all Pascal VOC XML files in the input directory
- It extracts all unique concept names from the annotations
- For each unique concept, it:
  - Queries MBARI's Deep-Sea Guide API to retrieve taxonomic information
  - Parses the JSON response into a hierarchical structure
- The command then processes each annotation file, adding the retrieved taxonomy data
- Modified files are written either in-place or to the output directory
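The steps above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the command's actual source: the function names are hypothetical, annotation files are assumed to end in `.xml`, and the taxonomy lookup is passed in as a ready-made dictionary so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_concepts(input_dir):
    """Return the set of unique <object>/<name> values across all XML files."""
    concepts = set()
    for xml_path in sorted(Path(input_dir).glob("*.xml")):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name:
                concepts.add(name.strip())
    return concepts


def add_taxonomy(root, taxonomy_by_concept):
    """Append a <taxonomy> child to every <object> whose concept was resolved."""
    for obj in root.iter("object"):
        name = (obj.findtext("name") or "").strip()
        ranks = taxonomy_by_concept.get(name)
        if not ranks:
            continue  # no taxonomy retrieved: leave the object unchanged
        taxonomy = ET.SubElement(obj, "taxonomy")
        for rank, value in ranks.items():
            ET.SubElement(taxonomy, rank).text = value


def enrich_directory(input_dir, taxonomy_by_concept, output_dir=None):
    """Write enriched files to output_dir, or in place when it is None."""
    out_dir = Path(output_dir) if output_dir else Path(input_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for xml_path in sorted(Path(input_dir).glob("*.xml")):
        tree = ET.parse(str(xml_path))
        add_taxonomy(tree.getroot(), taxonomy_by_concept)
        tree.write(str(out_dir / xml_path.name), encoding="utf-8", xml_declaration=True)
```

Collecting the unique concept names first means each concept needs only one lookup, however many annotation files mention it.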
API Details
The taxonomic information is retrieved from the Deep-Sea Guide API at:
https://dsg.mbari.org/kb/v1/phylogeny/basic
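A lookup against this endpoint might look like the sketch below. Two things here are assumptions rather than documented behavior: that the concept name is appended to the URL as a path segment, and that the response is a nested object whose nodes carry `name`, `rank`, and `children` keys. Check the actual API response before relying on either. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://dsg.mbari.org/kb/v1/phylogeny/basic"


def fetch_phylogeny(concept, timeout=10):
    """Fetch the raw JSON for one concept.

    ASSUMPTION: the concept name is appended to the base URL as a path segment.
    """
    url = f"{BASE_URL}/{urllib.parse.quote(concept)}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def flatten_ranks(node):
    """Walk a nested node and return {rank: name} for nodes that carry a rank.

    ASSUMPTION: each node looks like {"name": ..., "rank": ..., "children": [...]}
    and the lineage follows the first child at every level.
    """
    ranks = {}
    while node:
        rank, name = node.get("rank"), node.get("name")
        if rank and name:
            ranks[rank] = name
        children = node.get("children") or []
        node = children[0] if children else None
    return ranks


# Hand-written sample illustrating the ASSUMED response shape (not real API output).
sample = {
    "name": "Chordata",
    "rank": "phylum",
    "children": [
        {
            "name": "Actinopterygii",
            "rank": "class",
            "children": [{"name": "Sebastes", "rank": "genus"}],
        }
    ],
}
print(flatten_ranks(sample))
```

Keeping the network call separate from the parsing step makes the parser testable offline with a saved response.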
Network Dependency
- Internet connection is required to access the Deep-Sea Guide API
- If taxonomy information for a concept cannot be retrieved, that object remains unchanged
- Failed API calls are logged but don't interrupt the overall processing
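The log-and-continue behavior described above could be structured along these lines. This is a sketch only: `resolve_concepts` and the `fetch` callable are hypothetical names, and the logger name is arbitrary.

```python
import logging

logger = logging.getLogger("add_taxonomy_sketch")


def resolve_concepts(concepts, fetch):
    """Call fetch(concept) for each concept, logging failures and moving on.

    `fetch` is any callable returning a {rank: name} dict; it stands in for
    the real API lookup so this sketch runs offline.
    """
    resolved = {}
    for concept in sorted(concepts):
        try:
            ranks = fetch(concept)
        except Exception as exc:  # network errors, HTTP errors, malformed JSON
            logger.warning("Taxonomy lookup failed for %r: %s", concept, exc)
            continue  # keep processing the remaining concepts
        if ranks:
            resolved[concept] = ranks
    return resolved


def fake_fetch(concept):
    """Stand-in lookup that fails for one concept to exercise the error path."""
    if concept == "Unknown":
        raise OSError("simulated network failure")
    return {"genus": concept}


print(resolve_concepts({"Sebastes", "Unknown"}, fake_fetch))
```

Concepts that fail are simply absent from the result, so objects carrying those names are left without a `<taxonomy>` element.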
Processing Details
- XML formatting is preserved using minidom.parseString and toprettyxml
- All taxonomic ranks available in the API response are included in the output
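The pretty-printing step can be illustrated with the two `minidom` calls named above. The indent width and the blank-line cleanup are assumptions made for this sketch, and `pretty_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def pretty_xml(root, indent="  "):
    """Serialize an ElementTree element as indented XML text."""
    rough = ET.tostring(root, encoding="unicode")
    pretty = minidom.parseString(rough).toprettyxml(indent=indent)
    # toprettyxml emits whitespace-only lines when the input was already
    # indented; drop them so repeated runs do not accumulate blank lines.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


root = ET.fromstring(
    "<object><name>Sebastes</name>"
    "<taxonomy><genus>Sebastes</genus></taxonomy></object>"
)
print(pretty_xml(root))
```

Note that `toprettyxml` prepends an XML declaration line to its output.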
Examples
Add taxonomy and save to a new directory:
Add taxonomy by overwriting original files: