generate: Download Images & Extract Localizations from M3

The generate command queries M3's annotation database, downloads the matching images, and writes their bounding box localizations in Pascal VOC format. It is the primary data acquisition tool for building object detection datasets from MBARI's Video Annotation and Reference System (VARS).

Usage

m3-download generate IMAGE_DIR XML_DIR [OPTIONS]

Required Parameters

  • IMAGE_DIR: Directory where downloaded images will be saved
  • XML_DIR: Directory where Pascal VOC XML annotation files will be saved

Authentication Options

  • --config-url: Raziel config URL (default: "https://m3.shore.mbari.org/config")

Authentication

You'll be prompted to enter your username and password for M3 access. The tool does not store your credentials anywhere.

Filtering Options

  • --include-concept: Concept name to include, or file with concepts listed one per line

    • May be specified multiple times for multiple concepts
    • If not specified, all concepts are included
  • --include-descendants: Include taxonomic descendants of specified concepts

  • --exclude-concept: Concept name to exclude, or file with concepts listed one per line

    • May be specified multiple times for multiple concepts
    • Excluded concepts are subtracted from the set of resolved included concepts
    • Can be used with or without --include-concept
  • --exclude-descendants: Also exclude the taxonomic descendants of the concepts given with --exclude-concept

  • --include-group: Group name to include (e.g., 'ROV:verified')

    • May be specified multiple times for multiple groups
    • If specified, only annotations from these groups will be included
  • --exclude-group: Group name to exclude (e.g., 'ROV:pending-verifications')

    • May be specified multiple times
  • --exclude-activity: Activity name to exclude (e.g., 'unspecified')

    • May be specified multiple times
  • --exclude-project: Project name to exclude (e.g., 'ML-Tracking')

    • May be specified multiple times
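
The concept options above combine as set arithmetic: the included concepts are resolved first (optionally with descendants), then the excluded concepts (optionally with descendants) are subtracted. The sketch below illustrates that rule only. The `CHILDREN` mapping and the `descendants` and `resolve_concepts` helpers are hypothetical and not part of m3-download, which consults the VARS KB Server for the taxonomy. The sketch also does not model the "no --include-concept means all concepts" case.

```python
# Hypothetical toy taxonomy; the real tool asks the VARS KB Server.
CHILDREN = {
    "Sebastes": ["Sebastes mystinus", "Sebastes serranoides", "Sebastes miniatus"],
}

def descendants(concept):
    """Return the concept together with all of its taxonomic descendants."""
    found = {concept}
    for child in CHILDREN.get(concept, []):
        found |= descendants(child)
    return found

def resolve_concepts(include, exclude,
                     include_descendants=False, exclude_descendants=False):
    """Expand each list if requested, then subtract excluded from included."""
    included = set()
    for concept in include:
        included |= descendants(concept) if include_descendants else {concept}
    excluded = set()
    for concept in exclude:
        excluded |= descendants(concept) if exclude_descendants else {concept}
    return included - excluded

resolved = resolve_concepts(
    include=["Sebastes"],
    exclude=["Sebastes mystinus", "Sebastes serranoides"],
    include_descendants=True,
)
print(sorted(resolved))  # ['Sebastes', 'Sebastes miniatus']
```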

v0.8.0

  • --include-imaged-moment: Include all annotations from each imaged moment that has any matching concept (default: True)
    • If False, only annotations with matching concepts are included
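
The difference between the two settings can be sketched on toy data. This is an illustration under assumptions, not the tool's implementation: annotations are modeled as hypothetical `(imaged_moment_id, concept)` pairs and `select_annotations` is an invented helper.

```python
def select_annotations(annotations, concepts, include_imaged_moment=True):
    """Pick annotations from a list of (imaged_moment_id, concept) pairs."""
    if not include_imaged_moment:
        # Keep only the annotations whose own concept matches.
        return [a for a in annotations if a[1] in concepts]
    # Keep every annotation on any imaged moment with at least one match.
    matching_moments = {moment for moment, concept in annotations if concept in concepts}
    return [a for a in annotations if a[0] in matching_moments]

annotations = [
    ("moment-1", "Sebastes"),
    ("moment-1", "Sardinops sagax"),
    ("moment-2", "Sardinops sagax"),
]

whole_moments = select_annotations(annotations, {"Sebastes"}, True)
matches_only = select_annotations(annotations, {"Sebastes"}, False)
print(len(whole_moments), len(matches_only))  # 2 1
```

With the default, the "Sardinops sagax" box on moment-1 is kept because it shares an imaged moment with a matching "Sebastes" annotation.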

v0.10.0

  • --with-tag: Include only bounding boxes with specific tags
    • May be specified multiple times, and only bounding boxes containing ALL specified tags will be included
    • Example tags might include 'training' or other project-specific metadata
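
The "ALL specified tags" rule is a subset test. A minimal sketch, assuming tags are plain strings (the `has_all_tags` helper is hypothetical and not part of m3-download):

```python
def has_all_tags(box_tags, required_tags):
    """True when every required tag is present on the bounding box."""
    return set(required_tags).issubset(box_tags)

print(has_all_tags(["training", "verified"], ["training"]))   # True
print(has_all_tags(["training"], ["training", "verified"]))   # False
```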

v0.11.0

  • --video-name: Filter by video name

    • May be specified multiple times for multiple videos
  • --video-sequence-name: Filter by video sequence name

    • May be specified multiple times for multiple video sequences
  • --video-timestamp-min: Filter by minimum video start timestamp (ISO 8601 format with Z suffix)

    • Example: "2021-01-01T00:00:00Z"
  • --video-timestamp-max: Filter by maximum video start timestamp (ISO 8601 format with Z suffix)

    • Example: "2021-12-31T23:59:59Z"
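
If the timestamp bounds are produced by a script, they can be formatted with the standard library. A small sketch (the `to_iso_z` helper is hypothetical; it expects a timezone-aware datetime):

```python
from datetime import datetime, timezone

def to_iso_z(dt):
    """Format a timezone-aware datetime as ISO 8601 in UTC with a Z suffix."""
    # Note: a naive datetime would be interpreted as local time by astimezone().
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

start = to_iso_z(datetime(2021, 1, 1, tzinfo=timezone.utc))
end = to_iso_z(datetime(2021, 12, 31, 23, 59, 59, tzinfo=timezone.utc))
print(start, end)  # 2021-01-01T00:00:00Z 2021-12-31T23:59:59Z
```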

v0.12.0

  • --observer: Filter by observer name
    • May be specified multiple times for multiple observers
    • Only annotations created by the specified observer(s) will be included

Output Options

  • --pretty-print: Format the XML output with indentation (default: True)
  • --download-images: Actually download the images; if False, only XMLs are generated (default: True)
  • --verbose, -v: Display additional debugging information

How It Works

  1. The command authenticates with the Raziel service to get endpoint information
  2. It sends a SQL query to the Annosaurus service to retrieve bounding box annotations
  3. For concepts with descendants, it uses the VARS KB Server to expand the query
  4. It associates each annotation with either:
    • An image reference (a direct URL to an image)
    • A video frame (from which an image can be extracted)
  5. Images are downloaded and saved to the specified directory
  6. Pascal VOC XML annotations are generated and saved
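
For readers unfamiliar with step 6, the sketch below builds a minimal annotation in the conventional Pascal VOC layout (`annotation`, `size`, `object`, `bndbox`). It is an assumption-laden illustration: the `voc_annotation` helper is hypothetical, and the files written by m3-download may contain additional or differently populated fields.

```python
import xml.etree.ElementTree as ET

def voc_annotation(filename, width, height, boxes):
    """Build a minimal Pascal VOC annotation tree.

    boxes: iterable of (label, xmin, ymin, xmax, ymax) in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return root

root = voc_annotation("example.png", 1920, 1080,
                      [("Sebastes", 100, 200, 300, 400)])
if hasattr(ET, "indent"):  # ET.indent is available from Python 3.9
    ET.indent(root)
print(ET.tostring(root, encoding="unicode"))
```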

Network Requirements

This command requires network access to the MBARI endpoints. For large datasets, the download can be bandwidth- and time-intensive.

Dry Run

Use --download-images False to generate only annotation files for planning purposes. This will help you assess the dataset size before downloading all the images.
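
After a dry run, the generated XML files can be summarized to estimate dataset size. A minimal sketch, assuming the conventional Pascal VOC layout in which each `object` element carries a `name` child (the `summarize_voc` helper is hypothetical):

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

def summarize_voc(xml_dir):
    """Count annotation files and objects per label in a directory of VOC XMLs."""
    files = sorted(Path(xml_dir).glob("*.xml"))
    labels = Counter()
    for path in files:
        # Each <object> element is one bounding box; <name> holds its label.
        for obj in ET.parse(path).getroot().iter("object"):
            labels[obj.findtext("name", default="(unnamed)")] += 1
    return len(files), labels
```

For example, `n_files, labels = summarize_voc("annotations/")` returns the number of annotation files and a per-concept box count, and `labels.most_common(10)` lists the ten most frequent concepts.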

Examples

Download a dataset for Sebastes rockfish including descendant species:

m3-download generate Sebastes_images/ Sebastes_voc/ --include-concept Sebastes --include-descendants

Download data for multiple explicitly listed species:

m3-download generate fish_images/ fish_voc/ --include-concept "Sebastes" --include-concept "Sardinops sagax"

Download data using species listed in a file:

m3-download generate species_images/ species_voc/ --include-concept species_list.txt
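
A concept file such as species_list.txt is plain text with one concept per line. If the list comes from a script, it could be written like this (the `write_concept_file` helper is hypothetical and UTF-8 encoding is an assumption):

```python
from pathlib import Path

def write_concept_file(path, concepts):
    """Write one concept name per line, as --include-concept expects for files."""
    Path(path).write_text("\n".join(concepts) + "\n", encoding="utf-8")
```

For example, `write_concept_file("species_list.txt", ["Sebastes", "Sardinops sagax"])` produces a two-line file that can be passed to --include-concept or --exclude-concept.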

Download data for Sebastes and its descendants, excluding specific species:

m3-download generate sebastes_filtered/ sebastes_voc/ --include-concept Sebastes --include-descendants --exclude-concept "Sebastes mystinus" --exclude-concept "Sebastes serranoides"

Download all data but exclude specific problematic concepts:

m3-download generate all_images/ all_annotations/ --exclude-concept problematic_concepts.txt

Download data excluding concepts and their descendants:

m3-download generate images/ annotations/ --exclude-concept "Sebastolobus" --exclude-descendants

Download data excluding problematic groups and activities:

m3-download generate images/ annotations/ --exclude-group 'ROV:pending-verifications' --exclude-activity 'unspecified' --exclude-project 'ML-Tracking'

Download data including only specific verified groups:

m3-download generate images/ annotations/ --include-group 'ROV:verified' --include-group 'AUV:reviewed'

Download data with specific tags (e.g., only training examples):

m3-download generate images/ annotations/ --with-tag training

Filter by video name or video sequence name:

m3-download generate images/ annotations/ --video-name "V4361" --video-name "V4362"

m3-download generate images/ annotations/ --video-sequence-name "Ventana 4361"

Filter by video start timestamp range:

m3-download generate images/ annotations/ --video-timestamp-min "2021-01-01T00:00:00Z" --video-timestamp-max "2021-01-31T23:59:59Z"

Filter by observer:

m3-download generate images/ annotations/ --observer "jdoe" --observer "asmith"

Generate annotations without downloading images (useful for planning):

m3-download generate images/ annotations/ --download-images False --include-concept Sebastes