generate: Download Images & Extract Localizations from M3¶
The generate command queries M3's annotation database, downloads the matching images, and writes their bounding box localizations as Pascal VOC XML files. It is the primary data acquisition tool for creating object detection datasets from MBARI's VARS (Video Annotation and Reference System).
Usage¶
Required Parameters¶
- IMAGE_DIR: Directory where downloaded images will be saved
- XML_DIR: Directory where Pascal VOC XML annotation files will be saved
Authentication Options¶
--config-url: Raziel config URL (default: "https://m3.shore.mbari.org/config")
Authentication
You'll be prompted to enter your username and password for M3 access. The tool does not store your credentials anywhere.
Filtering Options¶
- --include-concept: Concept name to include, or a file with concepts listed one per line
  - May be specified multiple times for multiple concepts
  - If not specified, all concepts are included
- --include-descendants: Include taxonomic descendants of the specified concepts
- --exclude-concept: Concept name to exclude, or a file with concepts listed one per line
  - May be specified multiple times for multiple concepts
  - Excluded concepts are subtracted from the set of resolved included concepts
  - Can be used with or without --include-concept
- --exclude-descendants: Also exclude taxonomic descendants of the specified excluded concepts
- --include-group: Group name to include (e.g., 'ROV:verified')
  - May be specified multiple times for multiple groups
  - If specified, only annotations from these groups will be included
- --exclude-group: Group name to exclude (e.g., 'ROV:pending-verifications')
  - May be specified multiple times
- --exclude-activity: Activity name to exclude (e.g., 'unspecified')
  - May be specified multiple times
- --exclude-project: Project name to exclude (e.g., 'ML-Tracking')
  - May be specified multiple times
- --include-imaged-moment: Include all annotations from each imaged moment that has any matching concept (default: True)
  - If False, only annotations with matching concepts are included
- --with-tag: Include only bounding boxes with specific tags
  - May be specified multiple times; only bounding boxes containing ALL specified tags will be included
  - Example tags might include 'training' or other project-specific metadata
- --video-name: Filter by video name
  - May be specified multiple times for multiple videos
- --video-sequence-name: Filter by video sequence name
  - May be specified multiple times for multiple video sequences
- --video-timestamp-min: Filter by minimum video start timestamp (ISO 8601 format with Z suffix)
  - Example: "2021-01-01T00:00:00Z"
- --video-timestamp-max: Filter by maximum video start timestamp (ISO 8601 format with Z suffix)
  - Example: "2021-12-31T23:59:59Z"
- --observer: Filter by observer name
  - May be specified multiple times for multiple observers
  - Only annotations created by the specified observer(s) will be included
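The way the include and exclude concept options combine can be pictured as set arithmetic. The sketch below is illustrative only: `resolve_concepts`, `expand`, and the tiny `DESCENDANTS` table are hypothetical stand-ins for the descendant lookup, not the tool's actual implementation. It covers only the case where at least one concept is included.

```python
# Hypothetical taxonomy fragment used for illustration; the real tool
# resolves descendants through a remote service, not a local table.
DESCENDANTS = {
    "Sebastes": {"Sebastes mystinus", "Sebastes serranoides", "Sebastes miniatus"},
}


def expand(concepts, with_descendants):
    """Return the concepts, optionally joined with their known descendants."""
    resolved = set(concepts)
    if with_descendants:
        for concept in concepts:
            resolved |= DESCENDANTS.get(concept, set())
    return resolved


def resolve_concepts(include, exclude, include_descendants=False, exclude_descendants=False):
    """Subtract the resolved excluded concepts from the resolved included concepts."""
    included = expand(include, include_descendants)
    excluded = expand(exclude, exclude_descendants)
    return included - excluded


selected = resolve_concepts(
    include=["Sebastes"],
    exclude=["Sebastes mystinus", "Sebastes serranoides"],
    include_descendants=True,
)
print(sorted(selected))
```

Under these assumptions, excluding a concept together with its descendants removes the whole subtree, while excluding it alone leaves its descendants in the result.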
Output Options¶
- --pretty-print: Format the XML output with indentation (default: True)
- --download-images: Actually download the images; if False, only XMLs are generated (default: True)
- --verbose, -v: Display additional debugging information
How It Works¶
- The command authenticates with the Raziel service to get endpoint information
- It sends a SQL query to the Annosaurus service to retrieve bounding box annotations
- For concepts with descendants, it uses the VARS KB Server to expand the query
- It associates each annotation with either:
  - An image reference (a direct URL to an image)
  - A video frame (from which an image can be extracted)
- Images are downloaded and saved to the specified directory
- Pascal VOC XML annotations are generated and saved
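To work with the output of the last step, the generated annotations can be read back with the Python standard library. This is a minimal sketch that assumes the conventional Pascal VOC layout (`object` elements with a `name` and a `bndbox`); the `read_voc_boxes` helper and the sample document are made up for illustration, and files written by the tool may carry additional fields.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) tuples from a Pascal VOC document."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are parsed via float first so values like "100.0" also work.
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return boxes


# Illustrative sample, not actual tool output.
SAMPLE = """<annotation>
  <filename>example.png</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>Sebastes</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""

print(read_voc_boxes(SAMPLE))
```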
Network Requirements
This command requires internet access to the MBARI endpoints. Downloads can be bandwidth- and time-intensive for large datasets.
Dry Run
Use --download-images False to generate only annotation files for planning purposes.
This will help you assess the dataset size before downloading all the images.
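After a dry run, a quick tally of the XML directory gives a sense of the dataset's size and class balance. The following sketch assumes one Pascal VOC XML file per image with standard `object` and `name` elements; `summarize_voc_dir` and the `annotations/` path are hypothetical names chosen for the example.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def summarize_voc_dir(xml_dir):
    """Count annotation files and per-label objects in a directory of VOC XMLs."""
    label_counts = Counter()
    n_files = 0
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        n_files += 1
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            label_counts[obj.findtext("name")] += 1
    return n_files, label_counts


if __name__ == "__main__":
    # Assumed output directory from a previous dry run.
    n_files, label_counts = summarize_voc_dir("annotations/")
    print(f"{n_files} annotation files")
    for label, count in label_counts.most_common():
        print(f"{count:6d}  {label}")
```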
Examples¶
Download a dataset for Sebastes rockfish including descendant species:
m3-download generate Sebastes_images/ Sebastes_voc/ --include-concept Sebastes --include-descendants
Download data for multiple explicitly listed species:
m3-download generate fish_images/ fish_voc/ --include-concept "Sebastes" --include-concept "Sardinops sagax"
Download data using species listed in a file:
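The original command for this example is not shown here. As a rough sketch, a concept file can be prepared and the invocation assembled from Python using only the documented `--include-concept` flag, which accepts a file with one concept per line. The file name `species_list.txt`, the directory names, and both helper functions are assumptions made for illustration.

```python
import shlex
from pathlib import Path


def write_concept_file(path, concepts):
    """Write one concept per line, the file layout --include-concept accepts."""
    target = Path(path)
    target.write_text("\n".join(concepts) + "\n", encoding="utf-8")
    return target


def build_command(image_dir, xml_dir, concept_file):
    """Assemble the argument list for m3-download generate (documented flags only)."""
    return [
        "m3-download", "generate", str(image_dir), str(xml_dir),
        "--include-concept", str(concept_file),
    ]


if __name__ == "__main__":
    # Hypothetical file name; any path to a one-concept-per-line file should do.
    concept_file = write_concept_file("species_list.txt", ["Sebastes", "Sardinops sagax"])
    # Print the shell command rather than running it.
    print(shlex.join(build_command("fish_images/", "fish_voc/", concept_file)))
```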
Download data for Sebastes and its descendants, excluding specific species:
m3-download generate sebastes_filtered/ sebastes_voc/ --include-concept Sebastes --include-descendants --exclude-concept "Sebastes mystinus" --exclude-concept "Sebastes serranoides"
Download all data but exclude specific problematic concepts:
Download data excluding concepts and their descendants:
Download data excluding problematic groups and activities:
m3-download generate images/ annotations/ --exclude-group 'ROV:pending-verifications' --exclude-activity 'unspecified' --exclude-project 'ML-Tracking'
Download data including only specific verified groups:
m3-download generate images/ annotations/ --include-group 'ROV:verified' --include-group 'AUV:reviewed'
Download data with specific tags (e.g., only training examples):
Filter by video name or video sequence name:
Filter by video start timestamp range:
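The original command for this example is not shown here. The sketch below shows one way to produce timestamps in the required form (ISO 8601 with a Z suffix) and pass them to the documented `--video-timestamp-min` and `--video-timestamp-max` flags. The `to_iso_z` helper and the directory names are assumptions; the date range mirrors the examples given in the filtering options.

```python
import shlex
from datetime import datetime, timezone


def to_iso_z(moment):
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start = datetime(2021, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2021, 12, 31, 23, 59, 59, tzinfo=timezone.utc)

# Hypothetical output directories; only documented flags are used.
command = [
    "m3-download", "generate", "images/", "annotations/",
    "--video-timestamp-min", to_iso_z(start),
    "--video-timestamp-max", to_iso_z(end),
]
print(shlex.join(command))
```

Converting to UTC before formatting keeps the Z suffix truthful when the input datetime carries a different offset.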
Filter by observer:
Generate annotations without downloading images (useful for planning):