Download data
Getting your TATOR_TOKEN¶
Interacting with the Tator database requires an authentication token. You only need to generate it once: log in to the annotation server and select API Token.
1. Navigate to the API token page.
2. Enter your username and password, then click Get token.
3. Export your token in your shell: export TATOR_TOKEN=XXXXXXXXXX
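If you call Tator from Python, it can help to fail early with a clear message when the token has not been exported. The following is a minimal sketch; the helper name `get_tator_token` is hypothetical and not part of tator or aidata.

```python
import os


def get_tator_token(env=None):
    """Return the Tator API token from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    This helper is illustrative only and is not provided by tator or aidata.
    """
    env = os.environ if env is None else env
    token = env.get("TATOR_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "TATOR_TOKEN is not set; run `export TATOR_TOKEN=<your token>` first."
        )
    return token
```

Calling `get_tator_token()` at the top of a script surfaces a missing token before any network request is made.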
💾 Downloading your data with aidata¶
If you are training a model, the downloaded data must be converted into a format your training pipeline accepts. The aidata tool handles this, and it is also generally useful for downloading data for analysis. It supports output formats such as VOC, CIFAR, and YOLO, and it can resize images, crop regions of interest (ROIs) from images/videos, and filter by labels, versions, sections, and verification status (with --verified or --unverified). Downloads are parallelized across all of your available CPUs.
First, install the module¶
pip install mbari-aidata
Examples¶
⬇️ Download any Ctenophora sp. A data that is verified¶
export TATOR_TOKEN=<your token>
aidata download dataset --labels "Ctenophora sp. A" --verified --config https://docs.mbari.org/internal/ai/projects/config/config_bio.yml
⬇️ Download Velella across platforms and save to separate directories uav and ptvr_lm¶
export TATOR_TOKEN=<your token>
aidata download dataset --labels "Velella_velella" --verified --config https://docs.mbari.org/internal/ai/projects/config/config_uav.yml --base-path $PWD/uav
aidata download dataset --labels "Velella_velella" --crop-roi --verified --config https://docs.mbari.org/internal/ai/projects/config/config_planktivore_lm.yml --base-path $PWD/ptvr_lm
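When the same label is downloaded from several projects, the commands can be generated from a small table instead of being typed by hand. This is a sketch that only assembles the flags already shown in the examples above; the helper `build_download_cmd` and the `JOBS` table are hypothetical names, and the commands are printed rather than executed.

```python
import shlex
from pathlib import Path

# Base URL of the project configs used in the examples above.
CONFIG_ROOT = "https://docs.mbari.org/internal/ai/projects/config"


def build_download_cmd(label, config, base_path=None, verified=True, crop_roi=False):
    """Assemble an `aidata download dataset` command as an argument list.

    Only flags that appear in the examples above are used.
    """
    cmd = ["aidata", "download", "dataset", "--labels", label]
    if crop_roi:
        cmd.append("--crop-roi")
    if verified:
        cmd.append("--verified")
    cmd += ["--config", config]
    if base_path:
        cmd += ["--base-path", str(base_path)]
    return cmd


# (config file, output directory, crop ROIs?) for each platform; illustrative values.
JOBS = [
    ("config_uav.yml", "uav", False),
    ("config_planktivore_lm.yml", "ptvr_lm", True),
]

for config_name, out_dir, crop in JOBS:
    cmd = build_download_cmd(
        "Velella_velella",
        f"{CONFIG_ROOT}/{config_name}",
        base_path=Path.cwd() / out_dir,
        crop_roi=crop,
    )
    # Replace print with subprocess.run(cmd, check=True) to actually run the download.
    print(shlex.join(cmd))
```

Passing the command as a list avoids shell quoting problems with labels that contain spaces, such as "Ctenophora sp. A".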
⬇️ Download from multiple versions of the same project, resize to 224x224 squares, and pad with black pixels for classification model training¶
This is useful when you want to train a classification model on the combined data from several versions of the same project. If localizations from different versions overlap, non-maximum suppression (NMS) is used to merge them.
export TATOR_TOKEN=<your token>
aidata download dataset --crop-roi --resize 224 --fill black --verified --version mbari-ifcb2014-vitb16-20250318_20250320_025000,mbari-ptvr-vits-b8-20250513_20250526_130025 --config https://docs.mbari.org/internal/ai/projects/config/config_planktivore_hm.yml
This produces images that may be padded with black pixels to make them square.
Figure: example of a black-padded Pseudo-nitzschia image from Planktivore.
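To illustrate how NMS resolves overlapping localizations, here is a minimal greedy sketch in plain Python. It is not the aidata implementation, which may use a different threshold, scoring, or merge rule; the `(x, y, width, height)` box layout and the 0.5 IoU threshold are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first.

    A box is dropped when it overlaps an already-kept box by more than
    `iou_threshold` (an assumed value, not taken from aidata).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes (e.g. the same object in two versions) and one separate box.
boxes = [(10, 10, 100, 100), (12, 12, 100, 100), (300, 300, 50, 50)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the lower-scoring duplicate is suppressed
```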
⬇️ Download all verified Pinniped data from the UAV project, cropped and resized to 224x224¶
This is useful for training a classification model. See the classification training for an example on how to train a classification model with this data.
export TATOR_TOKEN=<your token>
aidata download dataset --crop-roi --resize 224 --labels "Pinniped" --version Baseline --verified --config https://docs.mbari.org/internal/ai/projects/uav-901902/config_uav.yml
⬇️ Download all verified data and save to VOC format.¶
This is useful for training an object detection model.
export TATOR_TOKEN=<your token>
aidata download dataset --voc --resize 224 --labels "Pinniped" --version Baseline --verified --config https://docs.mbari.org/internal/ai/projects/config/config_uav.yml
⬇️ Download all verified data and save to YOLO Ultralytics format.¶
This is useful for training a detection model. Requires two steps: first download the data to VOC, then transform it to YOLO format.
export TATOR_TOKEN=<your token>
aidata download dataset --yolo --resize 224 --labels "Pinniped" --version Baseline --verified --config https://docs.mbari.org/internal/ai/projects/config/config_uav.yml
aidata transform voc-to-yolo --base-path Baseline
⬇️ Download all unverified data and save to CIFAR format.¶
export TATOR_TOKEN=<your token>
aidata download dataset --cifar --resize 224 --labels "Pinniped" --version Baseline --unverified --config https://docs.mbari.org/internal/ai/projects/config/config_uav.yml
For more examples of downloading and augmenting your data for model training, see the transform command. Augmentation applies transformations to your data to increase the size and diversity of your dataset, which can improve model performance without additional labeled data. We have found this useful for large-format images, such as those from the UAV project.
⬇️ Downloading through the Tator web interface through the Metadata button.¶
Don't want to use the command line? A utility built into the web interface lets you download data through the Metadata button.
Here is a quick video showing how to do this: Download Metadata from Tator
This saves a CSV file whose columns depend on your project configuration. For example, a useful export for the UAV project would be the following:
| (media) altitude | (media) date | (media) latitude | (media) longitude | (media) make | (media) model | $version_name | $x_pixels | $y_pixels | $width_pixels | $height_pixels | Label | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 60.62410003789310 | 2024-05-02T16:06:48+00:00 | 36.976068022980300 | 121.92875717895700 | SONY | DSC-RX1RM2 | Baseline | 6435.381443 | 860.939130 | 271.618557 | 410.060870 | Shark | 1 |
| 59.8184 | 2024-05-02T17:13:32+00:00 | 36.97105967800670 | 121.91850362801700 | SONY | DSC-RX1RM2 | Baseline | 3827.412371 | 2777.553623 | 317.587629 | 179.446377 | Shark | 1 |
| 59.27389984825490 | 2024-05-02T17:16:14+00:00 | 36.96902484400940 | 121.91568711392600 | SONY | DSC-RX1RM2 | Baseline | 4601.092784 | 2464.950725 | 250.907216 | 400.049275 | Shark | 1 |
| 59.39959973315540 | 2024-05-02T17:17:05+00:00 | 36.96702581099970 | 121.91080038101100 | SONY | DSC-RX1RM2 | Baseline | 6127.958763 | 3807.605797 | 261.041237 | 363.394203 | Shark | 1 |
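Once exported, the CSV can be read with the Python standard library. The sketch below converts each row to corner coordinates; it assumes that the CSV header names match the table columns exactly and that `$x_pixels`/`$y_pixels` give the top-left corner of the box. The sample text reuses two rows from the table above, trimmed to the box columns.

```python
import csv
import io

# Two rows copied from the table above, trimmed to the box columns.
SAMPLE_CSV = """$x_pixels,$y_pixels,$width_pixels,$height_pixels,Label,score
6435.381443,860.939130,271.618557,410.060870,Shark,1
3827.412371,2777.553623,317.587629,179.446377,Shark,1
"""


def to_corners(row):
    """Convert one export row to (x_min, y_min, x_max, y_max) in pixels.

    Assumes $x_pixels/$y_pixels is the top-left corner of the box.
    """
    x = float(row["$x_pixels"])
    y = float(row["$y_pixels"])
    w = float(row["$width_pixels"])
    h = float(row["$height_pixels"])
    return x, y, x + w, y + h


def load_boxes(csv_text):
    """Return a list of (label, corners) tuples from exported CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Label"], to_corners(row)) for row in reader]


for label, box in load_boxes(SAMPLE_CSV):
    print(label, [round(v, 1) for v in box])
```

For a real export, replace `io.StringIO(csv_text)` with an open file handle for the downloaded CSV.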
Downloading through the Tator API¶
⬇️ Download all localizations in a section to a CSV¶
import os

import pandas as pd
import tator

project_id = 12  # Planktivore project id
section_id = 273  # Velella low mag section

# Connect to Tator
api = tator.get_api(host='https://mantis.shore.mbari.org', token=os.environ['TATOR_TOKEN'])

# Get the list of media in the section
medias = api.get_media_list(project_id, section=section_id)
media_id_list = [media.id for media in medias]
print(f'Found {len(medias)} media(s)')

# Fetch verified localizations in batches of media ids
# See https://www.tator.io/docs/references/tator-py/api#get_localization_list
localizations = []
batch_size = 100
for start in range(0, len(media_id_list), batch_size):
    chunk = media_id_list[start: start + batch_size]
    locs = api.get_localization_list(project=project_id, media_id=chunk, section=section_id, attribute=["verified::True"])
    if len(locs) == 0:
        # An empty batch does not mean later batches are empty, so keep going
        continue
    localizations.extend(locs)
    print(f"Found {len(locs)} new verified localizations")

# Flatten the localizations into a table and save it
data = [loc.to_dict() for loc in localizations]
df = pd.json_normalize(data)
df.to_csv("velella_ptvr.csv")
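Before training, it is often worth checking how many localizations each label has. The sketch below counts labels from the dictionaries produced by `loc.to_dict()`; it assumes each dictionary carries an `attributes` mapping and that the label is stored under the attribute name `Label`, which may differ in your project. The helper name `count_labels` is hypothetical.

```python
from collections import Counter


def count_labels(records, attribute="Label"):
    """Count localizations per label.

    `records` is a list of dicts such as those returned by loc.to_dict().
    The attribute name "Label" is an assumption; adjust it to your project.
    Records without that attribute are counted under "unknown".
    """
    return Counter(
        (record.get("attributes") or {}).get(attribute, "unknown")
        for record in records
    )


# Stand-in records for illustration; real ones come from the Tator API.
example = [
    {"attributes": {"Label": "Velella_velella"}},
    {"attributes": {"Label": "Velella_velella"}},
    {"attributes": {}},
]
for label, count in count_labels(example).most_common():
    print(label, count)
```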
🗓️ Updated: 2026-06-19


