Frequently Asked Questions

How much data have these tools been used on?

summary_core_ai_mbari.png

What kind of projects are these tools useful for?

If you are working on a project that involves classifying or detecting objects in any kind of media that can be represented as an image, these tools will be useful to you. Pretty much any data that can be represented as an image will work, e.g. sound spectrograms, drone imagery, time-lapse camera images, video, or individual frames from a video.

Do I need to label all of my data to train a model?

Short answer: No, you do not need to label all of your data to train a model.

Long answer: You can start with a small subset of your data, label it, and then use that labeled data to train a model. Once the model is trained, you can use it to label the rest of your data. We have found that two iterations of this process are sufficient to label most datasets by making use of the optimized clustering algorithm in sdcat and the vector database to find similar images.

For a good classification model, a lower limit of 100 examples per class is okay, but a good target is 300, and more is generally, though not always, better - sometimes having thousands of examples can slow down training unnecessarily. For detection models, a good target is 1000 frames per class, though fewer can work for particular applications. Please keep in mind that object detection models take more time to develop good training data for, because you need to label everything in the frame. We have found that simplifying the labeling task to a binary classification, e.g. background versus the target of interest, and then focusing on a good classification model yields better performance in terms of both accuracy and speed of development.

I would like to use these tools to label my data. Do they work on video?

Yes. You can upload your video to the Mantis server, which is the server that hosts Tator, the annotation tool used for labeling. The video will be transcoded to a format that is best optimized for web viewing and annotating over varying bandwidth connections. This may take some time depending on the size of your video and the load on the server, but once transcoding is complete, you will be able to view and annotate your video in Tator.

Alternatively, to save disk space, you can leave your video in place if it is already in a web-compatible format, or convert it yourself to the resolutions you would like; how that is done depends on the format of your video. Once that is done, the preferred way to register your video with Tator is the mbari-aidata Python module.
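
As a rough example, one common way to produce a web-friendly H.264 MP4 is with ffmpeg; the input filename, target resolution, and quality settings below are only illustrative and should be adapted to your source video:

ffmpeg -i input.mov \
    -c:v libx264 -crf 23 -preset medium -pix_fmt yuv420p \
    -vf scale=1920:-2 \
    -c:a aac \
    -movflags +faststart \
    output.mp4

The -movflags +faststart option moves the file index to the front so the video can start playing in a browser before it has fully downloaded.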

Install it with:

pip install mbari-aidata

Then, register your video with something like the following:

aidata upload \
    --config https://docs.mbari.org/internal/ai/projects/config/config_uav.yml \
    --base-path $PWD \
    --version Baseline \
    --token <TATOR_TOKEN> \
    --video input.mp4

You can grab your Tator token after logging in, via the API token link.

apitoken.png
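
For example, rather than pasting the token directly on the command line, you can keep it in an environment variable and reference it in the upload command (the variable name here is just a convention):

export TATOR_TOKEN="paste-your-token-here"

aidata upload \
    --config https://docs.mbari.org/internal/ai/projects/config/config_uav.yml \
    --base-path $PWD \
    --version Baseline \
    --token $TATOR_TOKEN \
    --video input.mp4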

info

Your data will need to be accessible through a web browser to run this command, and configured in your project-specific configuration .yml files. This is very straightforward if your data is on atlas/titan/thalassa. Ask anyone on the team (Danelle | Duane | Fernanda | Laura) for help setting up a project configuration file and a custom loader that captures any special information about your video data from the filename, e.g. depth, time of day, etc., which can be added to the configuration file to help with searching and filtering your media.

How is the data stored?

The data that is stored is typically the metadata about your images, the images themselves, and the feature vectors. This is stored in a couple of different ways:

  • The Tator database, which stores all of the metadata about your images, e.g. the location, the size, the latitude/longitude/depth, etc. This is a PostgreSQL relational database.

  • A vector database, which stores the feature vectors for exemplars of your data. What is an exemplar? Simply put, a good representative image from your data. What is a feature vector? A numerical representation of an image that captures its important features. We typically use a 768-element vector for our images, created either with a general-purpose model such as DINOv2 or CLIP (good), or with a model trained on your data (better). A vector database is a fast way to access your data because it is held in memory, rather than on a file system as with a PostgreSQL database. We use Redis as our vector database. You can see the projects in our vector database through the FastAPI on Doris; a sketch of what a similarity query looks like is shown after the figure below.

vector_db_concept.png
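
As a rough sketch of the idea, a k-nearest-neighbor query against a Redis vector index looks like the following. The index name (exemplars) and field name (vector) are hypothetical, and the query vector must be supplied as 768 float32 values packed as bytes, so in practice you would issue this from a client library rather than redis-cli:

redis-cli FT.SEARCH exemplars "*=>[KNN 5 @vector $query_vec AS score]" \
    PARAMS 2 query_vec "<768 float32 values packed as bytes>" \
    SORTBY score ASC \
    DIALECT 2

This returns the 5 stored exemplars whose feature vectors are closest to the query vector, which is how "find similar images" searches are served quickly.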

Is Tator the same tool used by FathomNet?

Well, yes and no. The underlying Tator database is the same, but the user interface is different, as best we understand it.

Updated: 2025-09-13