Training data annotation workflow

The training data annotation workflow is iterative and generally has 4-5 steps, depending on whether you are annotating images/video or only region-of-interest (ROI) data. The goal is to create a high-quality labeled dataset for training machine learning models. Typically, 2-3 passes through the workflow are enough to produce a satisfactory dataset.

Figure: training data annotation workflow (trainworkflow.png)

  1. Preview cluster grids, either in the Mantis web interface or as grids generated by SDCAT.
  2. Assign clusters to the appropriate class (e.g. kelp, bird, whale, diatom), either through the Mantis web interface or in bulk through the bulk REST API.
  3. If annotating images/video, review and correct the bounding boxes or add new ones. Skip this step if you are only labeling region-of-interest (ROI) data.
  4. Train a model on the data. Currently this can be either a classification model or a detection model.
  5. Review the performance metrics for your labels.
  6. Repeat steps 1-5 until you are satisfied with the results.
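
The bookkeeping around steps 2 and 5 can be sketched in a few lines of Python. This is only an illustrative sketch, not the Mantis or SDCAT API: the function names, the cluster-to-label mapping, the per-class F1 values, and the 0.8 threshold are all hypothetical, and the actual bulk REST call is deliberately left out. It groups cluster assignments by class (the shape a bulk update would likely need) and lists the classes whose metrics suggest another annotation pass.

```python
from collections import defaultdict


def group_clusters_by_label(assignments):
    """Group cluster ids by class label, one batch per class.

    `assignments` maps a cluster id to the label chosen for it (hypothetical format).
    """
    batches = defaultdict(list)
    for cluster_id, label in assignments.items():
        batches[label].append(cluster_id)
    # Sort ids so the batches are deterministic and easy to review.
    return {label: sorted(ids) for label, ids in batches.items()}


def classes_needing_review(per_class_f1, min_f1=0.8):
    """Return labels whose F1 score is below `min_f1`, worst first."""
    weak = [(score, label) for label, score in per_class_f1.items() if score < min_f1]
    return [label for score, label in sorted(weak)]


# Made-up example values, for illustration only.
assignments = {8: "akashiwo", 53: "jellies", 12: "akashiwo", 3: "velella"}
batches = group_clusters_by_label(assignments)

per_class_f1 = {"akashiwo": 0.91, "jellies": 0.62, "velella": 0.78}
todo = classes_needing_review(per_class_f1)

print(batches)  # {'akashiwo': [8, 12], 'jellies': [53], 'velella': [3]}
print(todo)     # ['jellies', 'velella']
```

Classes returned by `classes_needing_review` are natural candidates for the next pass through steps 1-3; the threshold is a judgment call and would need tuning per dataset.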

Example grids from SDCAT

Figure panels: (mostly) akashiwo cluster; jellies cluster; velella cluster
Images: akashiwo, detections_cluster_8_p0.jpg, dino_vits8_20240430_155718_cluster_53_p0.png

🗓️ Updated: 2025-09-02