Training data annotation workflow

The training data annotation workflow is iterative. The goal is to build a high-quality labeled dataset, train a model, review the results, and then use what you learned to improve the next pass.

Most projects only need 2 to 3 iterations before the dataset is in good shape for training and evaluation.

[Figure: training workflow diagram (trainworkflow.png)]

Workflow steps

  1. Preview cluster grids in Mantis or review grids generated by SDCAT.
  2. Assign each cluster to the appropriate class, such as kelp, bird, whale, or diatom, either in the web interface or through the bulk REST API.
  3. If you are annotating images or video, review the bounding boxes and correct or add annotations as needed. Skip this step for ROI-only workflows.
  4. Train a model on the current dataset. Use either a classification workflow or an object detection workflow.
  5. Review the performance metrics and sample outputs to find weak labels, missing classes, or confusing examples.
  6. Repeat the cycle until the labels and model behavior are stable enough for your use case.
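The review in step 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Mantis or SDCAT report metrics: the function names `per_class_report` and `weak_classes`, the 0.7 threshold, and the sample labels are all hypothetical. It computes per-class precision and recall from paired ground-truth and predicted labels, then flags classes that fall below the threshold as candidates for relabeling in the next pass.

```python
from collections import Counter


def per_class_report(true_labels, predicted_labels):
    """Return {class: (precision, recall, support)} from paired label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gained a false positive
            fn[truth] += 1  # true class was missed
    report = {}
    for cls in sorted(set(true_labels) | set(predicted_labels)):
        predicted = tp[cls] + fp[cls]
        actual = tp[cls] + fn[cls]
        precision = tp[cls] / predicted if predicted else 0.0
        recall = tp[cls] / actual if actual else 0.0
        report[cls] = (precision, recall, actual)
    return report


def weak_classes(report, threshold=0.7):
    """List classes whose precision or recall is below the threshold."""
    return [cls for cls, (p, r, _) in report.items() if min(p, r) < threshold]


# Hypothetical labels, for illustration only.
truth = ["kelp", "kelp", "bird", "whale", "diatom", "diatom", "bird", "kelp"]
preds = ["kelp", "kelp", "bird", "bird", "diatom", "kelp", "bird", "kelp"]

report = per_class_report(truth, preds)
flagged = weak_classes(report)
for cls in flagged:
    precision, recall, support = report[cls]
    print(f"{cls}: precision={precision:.2f} recall={recall:.2f} n={support}")
```

Classes flagged this way are reasonable starting points for step 6: revisit their cluster assignments, add examples for under-represented classes, and retrain. The threshold is a judgment call that depends on your use case.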

Example grids from SDCAT

[Figures: three example cluster grids, captioned "(mostly) akashiwo cluster", "jellies cluster", and "velella cluster". Image files: detections_cluster_8_p0.jpg, dino_vits8_20240430_155718_cluster_53_p0.png]

Updated: 2026-04-01