Training a Classification Model

Model training requires two steps:

  1. Download the data to be used for training, validation, and testing the model
  2. Initiate the training

This involves the mbari-aidata package to download the data and the vittrain code to train the model.

Tip

Get the project YAML file from the project page and install the aidata tool; see the installation instructions.
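
As a minimal sketch, assuming the aidata tool is published on PyPI under the package name mbari-aidata (defer to the installation instructions if they differ):

pip install mbari-aidata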

Here is an example sequence of commands that downloads only verified data for specific classes. The TATOR_TOKEN environment variable should be set to your Tator token; see Tator Token for instructions on how to get one.
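
For example, in a Unix-like shell the token can be set as follows (the value is a placeholder):

export TATOR_TOKEN=<your-tator-token>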

Download

Download all verified ROIs from the dataset using the aidata command line tool.

aidata download dataset \
--config https://docs.mbari.org/internal/ai/projects/902004-Planktivore/config_highmag.yml \
--crop-roi \
--resize 224 \
--base-path $PWD \
--verified \
--token $TATOR_TOKEN

Or, to download only specific labels, use the following command:

aidata download dataset \
--config https://docs.mbari.org/internal/ai/projects/902004-Planktivore/config_highmag.yml \
--crop-roi \
--resize 224 \
--base-path $PWD \
--verified \
--labels "Nano_plankton,Ceratium" \
--token $TATOR_TOKEN

For more information on the download command, see the aidata download documentation.
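
Assuming aidata follows the common CLI convention, the full set of options can also be listed directly from the tool:

aidata download dataset --help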

Tip

Before running the download command, you can check the labels available in your project's dataset through the fast ⚡️ lookup for all labels.
For this project it is available at http://mantis.shore.mbari.org:8001/labels/902004-Planktivore, which returns all labels sorted from the largest count to the smallest.

Click "pretty-print" in your browser to make the output easier to read.
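
As an alternative to the browser, and assuming the endpoint returns JSON, the labels can be fetched and pretty-printed from the command line:

curl -s "http://mantis.shore.mbari.org:8001/labels/902004-Planktivore" | python3 -m json.tool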

Verify ROI Download Count

Ensure that you have downloaded the expected number of ROIs by running the following command.

find . -maxdepth 1 -type d | while read -r dir; do printf "%s:\t" "$dir"; find "$dir" -type f | wc -l; done

This is especially useful when executed within the crops folder generated by the download command above. An example output is shown below:
.:  3372
./Tiarina:  24
./Nano_plankton:    243
./Detonula_Cerataulina_Lauderia:    41
./Ceratium: 38
./Strombidium:  11
./Truncated:    35
./Medium_pennate:   383
./Mesodinium:   158
./Cylindrotheca:    11
./Prorocentrum: 166
./Dinoflagellate:   53
./Detritus: 497
./Ciliate:  23
./Thalassionema:    43
./Akashiwo: 70
./Chaetoceros:  818
./Pseudo-nitzschia: 355
./Eucampia: 12
./Polykrikos:   14
./Gyrodinium:   101
./Amphidinium_Oxyphysis:    247
./Guinardia_Dactyliosolen:  28
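
As a variant of the command above, the following sketch sorts the per-class counts from largest to smallest, which makes class imbalance easier to spot:

# count files per top-level class folder, then sort descending by count
find . -mindepth 1 -maxdepth 1 -type d | while read -r dir; do
  printf "%s\t%s\n" "$(find "$dir" -type f | wc -l)" "$dir"
done | sort -rn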

Train 🚀

First, clone the repository and install the requirements:

git clone https://github.com/mbari-org/vittrain
cd vittrain
pip install -r requirements.txt
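
To keep the training dependencies isolated, you may want to install into a virtual environment first; a minimal sketch, assuming Python 3 on a Unix-like system:

python3 -m venv .venv            # create an isolated environment
source .venv/bin/activate        # activate it before installing
pip install -r requirements.txt  # install vittrain's requirements inside it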

Train a Vision Transformer (ViT) model with a 16×16 patch size and name it mbari-uav-vits-b16:

python src/fine_tune_vits.py \
        --model-name mbari-uav-vits-b16 \
        --base-model google/vit-base-patch16-224-in21k \
        --raw-data $PWD/crops \
        --filter-data $PWD/filtered \
        --add-rotations True \
        --num-epochs 5

Train a ViT model with an 8×8 patch size and name it mbari-uav-vits-b8:

python src/fine_tune_vits.py \
        --model-name mbari-uav-vits-b8 \
        --base-model facebook/dino-vitb8 \
        --raw-data $PWD/crops \
        --filter-data $PWD/filtered \
        --add-rotations True \
        --num-epochs 5

Train a ViT model with a 32×32 patch size and name it mbari-uav-vits-b32:

python src/fine_tune_vits.py \
        --model-name mbari-uav-vits-b32 \
        --base-model openai/clip-vit-base-patch32 \
        --raw-data $PWD/crops \
        --filter-data $PWD/filtered \
        --add-rotations True \
        --num-epochs 5 
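
Fine-tuning can take a while, so you may want to detach a run from your terminal. A sketch using standard shell tools, with the b32 command as an example (train.log is an arbitrary file name):

nohup python src/fine_tune_vits.py \
        --model-name mbari-uav-vits-b32 \
        --base-model openai/clip-vit-base-patch32 \
        --raw-data $PWD/crops \
        --filter-data $PWD/filtered \
        --add-rotations True \
        --num-epochs 5 > train.log 2>&1 &
tail -f train.log   # follow training progress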

🗓️ Updated: 2025-07-04