Training an Object Detection Model

When you're ready to train a model, the process generally boils down to three main steps:

  1. Download the data for training, validation, and testing.
  2. Transform that data into the format the model needs.
  3. Initiate the training.

We use two main Python packages for this: mbari-aidata and deepsea-ai. Keep in mind that you'll need an AWS account to use deepsea-ai.

Tip

I recommend grabbing your project's YAML file from the project page. If you haven't installed the aidata tool yet, see the installation instructions.

Here’s an example of the command sequence I typically use:

Download

aidata download dataset \
--config https://docs.mbari.org/internal/ai/projects/config/config_uav.yml \
--base-path $PWD \
--labels "Surfboard","Batray","Plume","Sea_Lion","Bird","Seal","Wave","Foam","Egregia","Reflectance","Buoy","Shark","Person","Mooring","Otter","Boat","Kelp","Mola","Secci_Disc","Jelly","Whale","RIB" \
--voc \
--token $TATOR_TOKEN

For more details, you can look at the aidata download documentation.
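
One detail about the --labels argument that is easy to miss: the shell removes the double quotes and hands the CLI a single comma-separated string. The sketch below uses Python's shlex module to show what the command receives. The short label list is only an illustrative subset of the labels above.

```python
import shlex

# Illustrative subset of the labels used in the download command.
labels = ["Surfboard", "Batray", "Sea_Lion"]

# Build the argument the same way it is written on the command line.
quoted = ",".join(f'"{name}"' for name in labels)
command = f"aidata download dataset --voc --labels {quoted}"

# shlex.split mimics POSIX shell word splitting and quote removal.
argv = shlex.split(command)
labels_value = argv[argv.index("--labels") + 1]
print(labels_value)
```

In practice this means there must be no spaces after the commas, otherwise the shell splits the list into several arguments.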

Prepare

First, transform the data. I often use crops with overlap for our imagery:

aidata transform voc --base-path $PWD/Baseline --crop-size 1280 --crop-overlap 0.5
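
To make the crop parameters concrete, here is a minimal sketch of how overlapping crop origins can be computed along one image axis. It illustrates the general sliding-window idea and is not aidata's actual implementation. The function name tile_origins, the 4000-pixel image width, and the choice to add a final crop flush with the image edge are all my assumptions.

```python
def tile_origins(length: int, crop_size: int, overlap: float) -> list[int]:
    """Return the start offsets of overlapping crops along one image axis."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if length <= crop_size:
        return [0]  # the image is no larger than a single crop
    # With overlap 0.5, each crop starts half a crop after the previous one.
    stride = max(1, int(crop_size * (1.0 - overlap)))
    origins = list(range(0, length - crop_size + 1, stride))
    # Assumption: add a last crop flush with the edge so no pixels are skipped.
    if origins[-1] + crop_size < length:
        origins.append(length - crop_size)
    return origins


# --crop-size 1280 --crop-overlap 0.5 corresponds to a 640-pixel stride.
print(tile_origins(4000, 1280, 0.5))  # [0, 640, 1280, 1920, 2560, 2720]
```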

Then, convert it to YOLO format:

aidata transform voc-to-yolo --base-path $PWD/Baseline/transformed
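
For reference, the core of a VOC-to-YOLO conversion is a change of box encoding: VOC stores pixel corners (xmin, ymin, xmax, ymax), while YOLO stores a normalized center, width, and height. The sketch below shows that arithmetic only. The function name is hypothetical, and the aidata command also handles file parsing and class indices, which are not shown here.

```python
def voc_to_yolo_box(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-corner box to normalized (cx, cy, w, h)."""
    cx = (xmin + xmax) / 2.0 / img_w  # box center x, as a fraction of width
    cy = (ymin + ymax) / 2.0 / img_h  # box center y, as a fraction of height
    w = (xmax - xmin) / img_w         # box width, as a fraction of width
    h = (ymax - ymin) / img_h         # box height, as a fraction of height
    return cx, cy, w, h


# A 200x200 pixel box in a 1000x800 image.
print(voc_to_yolo_box(100, 200, 300, 400, img_w=1000, img_h=800))
```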

And split it into train, validate, and test sets:

aidata transform split -i $PWD/Baseline/transformed -o $PWD/BaselineSplit
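
If you want a feel for what a split does, here is a minimal sketch of a reproducible random split of file names. The 80/10/10 ratios, the seed, and the function name are assumptions for illustration; the aidata command may use different ratios or a different strategy.

```python
import random


def split_names(names, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle names reproducibly and cut them into train/val/test lists."""
    names = sorted(names)               # sort first so the result is stable
    random.Random(seed).shuffle(names)  # seeded shuffle, repeatable across runs
    n_val = round(len(names) * val_fraction)
    n_test = round(len(names) * test_fraction)
    return {
        "val": names[:n_val],
        "test": names[n_val:n_val + n_test],
        "train": names[n_val + n_test:],
    }


# Hypothetical file names standing in for the transformed crops.
splits = split_names([f"frame_{i:04d}.png" for i in range(100)])
print({name: len(items) for name, items in splits.items()})
```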

Train in AWS SageMaker with deepsea-ai 🚀

Before you start training, make sure you've prepared your data correctly.

If you've scaled your training to 1280x1280, I recommend using yolov5x6:

info

Make sure your --batch-size is a multiple of the number of available GPUs. For example, ml.p3.8xlarge has 4 GPUs, so --batch-size 4 works, and ml.p3.16xlarge has 8 GPUs, so --batch-size 16 works.

deepsea-ai train --model yolov5x6 --instance-type ml.p3.16xlarge \
--config 901902_uavs.ini \
--labels $PWD/BaselineSplit/labels.tar.gz \
--images $PWD/BaselineSplit/images.tar.gz \
--label-map $PWD/Baseline/labels.txt \
--input-s3 s3://901902-new-starting-checkpoint/megafish_ROV_weights.pt \
--output-s3 s3://901902-new-model-checkpoints/ \
--resume True \
--epochs 60 \
--batch-size 16
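
A quick way to sanity-check the batch size before launching a job is sketched below. The GPU counts are the published values for the p3 family; the table and helper name are my own and are not part of deepsea-ai.

```python
# Number of GPUs per SageMaker instance type in the p3 family.
GPUS_PER_INSTANCE = {
    "ml.p3.2xlarge": 1,
    "ml.p3.8xlarge": 4,
    "ml.p3.16xlarge": 8,
}


def batch_size_ok(batch_size: int, instance_type: str) -> bool:
    """True if the batch size divides evenly across the instance's GPUs."""
    gpus = GPUS_PER_INSTANCE[instance_type]
    return batch_size > 0 and batch_size % gpus == 0


print(batch_size_ok(16, "ml.p3.16xlarge"))  # True
print(batch_size_ok(6, "ml.p3.8xlarge"))    # False
```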

info

Before you hit run, check that there isn't already a yolov5x6 folder in your output-s3 bucket. The training job will overwrite anything there, so move any old runs to a different folder first.

info

Also, double-check that there isn't a training folder in your input-s3 bucket, as the job uses that for images, labels, and text files.
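
Both bucket checks can be scripted. The sketch below assumes boto3 is installed and your AWS credentials are configured; it relies on the S3 list_objects_v2 call and treats a "folder" as a key prefix. The helper name and the exact prefixes ("yolov5x6/" and "training/") are assumptions based on the folder names mentioned above.

```python
def prefix_has_objects(s3_client, bucket: str, prefix: str) -> bool:
    """Return True if at least one object exists under the given prefix."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


# Example usage (requires boto3 and valid AWS credentials):
#
#   import boto3
#   s3 = boto3.client("s3")
#   if prefix_has_objects(s3, "901902-new-model-checkpoints", "yolov5x6/"):
#       print("Move the old run out of the output bucket first.")
#   if prefix_has_objects(s3, "901902-new-starting-checkpoint", "training/"):
#       print("Remove the training folder from the input bucket first.")
```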

Train RF-DETR model

Note: Duane is adding specific instructions here.

Train yolo11x model in Google Colab environment 🚀

I've also been documenting how to train a yolo11x detector on an A100 instance in Colab, using our UAV images as an example.

On icefish, in the train-drone-model directory:

Setup

pyenv shell 3.11.6
python3 -m venv train-drone
source train-drone/bin/activate
pip install mbari-aidata

Use the YAML file from here.

Set TATOR_TOKEN

export TATOR_TOKEN=<your_token_here>  # Grab this from your TATOR credentials
echo $TATOR_TOKEN
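
If you would rather not echo the full token to the terminal, a small check like the following confirms it is set and shows only the last few characters. This is a sketch; the helper names are hypothetical and not part of aidata.

```python
import os


def require_env(name: str = "TATOR_TOKEN") -> str:
    """Return the variable's value, or fail with a clear message if unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running aidata")
    return value


def masked(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters of a secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


if "TATOR_TOKEN" in os.environ:
    print(masked(require_env()))
```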

Download data

aidata download dataset \
  --config https://docs.mbari.org/internal/ai/projects/config/config_uav.yml \
  --base-path $PWD/Sept232025 --voc \
  --labels "Batray","Bird","Boat","Cement_Ship","Egregia","Fish","Jelly","Kayak","Kelp","Mola","Mooring_Buoy","Otter","Person","Pinniped","Secci_Disc","Shark","Surfboard","Velella_velella","Velella_velella_raft","Whale" \
  --single-class "object" \
  --verified \
  --token $TATOR_TOKEN \
  --disable-ssl-verify

Logs are written to:

~/mbari_aidata/logs/

Transform data

./transform.sh

aidata transform voc --base-path $PWD/Sept232025 --resize 640 --crop-size 640 --crop-overlap 0.5 --min-visibility 0.0 --min-dim 20

./voc_to_yolo.sh

aidata transform voc-to-yolo --base-path $PWD/Sept232025/transformed

./split.sh

Then run the split. See the transform command documentation for more details.

aidata transform split -i $PWD/Sept232025/transformed -o $PWD/Sept232025split

Train yolo11x model

Upload or create a data.yaml file:

train: /content/datasets/train/images
val: /content/datasets/val/images
test: /content/datasets/test/images

nc: 1
names: ['object']

roboflow:
  workspace: liangdianzhong
  project: -qvdww
  version: 3
  license: CC BY 4.0
  url: https://universe.roboflow.com/liangdianzhong/-qvdww/dataset/3

Train model from COCO weights

In the Colab notebook, upload the data to Google Drive. Here the folder on Google Drive is named "uavs".

Install YOLO11 via Ultralytics

%pip install ultralytics supervision roboflow -q
import ultralytics
ultralytics.checks()

Then mount Google Drive and make sure the uavs folder exists:

# Allow access to your personal Google Drive and add new folders
import os

# Connect Google Drive
from google.colab import drive
drive.mount("/content/drive", force_remount=True)  # This will prompt for authorization.

# Create the uavs folder if it doesn't exist.
folders = ["uavs/"]
for folder in folders:
    path = "/content/drive/MyDrive/" + folder
    os.makedirs(path, exist_ok=True)  # No error if the folder already exists

Set up HOME, copy in the compressed data files, and unpack them.

import os
HOME = os.getcwd()  # /content in a default Colab session
print(HOME)

# -p avoids an error if a directory already exists
!mkdir -p {HOME}/datasets
%cd {HOME}/datasets

!mkdir -p /content/datasets/savedir/
!cp -r "/content/drive/MyDrive/uavs/images.tar.gz" "/content/datasets/savedir/"
!cp -r "/content/drive/MyDrive/uavs/labels.tar.gz" "/content/datasets/savedir/"

!tar xf /content/datasets/savedir/images.tar.gz --directory /content/datasets/savedir/
!tar xf /content/datasets/savedir/labels.tar.gz --directory /content/datasets/savedir/

Move the data into the directory structure YOLO expects:

## Make the directories that yolo11 expects (-p also creates the parent directories)
!mkdir -p /content/datasets/train/images/ /content/datasets/train/labels/
!mkdir -p /content/datasets/test/images/ /content/datasets/test/labels/
!mkdir -p /content/datasets/val/images/ /content/datasets/val/labels/

#get the data.yaml file
!cp "/content/drive/MyDrive/uavs/data.yaml" "/content/datasets/data.yaml"
!ls /content/datasets/

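# Copy the data to the expected directories. Because each destination already exists,
# cp -r nests the split folder inside it, e.g. /content/datasets/train/images/train/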
!cp -r "/content/datasets/savedir/images/train/" "/content/datasets/train/images/"
!cp -r "/content/datasets/savedir/labels/train/" "/content/datasets/train/labels/"

!cp -r "/content/datasets/savedir/images/test/" "/content/datasets/test/images/"
!cp -r "/content/datasets/savedir/labels/test/" "/content/datasets/test/labels/"

!cp -r "/content/datasets/savedir/images/val/" "/content/datasets/val/images/"
!cp -r "/content/datasets/savedir/labels/val/" "/content/datasets/val/labels/"

!ls /content/datasets/
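
Before training, it can be worth checking that every image has a label file in the mirrored labels directory. The sketch below is a hypothetical helper, not part of Ultralytics or aidata; it assumes label files share the image's relative path with a .txt suffix. Images without labels are not necessarily an error, since YOLO treats them as background, but a large count usually points to a copy problem.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_pairs(images_dir, labels_dir):
    """Return (number of images, number of images with no label file)."""
    images_dir, labels_dir = Path(images_dir), Path(labels_dir)
    n_images = n_missing = 0
    for image in images_dir.rglob("*"):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        n_images += 1
        # Mirror the image's relative path under the labels directory.
        label = labels_dir / image.relative_to(images_dir).with_suffix(".txt")
        if not label.is_file():
            n_missing += 1
    return n_images, n_missing


# Example usage in Colab:
#   for split in ("train", "val", "test"):
#       print(split, count_pairs(f"/content/datasets/{split}/images",
#                                f"/content/datasets/{split}/labels"))
```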

Run the training. In this case, we start from pre-trained COCO model weights:

!yolo task=detect mode=train model=yolo11x.pt data=/content/datasets/data.yaml epochs=40 patience=5 imgsz=640 plots=True

Here are examples of yolo11x training commands with different starting weights:

# Build a new model from YAML and start training from scratch
yolo detect train data=coco8.yaml model=yolo11x.yaml epochs=100 imgsz=640

# Start training from a pretrained *.pt model
yolo detect train data=coco8.yaml model=yolo11x.pt epochs=100 imgsz=640

# Build a new model from YAML, transfer pretrained weights to it and start training
yolo detect train data=coco8.yaml model=yolo11x.yaml pretrained=yolo11x.pt epochs=100 imgsz=640
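
The same three starting points are available through the Ultralytics Python API, which can be handier inside a notebook. The sketch below wraps them in a hypothetical build_model helper; the yolo_cls parameter exists only so the function can be exercised without ultralytics installed. By default it imports the real YOLO class.

```python
def build_model(start: str, yolo_cls=None):
    """Create a yolo11x model for one of three starting points."""
    if yolo_cls is None:
        from ultralytics import YOLO as yolo_cls  # imported lazily

    if start == "scratch":
        # Build a new model from YAML, with randomly initialized weights.
        return yolo_cls("yolo11x.yaml")
    if start == "pretrained":
        # Load a pretrained *.pt model.
        return yolo_cls("yolo11x.pt")
    if start == "transfer":
        # Build from YAML, then transfer pretrained weights into it.
        return yolo_cls("yolo11x.yaml").load("yolo11x.pt")
    raise ValueError(f"unknown starting point: {start!r}")


# Example usage (requires ultralytics and downloads weights on first run):
#   model = build_model("pretrained")
#   model.train(data="coco8.yaml", epochs=100, imgsz=640)
```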

Save the results of training

!cp "/content/runs/detect/train/weights/best.pt" "/content/drive/MyDrive/uavs/best.pt"
!cp "/content/runs/detect/train/weights/last.pt" "/content/drive/MyDrive/uavs/last.pt"
!cp -r "/content/runs/detect/train/" "/content/drive/MyDrive/uavs/train/"

NOTE: The results of the completed training are saved in {HOME}/runs/detect/train/. Let's examine them.

!ls {HOME}/runs/detect/train/
from IPython.display import Image as IPyImage, display
# display() is needed because a notebook cell only renders its last bare expression
display(IPyImage(filename=f'{HOME}/runs/detect/train/confusion_matrix.png', width=600))
display(IPyImage(filename=f'{HOME}/runs/detect/train/results.png', width=600))
display(IPyImage(filename=f'{HOME}/runs/detect/train/val_batch0_pred.jpg', width=600))

Validate the trained model

!yolo task=detect mode=val model=/content/runs/detect/train/weights/best.pt data=/content/datasets/data.yaml

Run inference on the test set with the trained model

!yolo task=detect mode=predict model=/content/runs/detect/train/weights/best.pt conf=0.25 source=/content/datasets/test/images/test/ save=True

🗓️ Updated: 2025-10-01