Skip to content

Models

Pacific Sound Models

Classification Models

Model size
(pixels)
F1 accuracy val example true example false data used
blueA 1 224 0.9454 0.9454  Image link  Image link s3://pacific-sound-2khz
blueD 2 224 0.9374 0.9391  Image link  Image link s3://pacific-sound-2khz

Using the classification models

Directory structure

Models are stored in TensorFlow saved model format and bundled with a config.json file. After downloading and decompressing the compressed tar file, you should see a structure similar to the following:

│   └── 1
│       ...
│       ├── saved_model.pb
│       ├── config.json

Config.json file

The config.json file captures necessary data for normalizing spectrogram images before running the model on your data. It also contains the name of the classes.

For example, for the blue whale A model, here we see the image mean, standard deviation, and class names :

{
    "image_mean":[
        0.18429388105869293,
        0.6595855951309204,
        0.6857580542564392
    ],
    "image_size":"224x224",
    "image_std":[
        0.02958579920232296,
        0.018393859267234802,
        0.014677613973617554
    ],
    "classes":[
        "baf",
        "bat"
    ]
}

Image Normalization

Image mean and standard deviation are stored in RGB order. These should be used to normalize spectrograms before using with inference/prediction with the model. For example, in Python this could be used with:

from PIL import Image
import numpy as np
import json

config = json.load(open('1/config.json'))
image_mean = np.asarray(config["image_mean"])
image_std = np.asarray(config["image_std"])
image_path =  '20171101T051942.38365387.38401761.sel.456.ch01.spectrogram.jpg'
image = Image.open(image_path).convert('RGB')
# normalize with the same parameters used in training
image_float = np.asarray(image).astype('float32')
image_float = image_float / 255.
image_float = (image_float - image_mean) / ( image_std + 1.e-9)

# EfficientNet models expect their inputs to be float tensors of pixels with values in the [0-255] range.
# which can be done with:
# image_int = (image_float*255).astype(int)

Class names

Class names are stored in the sorted order of training, so here index 0 is baf, or false A calls, and index 1 is bat, or true A calls.

{
    ...
    "classes":[
        "baf",
        "bat"
    ]
}

*1 Trained using 2173 true positive detections and 2223 false positive detections from various years and seasons of the pacific-sound repository 2 kHz data. Validated with 1530 false and 752 true detections.

*2 Trained using 1299 true positive detections and 2855 false positive detections from various years and seasons of the pacific-sound repository 2 kHz data. Validated with 1392 false and 1075 true detections.