Blue A Classify
- Distributed under the terms of the GPL License
- Maintainer: dcline@mbari.org
- Authors: Danelle Cline dcline@mbari.org, John Ryan ryjo@mbari.org
Kernel Selection¶
If running in SageMaker, the Python 3 (Data Science) kernel is sufficient. The Python 3 (TensorFlow 2.3 Python 3.7 GPU Optimized) kernel runs the inference code faster, at a higher cost. For more advanced users, SageMaker Batch Transform is recommended for processing data in bulk.
Applying Machine Learning to classify blue whale A calls¶
Detecting and classifying marine mammal vocalizations depends on their distinct acoustic attributes. Machine learning (ML) is an effective way to recognize those attributes and classify the vocalizations reliably.
In this brief tutorial, we will:
- tap into an extensive (6+ years and growing) archive of sound recordings from a deep-sea location along the eastern margin of the North Pacific Ocean,
- illustrate the beautiful songs produced by baleen whales, and
- demonstrate the application of ML to classify one of the three types of calls produced by blue whales in their songs.
If you use this data set or tutorial, please cite our project[1].
Install required dependencies¶
First, let's install the required software dependencies.
If you are using this notebook in a cloud environment, select a Python 3 compatible kernel and run the next cell. This only needs to be done once per notebook session.
If you are working on a local computer, you can skip the next cell. Instead, change your kernel to pacific-sound-notebooks, which you installed by following the instructions in the README; it has all the required dependencies.
!apt-get update -y && apt-get install -y libsndfile1
!python -m pip install --upgrade pip
!pip install tensorflow==2.4.1 --quiet
!pip install boto3 --quiet
!pip install oceansoundscape --quiet
Hit:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease
Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease
Hit:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release
Hit:5 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:6 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease
Hit:7 http://archive.ubuntu.com/ubuntu bionic InRelease
Hit:9 http://ppa.launchpad.net/cran/libgit2/ubuntu bionic InRelease
Hit:10 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:11 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
Hit:12 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu bionic InRelease
Hit:13 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
libsndfile1 is already the newest version (1.0.28-4ubuntu0.18.04.2).
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 41 not upgraded.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: pip in /usr/local/lib/python3.7/dist-packages (22.2.2)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
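Before moving on, it can help to confirm that the packages imported in the next section are available in the active kernel. The following is a minimal sketch rather than part of the original workflow: `missing_modules` is a hypothetical helper, and the module names are taken from this notebook's import cell.

```python
import importlib.util

# Top-level module names used by this notebook's import cell.
# Note: cv2 is provided by the opencv-python package.
REQUIRED_MODULES = [
    "boto3", "botocore", "cv2", "oceansoundscape",
    "soundfile", "numpy", "matplotlib", "tensorflow",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If the list is not empty, rerun the install cell or switch to a kernel that already provides the dependencies.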
Import all packages¶
import boto3
from botocore import UNSIGNED
from botocore.client import Config
import cv2
from oceansoundscape.spectrogram.signal import psd_1sec
from oceansoundscape.raven import BLEDParser
from oceansoundscape.spectrogram import conf, denoise, utils
import os
from pathlib import Path
import soundfile as sf
import json
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from matplotlib.patches import Rectangle
Why use 2 kHz data?¶
Because we are studying the low-frequency calls of blue whales, we don't need the original recordings sampled at 256 kHz. Instead, we will use the 2 kHz decimated data in WAV format, which are stored in an S3 bucket named pacific-sound-2khz. For more information on the storage and organization of the 2 kHz data, please see the 2 kHz example.
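The reasoning can be checked with a little arithmetic. The sketch below assumes the approximate 70 to 90 Hz band of blue whale A calls described later in this notebook; the variable names are illustrative only.

```python
original_rate_hz = 256_000   # sample rate of the original recordings
decimated_rate_hz = 2_000    # sample rate of the decimated data used here
call_band_hz = (70, 90)      # approximate blue whale A call band (assumption)

# The highest frequency a sampled signal can represent is half the sample rate.
nyquist_hz = decimated_rate_hz / 2

# How many times fewer samples the decimated data holds for the same duration.
reduction_factor = original_rate_hz // decimated_rate_hz

print(f"Nyquist frequency at 2 kHz: {nyquist_hz:.0f} Hz")
print(f"Call band fits below Nyquist: {call_band_hz[1] < nyquist_hz}")
print(f"Sample count reduction: {reduction_factor}x")
```

The call band sits far below the 1 kHz Nyquist limit, so the decimated data keep the signal of interest while holding 128 times fewer samples.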
List the contents of a monthly directory¶
Between August 2015 and July 2021 (a 6-year period), the highest levels of blue whale song activity off central California were detected during November 2017. Let's start by listing the daily 2 kHz files for that month.
s3 = boto3.client('s3',
                  aws_access_key_id='',
                  aws_secret_access_key='',
                  config=Config(signature_version=UNSIGNED))
year = 2017
month = 11
bucket = 'pacific-sound-2khz'
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=f'{year:04d}/{month:02d}')['Contents']:
    print(obj['Key'])
2017/11/MARS-20171101T000000Z-2kHz.wav
2017/11/MARS-20171102T000000Z-2kHz.wav
2017/11/MARS-20171103T000000Z-2kHz.wav
2017/11/MARS-20171104T000000Z-2kHz.wav
2017/11/MARS-20171105T000000Z-2kHz.wav
2017/11/MARS-20171106T000000Z-2kHz.wav
2017/11/MARS-20171107T000000Z-2kHz.wav
2017/11/MARS-20171108T000000Z-2kHz.wav
2017/11/MARS-20171109T000000Z-2kHz.wav
2017/11/MARS-20171110T000000Z-2kHz.wav
2017/11/MARS-20171111T000000Z-2kHz.wav
2017/11/MARS-20171112T000000Z-2kHz.wav
2017/11/MARS-20171113T000000Z-2kHz.wav
2017/11/MARS-20171114T000000Z-2kHz.wav
2017/11/MARS-20171115T000000Z-2kHz.wav
2017/11/MARS-20171116T000000Z-2kHz.wav
2017/11/MARS-20171117T000000Z-2kHz.wav
2017/11/MARS-20171118T000000Z-2kHz.wav
2017/11/MARS-20171119T000000Z-2kHz.wav
2017/11/MARS-20171120T000000Z-2kHz.wav
2017/11/MARS-20171121T000000Z-2kHz.wav
2017/11/MARS-20171122T000000Z-2kHz.wav
2017/11/MARS-20171123T000000Z-2kHz.wav
2017/11/MARS-20171124T000000Z-2kHz.wav
2017/11/MARS-20171125T000000Z-2kHz.wav
2017/11/MARS-20171126T000000Z-2kHz.wav
2017/11/MARS-20171127T000000Z-2kHz.wav
2017/11/MARS-20171128T000000Z-2kHz.wav
2017/11/MARS-20171129T000000Z-2kHz.wav
2017/11/MARS-20171130T000000Z-2kHz.wav
2017/11/copy.sh
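Each key encodes the start time of its recording, which makes it possible to select files by date. The following is a minimal sketch, assuming every daily file follows the `MARS-YYYYMMDDTHHMMSSZ-2kHz.wav` pattern seen in the listing; `parse_key` is a hypothetical helper, not part of boto3 or oceansoundscape. It also skips entries such as `copy.sh` that are not recordings.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the listing; the trailing Z denotes UTC.
KEY_PATTERN = re.compile(r"^\d{4}/\d{2}/MARS-(\d{8}T\d{6})Z-2kHz\.wav$")


def parse_key(key):
    """Return the UTC start time encoded in a key, or None if it is not a recording."""
    match = KEY_PATTERN.match(key)
    if match is None:
        return None
    start = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    return start.replace(tzinfo=timezone.utc)


# A few keys copied from the listing, including the non-audio entry.
keys = [
    "2017/11/MARS-20171101T000000Z-2kHz.wav",
    "2017/11/MARS-20171102T000000Z-2kHz.wav",
    "2017/11/copy.sh",
]

# Keep only keys that parse as recordings, mapped to their start times.
recordings = {}
for key in keys:
    start = parse_key(key)
    if start is not None:
        recordings[key] = start

for key, start in recordings.items():
    print(start.isoformat(), key)
```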
A view of baleen whale song¶
Let's produce a spectrogram with enough resolution in time and frequency to identify blue whale song visually. We'll limit the exercise to a single hour from a day with calls of variable received intensity (signal strength).
Download a single 2 kHz file¶
year = 2017
month = 11
wav_filename = 'MARS-20171101T000000Z-2kHz.wav'
bucket = 'pacific-sound-2khz'
key = f'{year:04d}/{month:02d}/{wav_filename}'
s3 = boto3.resource('s3',
                    aws_access_key_id='',
                    aws_secret_access_key='',
                    config=Config(signature_version=UNSIGNED))
# only download if needed
if not Path(wav_filename).exists():
    # Alternatively, it can be downloaded directly in SageMaker with
    # !aws s3 cp s3://{bucket}/{key} .
    print('Downloading')
    s3.Bucket(bucket).download_file(key, wav_filename)
    print('Done')
Subset to hour 5 of the day (05:00 to 06:00 UTC)¶
sample_rate = int(2e3)
start_hour = 5
start_frame = int(sample_rate * start_hour * 3600)
duration_frames = int(sample_rate * 3600)
pacsound_file = sf.SoundFile(wav_filename)
pacsound_file.seek(start_frame)
x = pacsound_file.read(duration_frames, dtype='float32')
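The offset arithmetic generalizes to any hour of the day. As a sketch, the hypothetical helper `hour_window` below returns the start frame and frame count for a given hour and rejects windows that fall outside a 24-hour daily file. It assumes each file starts at 00:00:00 UTC, as the file names suggest.

```python
SECONDS_PER_HOUR = 3600


def hour_window(start_hour, sample_rate=2000, n_hours=1):
    """Return (start_frame, n_frames) for n_hours of audio beginning at start_hour."""
    if start_hour < 0 or n_hours <= 0 or start_hour + n_hours > 24:
        raise ValueError("requested window falls outside a 24-hour file")
    start_frame = int(sample_rate * start_hour * SECONDS_PER_HOUR)
    n_frames = int(sample_rate * n_hours * SECONDS_PER_HOUR)
    return start_frame, n_frames


first_frame, n_frames = hour_window(5)
print(first_frame, n_frames)  # 36000000 7200000
```

With a start hour of 5, the window covers 05:00 to 06:00 UTC. The two returned values can be passed to `pacsound_file.seek` and `pacsound_file.read` respectively.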
Plot the full 2 kHz spectrogram¶
A lot of biophony (sounds of ocean life) is represented in this spectrogram. Humpback whale songs are dominant above ~100 Hz, while blue and fin whale songs are dominant below ~100 Hz. The energy of blue whale A calls is largely between ~70 and 90 Hz; the white dashed lines in the plot mark 73 and 91 Hz.
For more details about creating a calibrated spectrogram, see the 2 kHz tutorial.
sg, f = psd_1sec(x, sample_rate, 177.9) # create calibrated psd
plt.figure(dpi=300)
plt.axhline(73, linestyle='--', color='white')
plt.axhline(91, linestyle='--', color='white')
plt.imshow(sg, extent=[0, 3600, min(f), max(f)], aspect='auto', origin='lower', vmin=30, vmax=100)
plt.yscale('log')
plt.ylim(10,1000)
plt.colorbar()
plt.xlabel('November 01, 2017 Hour 5')
plt.ylabel('Frequency (Hz)')
plt.title('Calibrated spectrum levels')
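To make the spectrogram step less of a black box, here is a rough, uncalibrated sketch of a power spectral density computed in 1-second segments with NumPy. It illustrates the general technique only and is not the `psd_1sec` implementation from oceansoundscape: `psd_one_second` is a hypothetical function, it applies no hydrophone calibration, and it is demonstrated on a synthetic 80 Hz tone rather than the recording.

```python
import numpy as np


def psd_one_second(signal, rate):
    """Uncalibrated PSD in dB with 1 s time bins and 1 Hz frequency bins.

    Returns an array of shape (n_frequencies, n_seconds) and the frequency axis.
    """
    n = int(rate)                                   # samples per 1 s segment
    n_seg = len(signal) // n                        # whole seconds available
    segments = np.reshape(signal[:n_seg * n], (n_seg, n))
    window = np.hanning(n)                          # taper to reduce leakage
    spectrum = np.fft.rfft(segments * window, axis=1)
    # One-sided PSD, normalized by the window power and the sample rate.
    psd = np.abs(spectrum) ** 2 / (rate * np.sum(window ** 2))
    if n % 2 == 0:
        psd[:, 1:-1] *= 2.0                         # keep DC and Nyquist unscaled
    else:
        psd[:, 1:] *= 2.0                           # no Nyquist bin for odd n
    freqs = np.fft.rfftfreq(n, d=1.0 / rate)
    return 10.0 * np.log10(psd.T + 1e-20), freqs    # small offset avoids log(0)


# Synthetic check: a 10 s, 80 Hz tone sampled at 2 kHz.
demo_rate = 2000
t = np.arange(10 * demo_rate) / demo_rate
tone = np.sin(2 * np.pi * 80.0 * t).astype("float32")
sg_demo, f_demo = psd_one_second(tone, demo_rate)
peak_hz = f_demo[np.argmax(sg_demo[:, 0])]
print(sg_demo.shape, peak_hz)
```

The rows of the returned array are frequencies and the columns are seconds, which matches the orientation expected by the `imshow` call in the plotting cell.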