Blue B Call Index
- Distributed under the terms of the GPL License
- Maintainer: ryjo@mbari.org
- Author: John Ryan ryjo@mbari.org
Blue whale song¶
Baleen whales produce rhythmic repeated sequences of sound; they sing. This tutorial describes use of the Pacific Ocean Sound Recordings archive to examine temporal patterns of occurrence of blue whale song. Signal processing methods focus on the blue whale B call. A companion tutorial illustrates detection and classification of blue whale A calls using machine learning.
If you use this data set, please cite our project.
Data Overview¶
Recording site¶
The recording site is located on the continental slope of the eastern North Pacific, within Monterey Bay National Marine Sanctuary. The region is known to be important foraging habitat for the regional blue whale population.
Hydrophone calibration¶
For the low-frequency (2 kHz) data, calibration data are not frequency dependent; a single low-frequency calibration value is used. Its value depends on time of data collection, as two hydrophones have been deployed sequentially at the same site. Before 14 June 2017, the calibration value is -168.8 dB re V/µPa (measured at 26 Hz). After this date the value is -177.9 dB re V/µPa (measured at 250 Hz). See also:
- https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment01.json
- https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment02.json
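Because the calibration value changes with the hydrophone in use, an analysis that spans the archive has to choose the value by recording date. The following is a minimal sketch of that lookup. The names `sensitivity_db` and `to_spectrum_level` are hypothetical, and assigning 14 June 2017 itself to the second hydrophone is an assumption, since the text above only specifies "before" and "after" that date.

```python
from datetime import date

import numpy as np

# Assumption: recordings on or after this date use the second hydrophone's value.
SWITCH_DATE = date(2017, 6, 14)


def sensitivity_db(day):
    """Return the hydrophone sensitivity (dB re V/uPa) for a recording date."""
    return -168.8 if day < SWITCH_DATE else -177.9


def to_spectrum_level(psd_v2_per_hz, day):
    """Convert a power spectral density in V^2/Hz to dB re 1 uPa^2/Hz."""
    return 10 * np.log10(psd_v2_per_hz) - sensitivity_db(day)


print(sensitivity_db(date(2016, 11, 1)))
print(sensitivity_db(date(2018, 1, 1)))
```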
The first hydrophone exhibited calibration drift, while the second (deployed 13 June 2017 and currently operational) has not. This observation is consistent with differences in the technologies of the two instruments. However, for this application the calibration drift of the first hydrophone is not problematic because the call index (CI) is computed as a signal-to-noise ratio: a gain error affects the signal and background bands equally and cancels. Therefore, time-series analysis of CI can reliably span the full archive.
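The reason a ratio is insensitive to drift can be checked with a few lines of arithmetic. This sketch applies one calibration value to both a signal band and a background band and then takes the difference in dB. The band powers and the 1.5 dB drift are made-up illustrative numbers, and `band_ratio_db` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def band_ratio_db(signal_power, background_power, sensitivity_db):
    """Signal-to-background ratio in dB, with one calibration value applied to both bands."""
    signal_db = 10 * np.log10(signal_power) - sensitivity_db
    background_db = 10 * np.log10(background_power) - sensitivity_db
    return signal_db - background_db


# Hypothetical uncalibrated band powers (V^2/Hz), for illustration only.
signal_power = 4.0e-12
background_power = 1.0e-12

nominal = band_ratio_db(signal_power, background_power, -168.8)
drifted = band_ratio_db(signal_power, background_power, -168.8 + 1.5)  # assumed drift
print(nominal, drifted)
```

The two printed values agree to within floating-point rounding, because the sensitivity term is subtracted from both bands and drops out of the difference.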
Data files and archive organization¶
The decimated audio data are in daily WAV files in an s3 bucket named pacific-sound-2khz, grouped by year and month. Data in a bucket are stored as objects, so they are not physically stored in folders or directories as you may be familiar with, but you can think of the layout conceptually as follows:
pacific-sound-2khz
|
|----2020
     |
     |----01
     ...
     |----12
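Given this year/month layout, the object key for any day can be built from its date. The sketch below assumes the file naming pattern that appears in the file listing later in this notebook (`MARS-YYYYMMDDT000000Z-2kHz.wav`). The helper names `daily_key` and `daily_url` are hypothetical.

```python
from datetime import date


def daily_key(day):
    """Build the object key for one daily 2 kHz file (naming pattern assumed from the listing)."""
    return f'{day:%Y}/{day:%m}/MARS-{day:%Y%m%d}T000000Z-2kHz.wav'


def daily_url(day, bucket='pacific-sound-2khz'):
    """Build the public HTTPS URL for one daily file."""
    return f'https://{bucket}.s3.amazonaws.com/{daily_key(day)}'


print(daily_key(date(2016, 11, 1)))
print(daily_url(date(2016, 11, 1)))
```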
Install required dependencies¶
First, let's install the required software dependencies.
If you are using this notebook in a cloud environment, select a Python3 compatible kernel and run this next section. This only needs to be done once for the duration of this notebook.
If you are working on a local computer, you can skip this next cell. Change your kernel to pacific-sound-notebooks, which you installed according to the instructions in the README; it has all the dependencies that are needed.
!pip install -q boto3 --quiet
!pip install -q soundfile --quiet
!pip install -q scipy --quiet
!pip install -q numpy --quiet
!pip install -q matplotlib --quiet
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
requests 2.23.0 requires urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you have urllib3 1.26.12 which is incompatible.
Import all packages¶
import boto3, botocore
from botocore import UNSIGNED
from botocore.client import Config
from urllib.request import urlopen  # standard library; no need for the six compatibility layer on Python 3
import io
import scipy
from scipy import signal
import numpy as np
import soundfile as sf
import matplotlib.pyplot as plt
List files¶
Files are organized by year and month; list all of the files available for one month of one year.
s3 = boto3.client('s3',
    aws_access_key_id='',
    aws_secret_access_key='',
    config=Config(signature_version=UNSIGNED))
year = 2016
month = 11
bucket = 'pacific-sound-2khz'
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=f'{year:04d}/{month:02d}')['Contents']:
    print(obj['Key'])
2016/11/MARS-20161101T000000Z-2kHz.wav
2016/11/MARS-20161102T000000Z-2kHz.wav
2016/11/MARS-20161103T000000Z-2kHz.wav
2016/11/MARS-20161104T000000Z-2kHz.wav
2016/11/MARS-20161105T000000Z-2kHz.wav
2016/11/MARS-20161106T000000Z-2kHz.wav
2016/11/MARS-20161107T000000Z-2kHz.wav
2016/11/MARS-20161108T000000Z-2kHz.wav
2016/11/MARS-20161109T000000Z-2kHz.wav
2016/11/MARS-20161110T000000Z-2kHz.wav
2016/11/MARS-20161111T000000Z-2kHz.wav
2016/11/MARS-20161112T000000Z-2kHz.wav
2016/11/MARS-20161113T000000Z-2kHz.wav
2016/11/MARS-20161114T000000Z-2kHz.wav
2016/11/MARS-20161115T000000Z-2kHz.wav
2016/11/MARS-20161116T000000Z-2kHz.wav
2016/11/MARS-20161117T000000Z-2kHz.wav
2016/11/MARS-20161118T000000Z-2kHz.wav
2016/11/MARS-20161119T000000Z-2kHz.wav
2016/11/MARS-20161120T000000Z-2kHz.wav
2016/11/MARS-20161121T000000Z-2kHz.wav
2016/11/MARS-20161122T000000Z-2kHz.wav
2016/11/MARS-20161123T000000Z-2kHz.wav
2016/11/MARS-20161124T000000Z-2kHz.wav
2016/11/MARS-20161125T000000Z-2kHz.wav
2016/11/MARS-20161126T000000Z-2kHz.wav
2016/11/MARS-20161127T000000Z-2kHz.wav
2016/11/MARS-20161128T000000Z-2kHz.wav
2016/11/MARS-20161129T000000Z-2kHz.wav
2016/11/MARS-20161130T000000Z-2kHz.wav
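A single `list_objects_v2` call returns at most 1,000 keys, and the response has no `Contents` entry when nothing matches the prefix. One month of daily files fits comfortably, but listing a longer span may need pagination. The sketch below uses the boto3 paginator for this; the helper name `list_keys` is hypothetical, and `client` is assumed to be an S3 client configured like `s3` above.

```python
def list_keys(client, bucket, prefix):
    """Return every object key under a prefix, following pagination."""
    paginator = client.get_paginator('list_objects_v2')
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when no objects match.
        for obj in page.get('Contents', []):
            keys.append(obj['Key'])
    return keys
```

With the client from the cell above, a call such as `list_keys(s3, 'pacific-sound-2khz', '2016/')` would be expected to return the keys for all months of 2016, and an empty list for a prefix with no data.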
Retrieve metadata¶
Read and show metadata for a single daily file.
year = 2016
month = 11
filename = 'MARS-20161101T000000Z-2kHz.wav'
bucket = 'pacific-sound-2khz'
key = f'{year:04d}/{month:02d}/{filename}'
url = f'https://{bucket}.s3.amazonaws.com/{key}'
sf.info(io.BytesIO(urlopen(url).read(1_000)), verbose=True)
<_io.BytesIO object at 0x7fe27014b050>
samplerate: 2000 Hz
channels: 1
duration: 222 samples
format: WAV (Microsoft) [WAV]
subtype: Signed 24 bit PCM [PCM_24]
endian: FILE
sections: 1
frames: 222
extra_info: """
    Length : 1000
    RIFF : 518400324 (should be 992)
    WAVE
    fmt : 16
      Format : 0x1 => WAVE_FORMAT_PCM
      Channels : 1
      Sample Rate : 2000
      Block Align : 3
      Bit Width : 24
      Bytes/sec : 6000
    LIST : 280
      INFO
        INAM : MBARI ocean audio data, start 20161101T000000 UTC
        ICMT : If you use these data, please cite https://doi.org/10.1109/OCEANS.2016.7761363. Recording metadata can be found at https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment01.json.
    data : 518400000 (should be 668)
    End
    """
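Because only the first 1,000 bytes of the file were downloaded, the reported duration (222 samples) describes that fragment rather than the full recording. The full length can be recovered from the `data` chunk size stored in the header. A small worked calculation using the values printed above:

```python
# Values taken from the header output above.
data_bytes = 518_400_000   # declared size of the 'data' chunk
block_align = 3            # bytes per frame (1 channel x 24 bits)
sample_rate = 2000         # frames per second

full_frames = data_bytes // block_align
full_duration_s = full_frames / sample_rate

# Only 668 bytes of audio fit in the 1,000-byte fragment that was downloaded.
fragment_frames = 668 // block_align

print(full_frames, full_duration_s, fragment_frames)
```

The header therefore describes 86,400 seconds of audio, which is one full day and matches the length obtained when the whole file is read in the next section.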
Load data¶
Read a single daily file.
# read full-day of data
print(f'Reading from {url}')
v, sample_rate = sf.read(io.BytesIO(urlopen(url).read()),dtype='float32')
v = v*3 # convert scaled voltage to volts
nsec = (v.size)/sample_rate # number of seconds in vector
print(f'Read {nsec} seconds of data')
Reading from https://pacific-sound-2khz.s3.amazonaws.com/2016/11/MARS-20161101T000000Z-2kHz.wav
Read 86400.0 seconds of data
A view of blue whale song¶
To understand the method of quantifying song occurrence using an energy metric, it is useful to first consider the attributes of blue whale song. Songs of the northeast Pacific blue whale population include three types of calls: A, B, and C. The B calls have the strongest intensity and are thus often used to characterize song occurrence.
Analysis approaches include (1) detecting, classifying, and counting calls, and (2) quantifying the energy within the frequency band of the call, relative to that at background frequencies. The first approach becomes difficult during periods when the whales chorus because the presence of overlapping calls thwarts distinction of individual calls. The second approach can be applied consistently regardless of whether or not vocalizations overlap. Application of this second approach to years of recordings, together with animal-borne metrics, revealed an acoustic signature of blue whale migration.
# Compute spectrogram
w = scipy.signal.get_window('hann',sample_rate)
f, t, psd = scipy.signal.spectrogram(v, sample_rate,nperseg=sample_rate,noverlap=0,window=w,nfft=sample_rate)
sens = -168.8 # hydrophone sensitivity at 26 Hz
psd = 10*np.log10(psd) - sens
print(f':: psd.shape = {psd.shape}')
print(f':: f.size = {f.size}')
print(f':: t.size = {t.size}')
# Subset 30 minutes
start_hour = 7
start_sec = int(start_hour * 3600 + 1)
end_sec = start_sec+1800-1
psd_subset = psd[:,start_sec:end_sec]
# Plot
plt.figure(dpi=200, figsize = [9,3])
plt.imshow(psd_subset,aspect='auto',origin='lower',vmin=45,vmax=95)
plt.plot([1, 1790],[39, 39],'w--')
plt.plot([1, 1790],[48, 48],'w--')
plt.colorbar()
plt.ylim(8,150)
plt.yscale('log')
plt.xlabel('Second of hour 07')
plt.ylabel('Frequency (Hz)')
plt.title(r'Spectrum level (dB re 1 $\mu$Pa$^2$/Hz)')  # raw string so \m is not treated as an escape sequence
plt.annotate("C",(1100,10),color='w')
plt.annotate("B",(1100,13.5),color='w')
plt.annotate("B$_2$",(1100,27),color='w')
plt.annotate("B$_3$",(1100,41),color='w')
plt.annotate("B$_4$",(1100,55),color='w')
plt.annotate("A",(1100,78),color='w')
plt.annotate("blue whale calls",(950,110),color='w')
plt.annotate("fin whale calls",(1300,20),color='w')
:: psd.shape = (1001, 86400)
:: f.size = 1001
:: t.size = 86400
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in log10 """
Text(1300, 20, 'fin whale calls')
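The second analysis approach described above, energy in the call band relative to energy at background frequencies, can be sketched on synthetic data using the same spectrogram settings as this notebook. This is an illustration under stated assumptions, not the authors' implementation: the helper `band_energy_ratio` is hypothetical, the signal and background frequencies are placeholders chosen near the B$_3$ harmonic, and the ratio is taken on linear power rather than on dB values.

```python
import numpy as np
from scipy import signal


def band_energy_ratio(psd, f, signal_hz, background_hz):
    """Mean linear power at the signal frequencies divided by that at the background frequencies."""
    f_int = np.round(f).astype(int)  # bins are 1 Hz apart, so round to whole Hz
    sig = psd[np.isin(f_int, signal_hz)].mean()
    bg = psd[np.isin(f_int, background_hz)].mean()
    return sig / bg


# Synthetic test signal: 60 s of white noise, with a 43 Hz tone in the last 30 s.
sample_rate = 2000
rng = np.random.default_rng(0)
t = np.arange(60 * sample_rate) / sample_rate
x = rng.standard_normal(t.size)
x[t >= 30] += 0.5 * np.sin(2 * np.pi * 43 * t[t >= 30])

# Same settings as the notebook: 1 s Hann windows, no overlap, 1 Hz bins.
w = signal.get_window('hann', sample_rate)
f, seg_t, psd = signal.spectrogram(x, sample_rate, window=w,
                                   nperseg=sample_rate, noverlap=0, nfft=sample_rate)

signal_hz = [43, 44]       # placeholder signal frequencies (assumption)
background_hz = [37, 50]   # placeholder background frequencies (assumption)

ratio_noise = band_energy_ratio(psd[:, :30], f, signal_hz, background_hz)
ratio_tone = band_energy_ratio(psd[:, 30:], f, signal_hz, background_hz)
print(f'noise only: {ratio_noise:.2f}, with tone: {ratio_tone:.2f}')
```

In the noise-only half the ratio stays near 1, while in the half containing the tone it rises well above 1. Because the metric never isolates individual calls, it behaves the same way whether calls are separate or overlapping.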