\n",
"\n",
" * Distributed under the terms of the GPL License\n",
" * Maintainer: ryjo@mbari.org\n",
" * Author: John Ryan ryjo@mbari.org"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oufZYHdskBWn"
},
"source": [
"## Blue whale song\n",
"---\n",
"Baleen whales produce rhythmic repeated sequences of sound; they sing. This tutorial describes use of the *Pacific Ocean Sound Recordings* archive to examine temporal patterns of occurrence of blue whale song. Signal processing methods focus on the blue whale B call. A companion tutorial illustrates detection and classification of blue whale A calls using machine learning.\n",
"\n",
"If you use this data set, please **[cite our project](https://ieeexplore.ieee.org/document/7761363).**\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9HiGo0WNkBWn"
},
"source": [
"## Data Overview\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tn2mT9DEkBWn"
},
"source": [
"### Recording site\n",
"The [recording site](https://www.mbari.org/at-sea/cabled-observatory/) is located on the continental slope of the eastern North Pacific, within [Monterey Bay National Marine Sanctuary](https://montereybay.noaa.gov/). The region is known to be [important foraging habitat](https://www.cascadiaresearch.org/publications/biologically-important-areas-selected-cetaceans-within-us-waters-%E2%80%93-west-coast-region) for the regional blue whale population."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dk15J9HEkBWn"
},
"source": [
"### Hydrophone calibration\n",
"For the low-frequency (2 kHz) data, calibration data are not frequency dependent; a single low-frequency calibration value is used. Its value depends on time of data collection, as two hydrophones have been deployed sequentially at the same site. Before 14 June 2017, the calibration value is -168.8 dB re V / uPa (measured at 26 Hz). After this date the value is -177.9 dB re V / uPa (measured at 250 Hz). See also:\n",
"\n",
"\n",
"* https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment01.json\n",
"* https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment02.json\n",
"\n",
"The first hydrophone exhibited calibration drift, while the second (deployed 13 June 2017 and currently operational) has not. This observation is consistent with differences in the technologies of the two instruments. However, for this application the calibration drift of the first hydrophone is not problematic because the CI is computed as a signal to noise ratio. Therefore, time-series analysis of CI can reliably span the full archive."
]
},
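{
"cell_type": "markdown",
"metadata": {},
"source": [
"The calibration values above can be applied to convert recorded voltage to sound pressure. The following is a minimal sketch rather than part of the original workflow: the function names are hypothetical, and treating 14 June 2017 itself as belonging to the second hydrophone is an assumption, since the text only specifies the values before and after that date.\n",
"\n",
"```python\n",
"from datetime import date\n",
"\n",
"# Sensitivities quoted in the text above (dB re V / uPa).\n",
"SENSITIVITY_DB_FIRST = -168.8   # first hydrophone, before 14 June 2017\n",
"SENSITIVITY_DB_SECOND = -177.9  # second hydrophone, after 14 June 2017\n",
"CHANGE_DATE = date(2017, 6, 14)\n",
"\n",
"def sensitivity_db(day):\n",
"    # Assumption: recordings dated on or after CHANGE_DATE use the second hydrophone.\n",
"    return SENSITIVITY_DB_SECOND if day >= CHANGE_DATE else SENSITIVITY_DB_FIRST\n",
"\n",
"def volts_to_micropascals(volts, day):\n",
"    # A sensitivity S in dB re V / uPa corresponds to 10 ** (S / 20) volts per micropascal.\n",
"    volts_per_upa = 10 ** (sensitivity_db(day) / 20)\n",
"    return volts / volts_per_upa\n",
"\n",
"# Example: pressure equivalent of 1 mV recorded on 1 November 2016.\n",
"p = volts_to_micropascals(0.001, date(2016, 11, 1))\n",
"```\n",
"\n",
"Because the sensitivity is expressed in dB re V / uPa, dividing a voltage by `10 ** (S / 20)` yields pressure in micropascals."
]
},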
{
"cell_type": "markdown",
"metadata": {
"id": "Qxt8sRQWkBWo"
},
"source": [
"### Data files and archive organization\n",
"The decimated audio data are in daily [WAV](https://en.wikipedia.org/wiki/WAV) files in an s3 bucket named pacific-sound-2khz, grouped by year and month. Buckets are stored as objects, so the data are not physically stored in folders or directories as you may be famaliar with, but you can think of it conceptually as follows:\n",
"\n",
"```\n",
"pacific-sound-2khz\n",
" |\n",
" ----2020\n",
" |\n",
" |----01\n",
" ...\n",
" |----12\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0gCxAK9NkBWo"
},
"source": [
"## Install required dependencies\n",
"\n",
"First, let's install the required software dependencies. \n",
"\n",
"If you are using this notebook in a cloud environment, select a Python3 compatible kernel and run this next section. This only needs to be done once for the duration of this notebook.\n",
"\n",
"If you are working on local computer, you can skip this next cell. Change your kernel to *pacific-sound-notebooks*, which you installed according to the instructions in the [README](https://github.com/mbari-org/pacific-sound-notebooks/) - this has all the dependencies that are needed. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "PdgRR34ykBWp",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "b9286bac-89f1-4a79-c3f0-3f32ce8f1b31"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\u001b[?25l\r\u001b[K |██▌ | 10 kB 26.1 MB/s eta 0:00:01\r\u001b[K |█████ | 20 kB 5.2 MB/s eta 0:00:01\r\u001b[K |███████▍ | 30 kB 7.4 MB/s eta 0:00:01\r\u001b[K |██████████ | 40 kB 3.2 MB/s eta 0:00:01\r\u001b[K |████████████▍ | 51 kB 3.6 MB/s eta 0:00:01\r\u001b[K |██████████████▉ | 61 kB 4.2 MB/s eta 0:00:01\r\u001b[K |█████████████████▎ | 71 kB 4.2 MB/s eta 0:00:01\r\u001b[K |███████████████████▉ | 81 kB 4.5 MB/s eta 0:00:01\r\u001b[K |██████████████████████▎ | 92 kB 5.0 MB/s eta 0:00:01\r\u001b[K |████████████████████████▊ | 102 kB 3.9 MB/s eta 0:00:01\r\u001b[K |███████████████████████████▏ | 112 kB 3.9 MB/s eta 0:00:01\r\u001b[K |█████████████████████████████▊ | 122 kB 3.9 MB/s eta 0:00:01\r\u001b[K |████████████████████████████████| 132 kB 3.9 MB/s \n",
"\u001b[K |████████████████████████████████| 79 kB 7.8 MB/s \n",
"\u001b[K |████████████████████████████████| 9.1 MB 35.6 MB/s \n",
"\u001b[K |████████████████████████████████| 140 kB 49.8 MB/s \n",
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"requests 2.23.0 requires urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you have urllib3 1.26.12 which is incompatible.\u001b[0m\n",
"\u001b[?25h"
]
}
],
"source": [
"!pip install -q boto3 --quiet\n",
"!pip install -q soundfile --quiet\n",
"!pip install -q scipy --quiet\n",
"!pip install -q numpy --quiet\n",
"!pip install -q matplotlib --quiet"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cnvdJE7GkBWp"
},
"source": [
"### Import all packages"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "RXuZEXTvkBWq"
},
"outputs": [],
"source": [
"import boto3, botocore\n",
"from botocore import UNSIGNED\n",
"from botocore.client import Config\n",
"from six.moves.urllib.request import urlopen\n",
"import io\n",
"import scipy\n",
"from scipy import signal\n",
"import numpy as np\n",
"import soundfile as sf\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ncMMqR0wkBWq"
},
"source": [
"## Data Access\n",
"---\n",
"This section covers file listing, metadata retrieval, and data loading."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Z6NHGnrmkBWq"
},
"source": [
"### List files\n",
"Files are organized by year and month; list all of the files available for one month of one year."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "CR8tNSNCkBWq"
},
"outputs": [],
"source": [
"s3 = boto3.client('s3',\n",
" aws_access_key_id='',\n",
" aws_secret_access_key='',\n",
" config=Config(signature_version=UNSIGNED))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "WVuPmzvskBWq",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a0180728-9a2b-49b7-e888-9e992a63d206"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"2016/11/MARS-20161101T000000Z-2kHz.wav\n",
"2016/11/MARS-20161102T000000Z-2kHz.wav\n",
"2016/11/MARS-20161103T000000Z-2kHz.wav\n",
"2016/11/MARS-20161104T000000Z-2kHz.wav\n",
"2016/11/MARS-20161105T000000Z-2kHz.wav\n",
"2016/11/MARS-20161106T000000Z-2kHz.wav\n",
"2016/11/MARS-20161107T000000Z-2kHz.wav\n",
"2016/11/MARS-20161108T000000Z-2kHz.wav\n",
"2016/11/MARS-20161109T000000Z-2kHz.wav\n",
"2016/11/MARS-20161110T000000Z-2kHz.wav\n",
"2016/11/MARS-20161111T000000Z-2kHz.wav\n",
"2016/11/MARS-20161112T000000Z-2kHz.wav\n",
"2016/11/MARS-20161113T000000Z-2kHz.wav\n",
"2016/11/MARS-20161114T000000Z-2kHz.wav\n",
"2016/11/MARS-20161115T000000Z-2kHz.wav\n",
"2016/11/MARS-20161116T000000Z-2kHz.wav\n",
"2016/11/MARS-20161117T000000Z-2kHz.wav\n",
"2016/11/MARS-20161118T000000Z-2kHz.wav\n",
"2016/11/MARS-20161119T000000Z-2kHz.wav\n",
"2016/11/MARS-20161120T000000Z-2kHz.wav\n",
"2016/11/MARS-20161121T000000Z-2kHz.wav\n",
"2016/11/MARS-20161122T000000Z-2kHz.wav\n",
"2016/11/MARS-20161123T000000Z-2kHz.wav\n",
"2016/11/MARS-20161124T000000Z-2kHz.wav\n",
"2016/11/MARS-20161125T000000Z-2kHz.wav\n",
"2016/11/MARS-20161126T000000Z-2kHz.wav\n",
"2016/11/MARS-20161127T000000Z-2kHz.wav\n",
"2016/11/MARS-20161128T000000Z-2kHz.wav\n",
"2016/11/MARS-20161129T000000Z-2kHz.wav\n",
"2016/11/MARS-20161130T000000Z-2kHz.wav\n"
]
}
],
"source": [
"year = 2016\n",
"month = 11\n",
"bucket = 'pacific-sound-2khz'\n",
"\n",
"for obj in s3.list_objects_v2(Bucket=bucket, Prefix=f'{year:04d}/{month:02d}')['Contents']:\n",
" print(obj['Key'])"
]
},
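{
"cell_type": "markdown",
"metadata": {},
"source": [
"The object keys listed above follow a regular pattern, so the key and URL for a given day can be built directly from a date. This is a small sketch under the assumption that every daily file follows the naming seen in the November 2016 listing; the helper names `daily_key` and `daily_url` are hypothetical.\n",
"\n",
"```python\n",
"from datetime import date\n",
"\n",
"BUCKET = 'pacific-sound-2khz'\n",
"\n",
"def daily_key(day):\n",
"    # Assumed pattern: YYYY/MM/MARS-YYYYMMDDT000000Z-2kHz.wav\n",
"    return f'{day:%Y}/{day:%m}/MARS-{day:%Y%m%d}T000000Z-2kHz.wav'\n",
"\n",
"def daily_url(day):\n",
"    # Same URL form as used elsewhere in this notebook.\n",
"    return f'https://{BUCKET}.s3.amazonaws.com/{daily_key(day)}'\n",
"\n",
"print(daily_url(date(2016, 11, 1)))\n",
"```\n",
"\n",
"Building keys this way avoids a listing request when the date of interest is already known, but it does not confirm that the file exists."
]
},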
{
"cell_type": "markdown",
"metadata": {
"id": "t9tfOzx1kBWr"
},
"source": [
"### Retrieve metadata\n",
"Read and show metadata for a single daily file."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "fUiQcjgNkBWr",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "2ad08ae0-7d40-44be-dac7-7a63bcf231f4"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<_io.BytesIO object at 0x7fe27014b050>\n",
"samplerate: 2000 Hz\n",
"channels: 1\n",
"duration: 222 samples\n",
"format: WAV (Microsoft) [WAV]\n",
"subtype: Signed 24 bit PCM [PCM_24]\n",
"endian: FILE\n",
"sections: 1\n",
"frames: 222\n",
"extra_info: \"\"\"\n",
" Length : 1000\n",
" RIFF : 518400324 (should be 992)\n",
" WAVE\n",
" fmt : 16\n",
" Format : 0x1 => WAVE_FORMAT_PCM\n",
" Channels : 1\n",
" Sample Rate : 2000\n",
" Block Align : 3\n",
" Bit Width : 24\n",
" Bytes/sec : 6000\n",
" LIST : 280\n",
" INFO\n",
" INAM : MBARI ocean audio data, start 20161101T000000 UTC\n",
" ICMT : If you use these data, please cite https://doi.org/10.1109/OCEANS.2016.7761363. Recording metadata can be found at https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment01.json.\n",
" data : 518400000 (should be 668)\n",
" End\n",
" \"\"\""
]
},
"metadata": {},
"execution_count": 5
}
],
"source": [
"year = 2016\n",
"month = 11\n",
"filename = 'MARS-20161101T000000Z-2kHz.wav'\n",
"bucket = 'pacific-sound-2khz'\n",
"key = f'{year:04d}/{month:02d}/{filename}'\n",
"\n",
"url = f'https://{bucket}.s3.amazonaws.com/{key}'\n",
"\n",
"sf.info(io.BytesIO(urlopen(url).read(1_000)), verbose=True)"
]
},
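{
"cell_type": "markdown",
"metadata": {},
"source": [
"The metadata above reports only 222 frames because just the first 1,000 bytes of the file were downloaded; the `(should be ...)` notes in `extra_info` reflect that truncation. The arithmetic below, using only numbers printed in the header, is a quick check of what the full file should contain. The variable names are illustrative.\n",
"\n",
"```python\n",
"sample_rate = 2000        # Hz, from the header\n",
"block_align = 3           # bytes per frame: 24-bit PCM, one channel\n",
"data_bytes = 518_400_000  # size declared by the 'data' chunk\n",
"bytes_available = 668     # data bytes actually present in the 1,000-byte excerpt\n",
"\n",
"frames_declared = data_bytes // block_align        # frames in the full file\n",
"seconds_declared = frames_declared / sample_rate   # duration of the full file\n",
"frames_in_excerpt = bytes_available // block_align # complete frames in the excerpt\n",
"\n",
"print(frames_declared, seconds_declared, frames_in_excerpt)\n",
"```\n",
"\n",
"The declared size corresponds to 86,400 seconds, a full day of audio at 2 kHz, while the excerpt holds the 222 complete frames that `sf.info` reported."
]
},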
{
"cell_type": "markdown",
"metadata": {
"id": "pfoQaAtAkBWr"
},
"source": [
"### Load data\n",
"Read a single daily file."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "3_55ErkDkBWr",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "cfe3c6b8-ba1d-4354-833d-6c816164d19c"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Reading from https://pacific-sound-2khz.s3.amazonaws.com/2016/11/MARS-20161101T000000Z-2kHz.wav\n",
"Read 86400.0 seconds of data\n"
]
}
],
"source": [
"# read full-day of data\n",
"print(f'Reading from {url}')\n",
"v, sample_rate = sf.read(io.BytesIO(urlopen(url).read()),dtype='float32')\n",
"v = v*3 # convert scaled voltage to volts\n",
"nsec = (v.size)/sample_rate # number of seconds in vector\n",
"print(f'Read {nsec} seconds of data')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IP3JXXk0kBWr"
},
"source": [
"## A view of blue whale song\n",
"---\n",
"To understand the method of quantifying song occurrence using an energy metric, it is useful to first consider the attributes of blue whale song. Songs of the northeast Pacific blue whale population include three types of calls: A, B, and C. The B calls have the strongest intensity and are thus often used to characterize song occurrence.\n",
"\n",
"Analysis approaches include (1) detecting, classifying, and counting calls, and (2) quantifying the energy within the frequency band of the call, relative to that at background frequencies. The first approach becomes difficult during periods when the whales chorus because the presence of overlapping calls thwarts distinction of individual calls. The second approach can be applied consistently regardless of whether or not vocalizations overlap. Application of this second approach to years of recordings, together with animal-borne metrics, revealed an [acoustic signature of blue whale migration](https://www.sciencedirect.com/science/article/pii/S0960982220313312).\n",
"\n"
]
},
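{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second approach can be illustrated on synthetic data. This is a hedged sketch, not the method applied later in this tutorial: the 43 Hz tone, the 42-44 Hz signal band, and the 36-38 Hz and 49-51 Hz background bands are illustrative assumptions, as is the helper `band_mean`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy import signal\n",
"\n",
"fs = 2000                    # Hz, sample rate of the decimated archive\n",
"rng = np.random.default_rng(0)\n",
"t = np.arange(60 * fs) / fs  # 60 seconds of synthetic data\n",
"\n",
"# Synthetic stand-in for a recording: white noise plus a weak 43 Hz tone.\n",
"x = rng.normal(0.0, 1.0, t.size) + 0.5 * np.sin(2 * np.pi * 43.0 * t)\n",
"\n",
"# One-second windows without overlap give 1 Hz frequency resolution.\n",
"f, seg_times, psd = signal.spectrogram(x, fs, nperseg=fs, noverlap=0)\n",
"\n",
"def band_mean(psd, f, lo, hi):\n",
"    # Mean power spectral density over all times for frequencies in [lo, hi].\n",
"    sel = (f >= lo) & (f <= hi)\n",
"    return psd[sel, :].mean()\n",
"\n",
"signal_power = band_mean(psd, f, 42, 44)\n",
"background_power = 0.5 * (band_mean(psd, f, 36, 38) + band_mean(psd, f, 49, 51))\n",
"ratio = signal_power / background_power\n",
"print(f'band-to-background ratio: {ratio:.2f}')\n",
"```\n",
"\n",
"Because the result is a ratio of power at nearby frequencies, a constant gain applied to the whole recording cancels out, which is why such a metric is insensitive to calibration drift."
]
},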
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "YfTAc5BGkBWr",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 686
},
"outputId": "f2c1b677-0c49-4eae-938f-5249ff1e527a"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
":: psd.shape = (1001, 86400)\n",
":: f.size = 1001\n",
":: t.size = 86400\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in log10\n",
" \"\"\"\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Text(1300, 20, 'fin whale calls')"
]
},
"metadata": {},
"execution_count": 7
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"