\n",
" \n",
" * Distributed under the terms of the GPL License\n",
" * Maintainer: ryjo@mbari.org\n",
" * Authors: John Ryan ryjo@mbari.org"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tQ9nFWaYKV3M"
},
"source": [
"## Shipping Noise Analysis\n",
"---\n",
"This tutorial describes use of the *Pacific Ocean Sound Recordings* archive to examine variation in a major source of noise in the ocean: shipping. Because the lower frequencies of shipping noise travel farther and are thus more detectable regionally, we can use audio data with a relatively low sample rate to examine it. For efficiency, this tutorial uses the 2 kHz data archive (decimated from the original 256 kHz recordings). The derived metric is mean-square sound pressure spectral density in the one-third octave band centered at 63 Hz, an [international standard](https://publications.jrc.ec.europa.eu/repository/handle/JRC88045). This tutorial covers reading and processing of the audio data to produce the metric, and examination of time-series results. It is based on the methods of a [recent publication](https://www.frontiersin.org/articles/10.3389/fmars.2021.656566/full) that showed reduction of shipping noise during onset of the COVID-19 pandemic.\n",
"\n",
"If you use this data set, please **[cite our project](https://ieeexplore.ieee.org/document/7761363).**\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Du9ECq2_KV3N"
},
"source": [
"## Data Overview\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HmKEsMDFKV3N"
},
"source": [
"### Recording site\n",
"The [recording site](https://www.mbari.org/at-sea/cabled-observatory/) is located on the continental slope of the eastern North Pacific, within [Monterey Bay National Marine Sanctuary](https://montereybay.noaa.gov/). This site is not near any major ports, thus it does not experience extremely high levels of shipping noise. However, offshore shipping lanes pass within approximately 20 km of the recorder, and the noise of shipping is a significant part of the local soundscape."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EiJxomFmKV3N"
},
"source": [
"### Hydrophone calibration\n",
"For the low-frequency (2 kHz) data, calibration data are not frequency dependent; a single low-frequency calibration value is used. Its value depends on time of data collection, as two hydrophones have been deployed sequentially at the same site. Before 14 June 2017, the calibration value is -168.8 dB re V / uPa (measured at 26 Hz). After this date the value is -177.9 dB re V / uPa (measured at 250 Hz).\n",
"\n",
"Further calibration details can for the first calibration deployement and\n",
"* https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment01.json\n",
"* https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment02.json\n",
"\n",
"There is an important distinction between the deployed hydrophones, relevant to time-series analysis. The first hydrophone exhibited calibration drift, while the second (deployed 13 June 2017 and currently operational) has not. This observation is consistent with differences in the technologies of the two instruments. What it means for this application is that time-series analysis of shipping noise should be constrained to the period of the second hydrophone deployment."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "E9melRyQKV3N"
},
"source": [
"### Data files and archive organization\n",
"The decimated audio data are in daily [WAV](https://en.wikipedia.org/wiki/WAV) files in an s3 bucket named pacific-sound-2khz, grouped by year and month. Buckets are stored as objects, so the data are not physically stored in folders or directories as you may be famaliar with, but you can think of it conceptually as follows:\n",
"\n",
"```\n",
"pacific-sound-2khz\n",
" |\n",
" ----2020\n",
" |\n",
" |----01\n",
" ...\n",
" |----12\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DYgTu-wdKV3O"
},
"source": [
"## Install required dependencies\n",
"\n",
"First, let's install the required software dependencies. \n",
"\n",
"If you are using this notebook in a cloud environment, select a Python3 compatible kernel and run this next section. This only needs to be done once for the duration of this notebook.\n",
"\n",
"If you are working on local computer, you can skip this next cell. Change your kernel to *pacific-sound-notebooks*, which you installed according to the instructions in the [README](https://github.com/mbari-org/pacific-sound-notebooks/) - this has all the dependencies that are needed. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "UlBQqGBIKV3O",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "7ff569c0-f4f5-47fc-dcc2-e9d74402c736"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\u001b[K |████████████████████████████████| 132 kB 14.3 MB/s \n",
"\u001b[K |████████████████████████████████| 79 kB 7.2 MB/s \n",
"\u001b[K |████████████████████████████████| 9.2 MB 78.3 MB/s \n",
"\u001b[K |████████████████████████████████| 140 kB 74.8 MB/s \n",
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"requests 2.23.0 requires urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you have urllib3 1.26.12 which is incompatible.\u001b[0m\n",
"\u001b[?25h"
]
}
],
"source": [
"!pip install -q boto3 --quiet\n",
"!pip install -q soundfile --quiet\n",
"!pip install -q scipy --quiet\n",
"!pip install -q numpy --quiet\n",
"!pip install -q matplotlib --quiet"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QK5rJMicKV3P"
},
"source": [
"### Import all packages"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "c5lbzI1aKV3P"
},
"outputs": [],
"source": [
"import boto3, botocore\n",
"from botocore import UNSIGNED\n",
"from botocore.client import Config\n",
"from six.moves.urllib.request import urlopen\n",
"import io\n",
"import scipy\n",
"from scipy import signal\n",
"import numpy as np\n",
"import soundfile as sf\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gXeXx0SQKV3P"
},
"source": [
"## Data Access\n",
"---\n",
"This section covers file listing, metadata retrieval, and data loading."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6fmp2uvbKV3P"
},
"source": [
"### List files\n",
"Files are organized by year and month; list all of the files available for one month of one year."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "pfVMQu3FKV3Q"
},
"outputs": [],
"source": [
"s3 = boto3.client('s3',\n",
" aws_access_key_id='',\n",
" aws_secret_access_key='', \n",
" config=Config(signature_version=UNSIGNED))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "XVsLXrkXKV3Q",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "386d8a58-8714-49b4-d596-ae3987f45c4e"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"2021/05/MARS-20210501T000000Z-2kHz.wav\n",
"2021/05/MARS-20210502T000000Z-2kHz.wav\n",
"2021/05/MARS-20210503T000000Z-2kHz.wav\n",
"2021/05/MARS-20210504T000000Z-2kHz.wav\n",
"2021/05/MARS-20210505T000000Z-2kHz.wav\n",
"2021/05/MARS-20210506T000000Z-2kHz.wav\n",
"2021/05/MARS-20210507T000000Z-2kHz.wav\n",
"2021/05/MARS-20210508T000000Z-2kHz.wav\n",
"2021/05/MARS-20210509T000000Z-2kHz.wav\n",
"2021/05/MARS-20210510T000000Z-2kHz.wav\n",
"2021/05/MARS-20210511T000000Z-2kHz.wav\n",
"2021/05/MARS-20210512T000000Z-2kHz.wav\n",
"2021/05/MARS-20210513T000000Z-2kHz.wav\n",
"2021/05/MARS-20210514T000000Z-2kHz.wav\n",
"2021/05/MARS-20210515T000000Z-2kHz.wav\n",
"2021/05/MARS-20210516T000000Z-2kHz.wav\n",
"2021/05/MARS-20210517T000000Z-2kHz.wav\n",
"2021/05/MARS-20210518T000000Z-2kHz.wav\n",
"2021/05/MARS-20210519T000000Z-2kHz.wav\n",
"2021/05/MARS-20210520T000000Z-2kHz.wav\n",
"2021/05/MARS-20210521T000000Z-2kHz.wav\n",
"2021/05/MARS-20210522T000000Z-2kHz.wav\n",
"2021/05/MARS-20210523T000000Z-2kHz.wav\n",
"2021/05/MARS-20210524T000000Z-2kHz.wav\n",
"2021/05/MARS-20210525T000000Z-2kHz.wav\n",
"2021/05/MARS-20210526T000000Z-2kHz.wav\n",
"2021/05/MARS-20210527T000000Z-2kHz.wav\n",
"2021/05/MARS-20210528T000000Z-2kHz.wav\n",
"2021/05/MARS-20210529T000000Z-2kHz.wav\n",
"2021/05/MARS-20210530T000000Z-2kHz.wav\n",
"2021/05/MARS-20210531T000000Z-2kHz.wav\n"
]
}
],
"source": [
"year = \"2021\"\n",
"month = \"05\"\n",
"bucket = 'pacific-sound-2khz'\n",
"\n",
"for obj in s3.list_objects_v2(Bucket=bucket, Prefix=f'{year}/{month}')['Contents']:\n",
" print(obj['Key'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0NcJc742KV3Q"
},
"source": [
"### Retrieve metadata\n",
"Read and show metadata for a single daily file."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "asc-cKOUKV3Q",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "93910bde-babb-4dbf-a40c-205ab7ce8161"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<_io.BytesIO object at 0x7efcb875c530>\n",
"samplerate: 2000 Hz\n",
"channels: 1\n",
"duration: 222 samples\n",
"format: WAV (Microsoft) [WAV]\n",
"subtype: Signed 24 bit PCM [PCM_24]\n",
"endian: FILE\n",
"sections: 1\n",
"frames: 222\n",
"extra_info: \"\"\"\n",
" Length : 1000\n",
" RIFF : 518400324 (should be 992)\n",
" WAVE\n",
" fmt : 16\n",
" Format : 0x1 => WAVE_FORMAT_PCM\n",
" Channels : 1\n",
" Sample Rate : 2000\n",
" Block Align : 3\n",
" Bit Width : 24\n",
" Bytes/sec : 6000\n",
" LIST : 280\n",
" INFO\n",
" INAM : MBARI ocean audio data, start 20210521T000000 UTC\n",
" ICMT : If you use these data, please cite https://doi.org/10.1109/OCEANS.2016.7761363. Recording metadata can be found at https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment02.json.\n",
" data : 518400000 (should be 668)\n",
" End\n",
" \"\"\""
]
},
"metadata": {},
"execution_count": 5
}
],
"source": [
"year = \"2021\"\n",
"month = \"05\"\n",
"filename = 'MARS-20210521T000000Z-2kHz.wav'\n",
"bucket = 'pacific-sound-2khz'\n",
"key = f'{year}/{month}/{filename}'\n",
"\n",
"url = f'https://{bucket}.s3.amazonaws.com/{key}'\n",
"\n",
"sf.info(io.BytesIO(urlopen(url).read(1_000)), verbose=True) "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aE_bLXzwKV3Q"
},
"source": [
"### Load data\n",
"Read the full content of a single daily file."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "UDExnZjnKV3R",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c465b7ed-13bd-4a5f-faf3-8bec71ac2de7"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Reading from https://pacific-sound-2khz.s3.amazonaws.com/2021/05/MARS-20210521T000000Z-2kHz.wav\n",
"Read 86400.0 seconds of data\n"
]
}
],
"source": [
"# read full-day of data\n",
"print(f'Reading from {url}')\n",
"v, sample_rate = sf.read(io.BytesIO(urlopen(url).read()),dtype='float32') \n",
"v = v*3 # convert scaled voltage to volts\n",
"num_secs = (v.size) / sample_rate # number of seconds in vector\n",
"print(f'Read {num_secs} seconds of data')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qW7L1iJeKV3R"
},
"source": [
"## Shipping noise metric\n",
"---\n",
"Our focal metric is the mean-square sound pressure spectral density, ISO 18405 3.1.3.13, for the one-third octave band centered at 63 Hz (ISO, 2017; Dekeling et al., 2014). To compute this metric effectively, we must distinguish shipping noise from other signals in the same frequency band. At this recording site, we have found that other signals (harmonics of blue whale vocalizations, mechanical disturbance of the recorder) can influence the metric (see examples in Figure 3 of [this paper](https://www.frontiersin.org/articles/10.3389/fmars.2021.656566/full)). However, they are more ephemeral than shipping noise and their influence can be minimized simply by computing median values of the metric within a sufficiently long temporal bin. A temporal bin of 1 minute is applied here to the focal metric computed at 1-second resolution. Because the audio data have a DC offset and zeros are used to identify missing data, we also want to identify time periods of missing data using the percent of each minute having zero values. This is important for screening values then enter into time-binned statistics."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8YV7Lk8gKV3R"
},
"source": [
"### Compute spectrogram\n",
"The first step is to compute a spectrogram from non-overlapping 1-second segments. A frequency resoltion of 1 Hz effectively enables averaging of power spectral density over the focal frequency band. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "1YueNJ7oKV3R",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "231fe2d6-22fd-4e8a-afba-e2950b45f447"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
":: psd.shape = (1001, 86400)\n",
":: f.size = 1001\n",
":: t.size = 86400\n"
]
}
],
"source": [
"# Compute spectrogram \n",
"w = scipy.signal.get_window('hann',sample_rate)\n",
"f, t, psd = scipy.signal.spectrogram(v, sample_rate,nperseg=sample_rate,noverlap=0,window=w,nfft=sample_rate)\n",
"print(f':: psd.shape = {psd.shape}')\n",
"print(f':: f.size = {f.size}')\n",
"print(f':: t.size = {t.size}')\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EtLWjY32KV3R"
},
"source": [
"### Extract focal frequency band\n",
"The next step is to average power spectral density for the one-third octave band centered at 63 Hz. Band limits are rounded to the nearst Hz."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "BRELmdK3KV3R",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "1301e3d9-bc7e-4524-e4a5-0235662ec287"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
":: selected frequencies are [56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71.]\n",
":: psd_sub.shape = (16, 86400)\n",
":: band_psd.size = 86400\n",
":: min of band psd = 72.78706359863281\n",
":: max of band pad = 120.67597961425781\n"
]
}
],
"source": [
"idx = np.where(np.logical_and(f>=56, f<=71))\n",
"print(f':: selected frequencies are {f[idx]}')\n",
"psd_sub = np.squeeze(psd[idx,:])\n",
"print(f':: psd_sub.shape = {psd_sub.shape}')\n",
"# band psd\n",
"band_psd = np.mean(psd_sub, axis=0)\n",
"# Convert to dB\n",
"sens = -177.9 # hydrophone sensitivity for this file\n",
"band_psd = 10*np.log10(band_psd)-sens\n",
"print(f':: band_psd.size = {band_psd.size}')\n",
"print(f':: min of band psd = {min(band_psd)}')\n",
"print(f':: max of band pad = {max(band_psd)}')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_w-bbZeVKV3S"
},
"source": [
"### Compute 1-minute median (L50)\n",
"Finally, we compute median spectrum levels at 1 minute resolution.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "OSlKoxjgKV3S",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 482
},
"outputId": "00807d87-f18f-48d0-fe41-5d07b0ea0044"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
":: band_psd.size = 86400\n",
":: L.shape = (1440, 60)\n",
":: L50.size = 1440\n",
":: min(L50) = 78.15385437011719\n",
":: max(L50) = 91.43157958984375\n",
":: L50sec.size = 1440\n",
":: min(L50sec) = 30.0\n",
":: max(L50sec) = 86370.0\n",
":: samples per minute = 120000\n",
":: zero count array shape = (1440, 120000)\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Text(0.5, 1.0, 'Zero values per minute in original data')"
]
},
"metadata": {},
"execution_count": 9
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"