\n",
" \n",
" * Distributed under the terms of the GPL License\n",
" * Maintainer: dcline@mbari.org\n",
" * Authors: Danelle Cline dcline@mbari.org, John Ryan ryjo@mbari.org"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DAonxBxzyfbo"
},
"source": [
"## Basic Exploration of the 2 kHz Pacific Ocean Audio Data in the AWS Open Data Registry\n",
"\n",
"---\n",
"An extensive (5+ years and growing) archive of sound recordings from a deep-sea location [along the eastern margin of the North Pacific Ocean](https://www.mbari.org/at-sea/cabled-observatory/) has been made available through AWS Open data. Temporal coverage of the recording archive has been 95% since project inception in July 2015. The original recordings have a sample rate of 256 kHz. For many research applications it is convenient to work with data having a lower sample rate. This notebook illustrates basic methods to access and process a calibrated spectrogram from the decimated 2 kHz audio archive.\n",
"\n",
"If you use this data set, please **[cite our project](https://ieeexplore.ieee.org/document/7761363).**\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ORw-y2Cbyfbp"
},
"source": [
"## Data Overview\n",
"The decimated audio data are in [WAV](https://en.wikipedia.org/wiki/WAV) format in an s3 bucket named pacific-sound-2khz. They are further organized by year and month. Buckets are stored as objects, so the data isn't physically stored in folders or directories as you may be famaliar with, but you can think of it conceptually as follows:\n",
"\n",
"```\n",
"pacific-sound-2khz\n",
" |\n",
" ----2020\n",
" |\n",
" |----01\n",
" ...\n",
" |----12\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6W29ab9xyfbp"
},
"source": [
"## Install required dependencies\n",
"\n",
"First, let's install the required software dependencies. \n",
"\n",
"If you are using this notebook in a cloud environment, select a Python3 compatible kernel and run this next section. This only needs to be done once for the duration of this notebook.\n",
"\n",
"If you are working on local computer, you can skip this next cell. Change your kernel to *pacific-sound-notebooks*, which you installed according to the instructions in the [README](https://github.com/mbari-org/pacific-sound-notebooks/) - this has all the dependencies that are needed. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "GYJ4U9Jqyfbq",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8b3fed76-dc96-4882-9bf7-76aa4fcc506c"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\u001b[K |████████████████████████████████| 132 kB 5.0 MB/s \n",
"\u001b[K |████████████████████████████████| 79 kB 6.6 MB/s \n",
"\u001b[K |████████████████████████████████| 9.2 MB 54.7 MB/s \n",
"\u001b[K |████████████████████████████████| 140 kB 55.2 MB/s \n",
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"requests 2.23.0 requires urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you have urllib3 1.26.12 which is incompatible.\u001b[0m\n",
"\u001b[?25h"
]
}
],
"source": [
"!pip install -q boto3 --quiet\n",
"!pip install -q soundfile --quiet\n",
"!pip install -q scipy --quiet\n",
"!pip install -q numpy --quiet\n",
"!pip install -q matplotlib --quiet"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "h4cKkw_Fyfbq"
},
"source": [
"### Import all packages"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "_wf2zkfxyfbq"
},
"outputs": [],
"source": [
"import boto3\n",
"from botocore import UNSIGNED\n",
"from botocore.client import Config\n",
"from six.moves.urllib.request import urlopen\n",
"import io\n",
"import scipy\n",
"from scipy import signal\n",
"import numpy as np\n",
"import soundfile as sf\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YLbX8Iyzyfbr"
},
"source": [
"## List the contents of a monthly directory"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "0kAeZMYQyfbr"
},
"outputs": [],
"source": [
"s3 = boto3.client('s3',\n",
" aws_access_key_id='',\n",
" aws_secret_access_key='', \n",
" config=Config(signature_version=UNSIGNED))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "qxupBZTJyfbr",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "f5a1efc7-0d58-49b5-d6fa-9480b4274b3f"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"2020/01/MARS-20200101T000000Z-2kHz.wav\n",
"2020/01/MARS-20200102T000000Z-2kHz.wav\n",
"2020/01/MARS-20200103T000000Z-2kHz.wav\n",
"2020/01/MARS-20200104T000000Z-2kHz.wav\n",
"2020/01/MARS-20200105T000000Z-2kHz.wav\n",
"2020/01/MARS-20200106T000000Z-2kHz.wav\n",
"2020/01/MARS-20200107T000000Z-2kHz.wav\n",
"2020/01/MARS-20200108T000000Z-2kHz.wav\n",
"2020/01/MARS-20200109T000000Z-2kHz.wav\n",
"2020/01/MARS-20200110T000000Z-2kHz.wav\n",
"2020/01/MARS-20200111T000000Z-2kHz.wav\n",
"2020/01/MARS-20200112T000000Z-2kHz.wav\n",
"2020/01/MARS-20200113T000000Z-2kHz.wav\n",
"2020/01/MARS-20200114T000000Z-2kHz.wav\n",
"2020/01/MARS-20200115T000000Z-2kHz.wav\n",
"2020/01/MARS-20200116T000000Z-2kHz.wav\n",
"2020/01/MARS-20200117T000000Z-2kHz.wav\n",
"2020/01/MARS-20200118T000000Z-2kHz.wav\n",
"2020/01/MARS-20200119T000000Z-2kHz.wav\n",
"2020/01/MARS-20200120T000000Z-2kHz.wav\n",
"2020/01/MARS-20200121T000000Z-2kHz.wav\n",
"2020/01/MARS-20200122T000000Z-2kHz.wav\n",
"2020/01/MARS-20200123T000000Z-2kHz.wav\n",
"2020/01/MARS-20200124T000000Z-2kHz.wav\n",
"2020/01/MARS-20200125T000000Z-2kHz.wav\n",
"2020/01/MARS-20200126T000000Z-2kHz.wav\n",
"2020/01/MARS-20200127T000000Z-2kHz.wav\n",
"2020/01/MARS-20200128T000000Z-2kHz.wav\n",
"2020/01/MARS-20200129T000000Z-2kHz.wav\n",
"2020/01/MARS-20200130T000000Z-2kHz.wav\n",
"2020/01/MARS-20200131T000000Z-2kHz.wav\n"
]
}
],
"source": [
"year = 2020\n",
"month = 1\n",
"bucket = 'pacific-sound-2khz'\n",
"\n",
"for obj in s3.list_objects_v2(Bucket=bucket, Prefix=f'{year:04d}/{month:02d}')['Contents']:\n",
" print(obj['Key'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yFo1GHnvyfbs"
},
"source": [
"## Retrieve metadata for a file"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "6XPPPRWIyfbs",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "320a592d-c640-4f62-ea63-a2ea5c096c80"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<_io.BytesIO object at 0x7f68eac86470>\n",
"samplerate: 2000 Hz\n",
"channels: 1\n",
"duration: 3.278 s\n",
"format: WAV (Microsoft) [WAV]\n",
"subtype: Signed 24 bit PCM [PCM_24]\n",
"endian: FILE\n",
"sections: 1\n",
"frames: 6556\n",
"extra_info: \"\"\"\n",
" Length : 20000\n",
" RIFF : 518400324 (should be 19992)\n",
" WAVE\n",
" fmt : 16\n",
" Format : 0x1 => WAVE_FORMAT_PCM\n",
" Channels : 1\n",
" Sample Rate : 2000\n",
" Block Align : 3\n",
" Bit Width : 24\n",
" Bytes/sec : 6000\n",
" LIST : 280\n",
" INFO\n",
" INAM : MBARI ocean audio data, start 20200101T000000 UTC\n",
" ICMT : If you use these data, please cite https://doi.org/10.1109/OCEANS.2016.7761363. Recording metadata can be found at https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment02.json.\n",
" data : 518400000 (should be 19668)\n",
" End\n",
" \"\"\""
]
},
"metadata": {},
"execution_count": 5
}
],
"source": [
"year = 2020\n",
"month = 1\n",
"filename = 'MARS-20200101T000000Z-2kHz.wav'\n",
"bucket = 'pacific-sound-2khz'\n",
"key = f'{year:04d}/{month:02d}/{filename}'\n",
"\n",
"url = f'https://{bucket}.s3.amazonaws.com/{key}'\n",
"\n",
"sf.info(io.BytesIO(urlopen(url).read(20_000)), verbose=True) "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jcvRD2PMyfbs"
},
"source": [
"## Calibrated Spectrum Levels"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DXJwUdETyfbs"
},
"source": [
"### Produce calibrated spectrogram for a full day\n",
"For the low-frequency (2 kHz) data, calibration data are not frequency dependent; a single low-frequency calibration value is used. Its value depends on time of data collection, as two hydrophones have been deployed sequentially at the same site. Before 14 June 2017, the calibration value is -168.8 dB re V / uPa (measured at 26 Hz). After this date the value is -177.9 dB re V / uPa (measured at 250 Hz). See also:\n",
"\n",
"\n",
"* https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment01.json\n",
"* https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment02.json\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "yZNhI0g-yfbs",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "2dae60ac-4c07-45e1-b2da-88232d5d0cb4"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Reading from https://pacific-sound-2khz.s3.amazonaws.com/2020/01/MARS-20200101T000000Z-2kHz.wav\n",
"1440 segments of length 60 seconds in 86400.0 seconds of audio\n"
]
}
],
"source": [
"# create a 1-Hz x n second calibrated spectrogram at 1 second resolution\n",
"print(f'Reading from {url}')\n",
"x, sample_rate = sf.read(io.BytesIO(urlopen(url).read()),dtype='float32') \n",
"v = x*3 # convert scaled voltage to volts\n",
"v.shape, v.size, sample_rate\n",
"a = np.arange(v.size)+1\n",
"# define segment processing\n",
"nsec = (v.size)/sample_rate # number of seconds in vector\n",
"spa = 60 # seconds per average\n",
"nseg = int(nsec/spa)\n",
"print(f'{nseg} segments of length {spa} seconds in {nsec} seconds of audio')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "_IuJ4tyRyfbt",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "9459d35f-46cc-42fb-a559-8176d962f49d"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(1001, 1440)"
]
},
"metadata": {},
"execution_count": 7
}
],
"source": [
"# initialize empty LTSA\n",
"nfreq = int(sample_rate/2+1)\n",
"nfreq,nseg\n",
"LTSA = np.empty((nfreq, nseg), float)\n",
"LTSA.shape"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "E6milnmbyfbt"
},
"outputs": [],
"source": [
"# get window for welch\n",
"w = scipy.signal.get_window('hann',sample_rate)\n",
"\n",
"# process LTSA\n",
"for x in range(0,nseg):\n",
" cstart = x*spa*sample_rate\n",
" cend = (x+1)*spa*sample_rate\n",
" f,psd = scipy.signal.welch(v[cstart:cend],fs=sample_rate,window=w,nfft=sample_rate)\n",
" psd = 10*np.log10(psd) + 177.9\n",
" LTSA[:,x] = psd"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "Ldap8fC5yfbt",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "2588f9a0-57a5-43e9-f0b0-03204b577506"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(86400.0, 1440, (1001, 1440))"
]
},
"metadata": {},
"execution_count": 9
}
],
"source": [
"nsec, nseg, LTSA.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DaDJZCoIyfbt"
},
"source": [
"### Plot the calibrated spectrogram\n",
"Note: The sharp drop in signal approaching 1 kHz reflects the attributes of the decimation filter applied to produce the 2 kHZ data from the original 256 kHz data."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "Boxwhgo6yfbt",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "5ddf9641-d8eb-4435-f1b7-eced922f3333"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Text(0.5, 1.0, 'Calibrated spectrum levels')"
]
},
"metadata": {},
"execution_count": 10
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"