\n",
" \n",
" * Distributed under the terms of the GPL License\n",
" * Maintainer: dcline@mbari.org\n",
" * Authors: Danelle Cline dcline@mbari.org, John Ryan ryjo@mbari.org"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Exploration of the 256 kHz Pacific Ocean Audio Data in the AWS Open Data Registry\n",
"\n",
"---\n",
"An extensive (5+ years and growing) archive of sound recordings from a deep-sea location [along the eastern margin of the North Pacific Ocean](https://www.mbari.org/at-sea/cabled-observatory/) has been made available through AWS Open data. Temporal coverage of the recording archive has been 95% since project inception in July 2015. The original recordings have a sample rate of 256 kHz. This notebook illustrates basic methods to access and process the original audio data using Python.\n",
"\n",
"If you use this data set, please **[cite our project](https://ieeexplore.ieee.org/document/7761363).**\n",
"\n",
"
\n",
"
A delayed version of this data can be heard here on this live audio station
\n",
" \n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Overview\n",
"The full-resolution audio data are in [WAV](https://en.wikipedia.org/wiki/WAV) format in s3 buckets named pacific-sound-256khz-yyyy, where yyyy is 2015 or later. Buckets are stored as objects, so the data aren't physically stored in folders or directories as you may be famaliar with, but you can think of it conceptually as follows:\n",
"\n",
"```\n",
"pacific-sound-256khz-2021\n",
" |\n",
" individual 10-minute files\n",
"```\n"
]
},
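{
"cell_type": "markdown",
"metadata": {},
"source": [
"The naming scheme above can be turned into a download URL. The following is a minimal sketch, not part of the original workflow: the helper name `wav_url` is hypothetical, and the key pattern `MM/MARS_YYYYMMDD_HHMMSS.wav` is an assumption inferred from the single example file read later in this notebook, so it may not hold for every bucket.\n",
"\n",
"```python\n",
"from datetime import datetime\n",
"\n",
"def wav_url(start):\n",
"    # hypothetical helper: public HTTPS URL of the file starting at `start`\n",
"    # assumed key pattern: MM/MARS_YYYYMMDD_HHMMSS.wav\n",
"    bucket = f'pacific-sound-256khz-{start:%Y}'\n",
"    key = f'{start:%m}/MARS_{start:%Y%m%d_%H%M%S}.wav'\n",
"    return f'https://{bucket}.s3.amazonaws.com/{key}'\n",
"\n",
"print(wav_url(datetime(2018, 1, 1, 9, 24, 6)))\n",
"```\n",
"\n",
"For this timestamp the sketch reproduces the URL of the example file used in the metadata section."
]
},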
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install dependencies\n",
"\n",
"First, let's install the required software dependencies. If you are working on local computer, you can skip this next cell. Change your kernel to *pacific-sound-notebooks*, which you installed according to the instructions in the [README](https://github.com/mbari-org/pacific-sound-notebooks/) - this has all the dependencies that are needed. \n",
"\n",
"Otherwise, if you are using this notebook in a cloud jupyter notebook, select a Python3 compatible kernel, remove the comment # before each line and run the code cell. This only needs to be done once for the duration of this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# !pip install -q boto3\n",
"# !pip install -q soundfile\n",
"# !pip install -q scipy\n",
"# !pip install -q numpy\n",
"# !pip install -q matplotlib"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import all packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"from botocore import UNSIGNED\n",
"from botocore.client import Config\n",
"from six.moves.urllib.request import urlopen\n",
"import io\n",
"import scipy\n",
"from scipy import signal, interpolate\n",
"import numpy as np\n",
"import soundfile as sf\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## List the contents of a monthly directory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s3 = boto3.client('s3',\n",
" aws_access_key_id='',\n",
" aws_secret_access_key='', \n",
" config=Config(signature_version=UNSIGNED))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bucket = 'pacific-sound-256khz-2018'\n",
"\n",
"for i, obj in enumerate(s3.list_objects_v2(Bucket=bucket)['Contents']):\n",
" print(obj['Key'])\n",
" if i > 20:\n",
" break"
]
},
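{
"cell_type": "markdown",
"metadata": {},
"source": [
"A single `list_objects_v2` call returns at most 1000 keys, so listing a whole month generally needs pagination. The sketch below is one possible approach using the boto3 paginator; the helper name `list_keys` and its parameters are hypothetical, and restricting the listing with a month prefix such as `01/` assumes that keys begin with a two-digit month, as in the example file read below.\n",
"\n",
"```python\n",
"def list_keys(client, bucket, prefix='', limit=20):\n",
"    # hypothetical helper: return up to `limit` object keys under `prefix`\n",
"    keys = []\n",
"    paginator = client.get_paginator('list_objects_v2')\n",
"    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):\n",
"        # a page without matching objects has no 'Contents' entry\n",
"        for obj in page.get('Contents', []):\n",
"            keys.append(obj['Key'])\n",
"            if len(keys) >= limit:\n",
"                return keys\n",
"    return keys\n",
"```\n",
"\n",
"With the client created above, `list_keys(s3, 'pacific-sound-256khz-2018', prefix='01/')` would return the first 20 keys for January under that assumption."
]
},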
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read metadata from a file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bucket = 'pacific-sound-256khz-2018'\n",
"filename = '01/MARS_20180101_092406.wav'\n",
"url = f'https://{bucket}.s3.amazonaws.com/{filename}'\n",
"print(f'Reading metadata from {url}')\n",
"sf.info(io.BytesIO(urlopen(url).read()), verbose=True) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read data from a file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f'Reading data from {url}')\n",
"x, sample_rate = sf.read(io.BytesIO(urlopen(url).read()),dtype='float32')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calibrated Spectrum Levels"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calibration metadata\n",
"Frequency-dependent hydrophone sensitivity data are defined in the following files, one for each deployment:\n",
"* https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment01.json\n",
"* https://bitbucket.org/mbari/pacific-sound/src/master/MBARI_MARS_Hydrophone_Deployment02.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Compute spectrogram\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# convert scaled voltage to volts\n",
"v = x*3 \n",
"nsec = (v.size)/sample_rate # number of seconds in vector\n",
"spa = 1 # seconds per average\n",
"nseg = int(nsec/spa)\n",
"print(f'{nseg} segments of length {spa} seconds in {nsec} seconds of audio')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# initialize empty LTSA\n",
"nfreq = int(sample_rate/2+1)\n",
"nfreq,nseg\n",
"sg = np.empty((nfreq, nseg), float)\n",
"sg.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get window for welch\n",
"w = scipy.signal.get_window('hann',sample_rate)\n",
"\n",
"# process spectrogram\n",
"for x in range(0,nseg):\n",
" cstart = x*spa*sample_rate\n",
" cend = (x+1)*spa*sample_rate\n",
" f,psd = scipy.signal.welch(v[cstart:cend],fs=sample_rate,window=w,nfft=sample_rate)\n",
" psd = 10*np.log10(psd)\n",
" sg[:,x] = psd"
]
},
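{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check on the loop above, the same Welch call can be run on a synthetic tone. This is a sketch with made-up values (an 8 kHz sample rate and a 1 kHz tone, neither taken from the recordings). It illustrates that a one-second window with `nfft` equal to the sample rate gives 1 Hz bins and `sample_rate/2 + 1` frequencies per segment.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy import signal\n",
"\n",
"fs = 8000                              # assumed demo sample rate (Hz)\n",
"t = np.arange(fs) / fs                 # one second of sample times\n",
"tone = np.sin(2 * np.pi * 1000 * t)    # 1 kHz test tone\n",
"\n",
"w = signal.get_window('hann', fs)      # one-second Hann window\n",
"f, psd = signal.welch(tone, fs=fs, window=w, nfft=fs)\n",
"\n",
"print(f.size)                # number of frequency bins\n",
"print(f[1] - f[0])           # bin spacing in Hz\n",
"print(f[np.argmax(psd)])     # frequency of the spectral peak\n",
"```\n",
"\n",
"The bin count printed here plays the role of `nfreq` in the notebook, and the peak should fall on the tone frequency."
]
},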
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Apply calibration\n",
"Frequency-dependent hydrophone sensitivity data are reported in the json files identified above. This example file is from the second hydrophone deployment, for which the calibration data are manually entered below. Note that the lowest measured value, at 250 Hz, is assumed to cover lower frequencies and repeated as a value at 0 Hz to allow interpolation to the spectrogram output frequencies across the full frequency range. \n",
"\n"
]
},
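{
"cell_type": "markdown",
"metadata": {},
"source": [
"In decibel terms, applying the calibration is a subtraction. The symbols here are introduced only for illustration: $P_V(f)$ is the Welch power spectral density of the voltage signal (V²/Hz) and $S(f)$ is the hydrophone sensitivity (dB re 1 V/µPa). The calibrated spectrum level, in dB re 1 µPa²/Hz, is then\n",
"\n",
"$$L(f) = 10\\,\\log_{10} P_V(f) - S(f)$$\n",
"\n",
"As a worked example with made-up numbers, a voltage spectrum level of $-100$ dB re 1 V²/Hz at a frequency where $S = -177.9$ dB gives $L = -100 - (-177.9) = 77.9$ dB re 1 µPa²/Hz."
]
},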
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# define hydrophone calibration data\n",
"calfreq = [0,250,10000,20100,30100,40200,50200,60200,70300,80300,90400,100400,110400,120500,130500,140500,150600,160600,170700,180700,190700,200000]\n",
"calsens = [-177.90,-177.90,-176.80,-176.35,-177.05,-177.35,-177.30,-178.05,-178.00,-178.40,-178.85,-180.25,-180.50,-179.90,-180.15,-180.20,-180.75,-180.90,-181.45,-181.30,-180.75,-180.30]\n",
"\n",
"# interpolate to the frequency resolution of the spectrogram\n",
"tck = interpolate.splrep(calfreq, calsens, s=0)\n",
"isens = interpolate.splev(f, tck, der=0)\n",
"plt.figure(dpi=300)\n",
"im = plt.plot(calfreq,calsens,'bo',f,isens,'g') \n",
"plt.xlabel('Frequency (Hz)')\n",
"plt.ylabel('Hydrophone sensitivity (dB re V/uPA)')\n",
"plt.legend(['Factory, measured', 'Interpolated'])\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# replicate interpolated sensitivity\n",
"isensg = np.transpose(np.tile(isens,[nseg,1]))\n",
"isensg.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sg.shape"
]
},
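{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tiled sensitivity array above has the same shape as the spectrogram, so the two can be subtracted element by element. As an alternative sketch, NumPy broadcasting gives the same result without building the full array. The small arrays below are made-up stand-ins for `sg` and `isens`, not real data.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"nfreq_demo, nseg_demo = 5, 3           # toy dimensions for the demo\n",
"sg_demo = np.arange(nfreq_demo * nseg_demo, dtype=float).reshape(nfreq_demo, nseg_demo)\n",
"isens_demo = np.linspace(-180.0, -176.0, nfreq_demo)\n",
"\n",
"# approach used in this notebook: replicate the sensitivity for every segment\n",
"tiled = np.transpose(np.tile(isens_demo, [nseg_demo, 1]))\n",
"calibrated_tiled = sg_demo - tiled\n",
"\n",
"# broadcasting: a (nfreq, 1) column is stretched across all columns\n",
"calibrated_bcast = sg_demo - isens_demo[:, np.newaxis]\n",
"\n",
"print(np.array_equal(calibrated_tiled, calibrated_bcast))\n",
"```\n",
"\n",
"With the real arrays, the equivalent expression would be `sg - isens[:, np.newaxis]`."
]
},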
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plot the calibrated spectrogram"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(dpi=300)\n",
"im = plt.imshow(sg-isensg,aspect='auto',origin='lower',vmin=30,vmax=100)\n",
"plt.yscale('log')\n",
"plt.ylim(10,100000)\n",
"plt.colorbar(im)\n",
"plt.xlabel('Seconds')\n",
"plt.ylabel('Frequency (Hz)')\n",
"plt.title('Calibrated spectrum levels')"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"machine_shape": "hm",
"name": "PacificSound_256kHz.ipynb",
"private_outputs": true,
"provenance": []
},
"kernelspec": {
"display_name": "pacific-sound-notebooks",
"language": "python",
"name": "pacific-sound-notebooks"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}