The pbp-json-gen command-line program is used to generate JSON files with audio metadata. This is a necessary step before running the main HMB generation program to extract and optionally correct the time data.
Instructions below assume you have already installed the package, e.g. pip install pbp. Once this is done, you can proceed to the main program pbp.
Overview
Three types of audio recorders are supported: NRS, IcListen, and Soundtrap. The current support matrix is:
Recorder | Google Storage | AWS S3 | Local Storage |
---|---|---|---|
NRS | |||
IcListen | |||
Soundtrap |
For audio that is stored in a cloud storage bucket, the URI that is required to access the audio files depends on the cloud storage provider. The data must be stored in a public cloud storage bucket; private buckets are not supported.
- For Google Storage, use the gs: prefix, e.g. gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio.
- For AWS S3, use the s3: prefix, e.g. s3://pacific-sound-256khz.
- For local files, the URI is the path to the directory where the audio files are stored, with the file: prefix, e.g. file:///Volumes/PAM_Archive/FK01.
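Note the triple slash after the file: prefix for a local archive, as in file:///Volumes. It is required for the URI to be parsed correctly: the first two slashes introduce an (empty) host name, and the third begins the absolute path.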
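The effect of the triple slash can be checked with Python's standard urllib.parse module. This is only an illustrative sketch: it is an assumption that pbp-json-gen splits URIs the same way, and the helper name describe is hypothetical.

```python
from urllib.parse import urlparse


def describe(uri: str) -> tuple[str, str, str]:
    """Return (scheme, netloc, path) for a storage URI (hypothetical helper)."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in (
    "gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio",
    "s3://pacific-sound-256khz",
    "file:///Volumes/PAM_Archive/FK01",  # correct: empty host, absolute path
    "file://Volumes/PAM_Archive/FK01",   # wrong: "Volumes" is read as the host
):
    print(describe(uri))
```

With only two slashes, the first path component is consumed as the host name, so the directory that is actually referenced is no longer /Volumes/PAM_Archive/FK01.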
Examples
Note
The prefix of a file is the string that matches the beginning of the file name, before the timestamp. For example, ONMS_FK01_7412_20230315_000000.wav has the prefix ONMS_FK01_7412_, NRS11_20191024_022220.flac has the prefix NRS11_, and MARS_20220902_000000.wav has the prefix MARS_.
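The prefix rule in the note can be sketched with a regular expression. This is a minimal illustration, assuming every file name has the layout prefix, then YYYYMMDD_HHMMSS, then an extension, as in the three examples; the names NAME_PATTERN and split_name are hypothetical, and the tool's own matching logic may differ.

```python
import re
from datetime import datetime

# Assumed layout: <prefix><YYYYMMDD>_<HHMMSS>.<extension>
NAME_PATTERN = re.compile(r"^(?P<prefix>.*?)(?P<stamp>\d{8}_\d{6})\.\w+$")


def split_name(file_name: str) -> tuple[str, datetime]:
    """Split a file name into its prefix and the timestamp that follows it."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"no timestamp found in {file_name!r}")
    stamp = datetime.strptime(match["stamp"], "%Y%m%d_%H%M%S")
    return match["prefix"], stamp


for name in (
    "ONMS_FK01_7412_20230315_000000.wav",
    "NRS11_20191024_022220.flac",
    "MARS_20220902_000000.wav",
):
    print(split_name(name))
```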
Generate JSONs with audio metadata from NRS flac files for a date range
The following command generates JSON files in the json/nrs directory for the specified date range, using only files in gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio whose names include the string NRS11. Logs will be stored in the output directory.
pbp-json-gen --recorder=NRS \
--json-base-dir=json/nrs \
--output-dir=output \
--uri=gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio \
--start=20191023 \
--end=20191024 \
--prefix=NRS11
Following this command, you should see two JSON files in the json/nrs directory, one for each day of the date range.
json/nrs/
└── 2019
    ├── 20191023.json
    └── 20191024.json
output/
├── NRS20191023_20191024.log
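To check that a run produced every expected file, the layout above can be reproduced in a few lines of Python. This sketch assumes the base-dir/YYYY/YYYYMMDD.json pattern seen in the listing holds for every day in the range; the function name expected_json_paths is hypothetical and not part of pbp.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def expected_json_paths(json_base_dir: str, start: date, end: date) -> list[str]:
    """List one <base>/<YYYY>/<YYYYMMDD>.json path per day, end date inclusive."""
    paths = []
    day = start
    while day <= end:
        path = PurePosixPath(json_base_dir) / f"{day:%Y}" / f"{day:%Y%m%d}.json"
        paths.append(str(path))
        day += timedelta(days=1)
    return paths


for path in expected_json_paths("json/nrs", date(2019, 10, 23), date(2019, 10, 24)):
    print(path)
```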
Generate JSONs with audio metadata from IcListen wav files for a date range
The following command generates JSON files in the json/iclisten directory for the specified date range, using only files in s3://pacific-sound-256khz whose names include the string MARS. Logs will be stored in the output directory. The MARS data is recorded in 10-minute intervals, so there are many files to process. This would be a good time to go get a cup of coffee.
pbp-json-gen --recorder=ICLISTEN \
--json-base-dir=json/iclisten \
--output-dir=output \
--uri=s3://pacific-sound-256khz \
--start=20191023 \
--end=20191024 \
--prefix=MARS
You should see two JSON files in the json/iclisten directory, one for each day of the date range.
json/iclisten/
└── 2019
    ├── 20191023.json
    └── 20191024.json
output/
├── ICLISTEN20191023_20191024.log
Generate JSONs with audio metadata from Soundtrap wav files for a date range
pbp-json-gen --recorder=SOUNDTRAP \
--json-base-dir=json/FK01 \
--output-dir=logs/json/FK01 \
--uri=file:///Volumes/PAM_Archive/FK01 \
--start=20230315 \
--end=20230316 \
--prefix=ONMS_FK01_7412
JSON format
Why JSON?
We chose JSON files to store the metadata because they are human-readable, easy to parse, and easy to integrate into a larger data processing pipeline.
The JSON file schema is as follows:
Field | Description |
---|---|
channels | The number of channels in the audio file. |
uri | The location of the audio file. This is a URI that can be used to access the file in a public cloud storage bucket or local file system. |
start | The start time of the audio file in ISO 8601 format. |
end | The end time of the audio file in ISO 8601 format. |
duration_secs | The duration of the audio file in seconds. |
[
  {
    "uri": "gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio/NRS11_20191023_222213.flac",
    "start": "2019-10-23T22:22:13Z",
    "end": "2019-10-24T02:22:13Z",
    "duration_secs": 14400,
    "channels": 1
  }
]
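Because the files are plain JSON, they can be read with Python's standard json module. The sketch below loads the sample entry and checks that duration_secs agrees with the start and end fields. It assumes timestamps always use the whole-second UTC form shown above; the helper names and the tolerance_secs parameter are hypothetical.

```python
import json
from datetime import datetime, timezone

SAMPLE = """
[
  {
    "uri": "gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio/NRS11_20191023_222213.flac",
    "start": "2019-10-23T22:22:13Z",
    "end": "2019-10-24T02:22:13Z",
    "duration_secs": 14400,
    "channels": 1
  }
]
"""


def parse_utc(stamp: str) -> datetime:
    """Parse a timestamp such as 2019-10-23T22:22:13Z (assumed format)."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def duration_matches(entry: dict, tolerance_secs: float = 1.0) -> bool:
    """Check that end - start agrees with duration_secs within a tolerance."""
    elapsed = (parse_utc(entry["end"]) - parse_utc(entry["start"])).total_seconds()
    return abs(elapsed - entry["duration_secs"]) <= tolerance_secs


entries = json.loads(SAMPLE)
for entry in entries:
    print(entry["uri"], duration_matches(entry))
```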
Need help? Try the --help option
$ pbp-json-gen --help
usage: pbp-json-gen [-h] [--version] --recorder {NRS,ICLISTEN,SOUNDTRAP} --json-base-dir dir --output-dir dir --uri uri --start YYYYMMDD --end YYYYMMDD --prefix PREFIX [PREFIX ...]
Generate JSONs with audio metadata for NRS flac files, IcListen wav files, and Soundtrap wav files from either a local directory or gs/s3 bucket.
options:
-h, --help show this help message and exit
--version show program's version number and exit
--recorder {NRS,ICLISTEN,SOUNDTRAP}
Choose the audio instrument type
--json-base-dir dir JSON base directory to store the metadata
--output-dir dir Output directory to store logs
--uri uri Location of the audio files. S3 location supported for IcListen or Soundtrap, and GS supported for NRS.
--start YYYYMMDD The starting date to be processed.
--end YYYYMMDD The ending date to be processed.
--prefix PREFIX [PREFIX ...]
Prefix for search to match the audio files. Assumption is the prefix is separated by an underscore, e.g. 'MARS_'.
Examples:
pbp-json-gen \
--json-base-dir=tests/json/nrs \
--output-dir=output \
--uri=s3://pacific-sound-ch01 \
--start=20220902 \
--end=20220902 \
--prefix=MARS \
--recorder=NRS
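When the command is launched from a script, the argument list can be assembled and validated before anything runs. The following is a rough sketch that uses only the options listed in the help text above; build_command and its checks are hypothetical and not part of pbp.

```python
import shlex
from datetime import datetime

# Recorder choices as listed in the --help output.
RECORDERS = {"NRS", "ICLISTEN", "SOUNDTRAP"}


def build_command(recorder, json_base_dir, output_dir, uri, start, end, prefixes):
    """Assemble a pbp-json-gen argument list, validating the inputs first."""
    if recorder not in RECORDERS:
        raise ValueError(f"unsupported recorder: {recorder!r}")
    for value in (start, end):
        if len(value) != 8 or not value.isdigit():
            raise ValueError(f"expected YYYYMMDD, got {value!r}")
        datetime.strptime(value, "%Y%m%d")  # raises ValueError on an invalid date
    if not prefixes:
        raise ValueError("at least one prefix is required")
    return [
        "pbp-json-gen",
        f"--recorder={recorder}",
        f"--json-base-dir={json_base_dir}",
        f"--output-dir={output_dir}",
        f"--uri={uri}",
        f"--start={start}",
        f"--end={end}",
        "--prefix",
        *prefixes,
    ]


cmd = build_command(
    "NRS",
    "json/nrs",
    "output",
    "gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio",
    "20191023",
    "20191024",
    ["NRS11"],
)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would execute the command, provided pbp-json-gen is installed and on the PATH.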