Skip to content

The pbp-json-gen command-line program is used to generate JSON files with audio metadata. This is a necessary step before running the main HMB generation program to extract and optionally correct the time data.

Instructions below assume you have already installed the package, e.g. pip install pbp. Once this is done, you can proceed to the main program pbp.

Overview

Three types of audio recorders are supported: NRS, IcListen, and Soundtrap files. Here is the current supported matrix:


Recorder Google Storage AWS S3 Local Storage
NRS
IcListen
Soundtrap

For audio that is stored in a cloud storage bucket, the URI that is required to access the audio files depends on the cloud storage provider. The data must be stored in a public cloud storage bucket; private buckets are not supported.

  • For Google Storage, use the gs: prefix, e.g. gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio.
  • For AWS S3, use the s3: prefix, e.g. s3://pacific-sound-256khz.
  • For local files, the URI is the path to the directory where the audio files are stored with the file: prefix, e.g. file:///Volumes/PAM_Archive/FK01.

Note the triple slash after the prefix for a local archive file:///Volumes. This is required for the URI to be parsed correctly.

Examples

Note

The prefix for any file, is the string that is used to match the beginning of the file name before the timestamp. For example, if the file name is ONMS_FK01_7412_20230315_000000.wav, the prefix would be ONMS_FK01_7412_, NRS11_20191024_022220.flac would have a prefix of NRS11_, and MARS_20220902_000000.wav would have a prefix of MARS_.

Generate JSONs with audio metadata from NRS flac files for a date range

The following command generates JSON files in the json/nrs directory only for files in gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio that iclude the file string NRS11. Logs will be stored in the output directory, for the specified date range.

pbp-json-gen --recorder=NRS \
             --json-base-dir=json/nrs \
             --output-dir=output \
             --uri=gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio \
             --start=20191023 \
             --end=20191024 \
             --prefix=NRS11

Following this command, you should see two JSON files in the json/nrs directory; one for each day of the date range.

json/nrs/
└── 2019
    ├── 20191023.json
    └── 20191024.json

output/
├── NRS20191023_20191024.log

Generate JSONs with audio metadata from IcListen wav files for a date range

The following command generates JSON files in the json/iclisten directory only for files in s3://pacific-sound-256khz that include the file string MARS. Logs will be stored in the output directory, for the specified date range. The MARS data is recorded in 10-minute intervals, so there are many files to process.
This would be a good time to go get a cup of coffee ☕

pbp-json-gen --recorder=ICLISTEN \
             --json-base-dir=json/iclisten \
             --output-dir=output \
             --uri=s3://pacific-sound-256khz \
             --start=20191023 \
             --end=20191024 \
             --prefix=MARS

You should see two JSON files in the json/iclisten directory; one for each day of the date range.

json/iclisten/
└── 2019
    ├── 20191023.json
    └── 20191024.json

output/
├── ICLISTEN20191023_20191024.log

Generate JSONs with audio metadata from Soundtrap wav files for a date range

pbp-json-gen --recorder=SOUNDTRAP \
            --json-base-dir=json/FK01 \
            --output-dir=logs/json/FK01 \
            --uri=file://Volumes/PAM_Archive/FK01 \
            --start=20230315 \
            --end=20230316 \
            --prefix=ONMS_FK01_7412

JSON format

Why JSON?

We choose JSON files to store the metadata because it is human-readable, easy to parse, and can be easily integrated as part of a larger data processing pipeline.

The JSON file schema is as follows:


Field Description
channels The number of channels in the audio file.
uri The location of the audio file. This is a URI that can be used to access the file in a public cloud storage bucket or local file system.
start The start time of the audio file in ISO 8601 format.
end The end time of the audio file in ISO 8601 format.
duration_secs The duration of the audio file in seconds.
[
    {
        "uri": "gs://noaa-passive-bioacoustic/nrs/audio/11/nrs_11_2019-2021/audio/NRS11_20191023_222213.flac",
        "start": "2019-10-23T22:22:13Z",
        "end": "2019-10-24T02:22:13Z",
        "duration_secs": 14400,
        "channels": 1
    }
]

Need help? Try the --help option

$ pbp-json-gen --help
usage: pbp-json-gen [-h] [--version] --recorder {NRS,ICLISTEN,SOUNDTRAP} --json-base-dir dir --output-dir dir --uri uri --start YYYYMMDD --end YYYYMMDD --prefix PREFIX [PREFIX ...]

Generate JSONs with audio metadata for NRS flac files, IcListen wav files, and Soundtrap wav files from either a local directory or gs/s3 bucket.

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --recorder {NRS,ICLISTEN,SOUNDTRAP}
                        Choose the audio instrument type
  --json-base-dir dir   JSON base directory to store the metadata
  --output-dir dir      Output directory to store logs
  --uri uri             Location of the audio files. S3 location supported for IcListen or Soundtrap, and GS supported for NRS.
  --start YYYYMMDD      The starting date to be processed.
  --end YYYYMMDD        The ending date to be processed.
  --prefix PREFIX [PREFIX ...]
                        Prefix for search to match the audio files. Assumption is the prefix is separated by an underscore, e.g. 'MARS_'.

Examples:
    pbp-json-gen \
                 --json-base-dir=tests/json/nrs \
                 --output-dir=output \
                 --uri=s3://pacific-sound-ch01 \
                 --start=20220902 \
                 --end=20220902 \
                 --prefix=MARS \
                 --recorder=NRS