Abstract

Song is a prevalent behavior in humpback whale populations, even in regions that are considered to be foraging habitat, such as the Monterey Bay National Marine Sanctuary in the Northeast Pacific, where song is detected nine months out of the year. In this work we explore various machine learning methods to classify song units, as a basis for studying song structure and its changes. We report on a number of analyses and classification exercises based on linear predictive coding, vector quantization, and machine learning classifiers including Naive Bayes, first-order Markov chains, and hidden Markov models. As a baseline for comparison, the distortion measure used to create the codebooks for vector quantization is itself also used as a means of classification. Classification accuracy ranges from 88% to 94% across the selected methods on a 4.5-hour recording involving 4539 unit occurrences over 8 different unit types. We evaluate the effect of several signal processing, clustering, and learning parameters on classification performance, with the goal of laying a foundation for characterizing song vocalization not only at the unit level but also below it (subunit) and above it (phrase, theme, song).