Classification
VQ Based Classification
Along with the probabilistic methods (see below), and as a means to further evaluate the clustering/VQ mechanisms, we also include a classification method based on the quantization distortion measure.
- VQ Learning: A codebook $C_c$ is generated for each class $c$ using the designated training instances for that class.
- VQ Classification: A particular unit instance $X$ is assigned the class whose trained codebook yields the minimum average distortion in the quantization of $X$ (a minimal sketch follows this list):

  $$c^* = \operatorname*{arg\,min}_c \; \bar{D}(X; C_c)$$

  where $\bar{D}(X; C_c)$ is defined as the average quantization distortion over the vectors in $X$ when using $C_c$.
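As an illustration only (not the project's code), the following is a minimal NumPy sketch of this rule. It assumes each unit instance $X$ is a `T x d` array of feature vectors, each class codebook is a `K x d` array of codewords, and squared Euclidean distance as the distortion measure; the function names are ours.

```python
import numpy as np

def avg_distortion(X, codebook):
    """Average quantization distortion of the T x d vectors in X:
    each vector is mapped to its nearest codeword in the K x d
    codebook, and the squared Euclidean distances are averaged."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (T, K)
    return d2.min(axis=1).mean()

def vq_classify(X, codebooks):
    """Assign X to the class whose codebook yields the minimum
    average distortion; codebooks maps class label -> codeword array."""
    return min(codebooks, key=lambda c: avg_distortion(X, codebooks[c]))
```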
Probabilistic Methods
In each of the methods below, by using the training data in a supervised manner, we generate a set of models $\lambda_c$, one per class $c$. A given sequence $O$ is then assigned the class whose model maximizes the likelihood of the sequence:

$$c^* = \operatorname*{arg\,max}_c \; P(O \mid \lambda_c)$$

Note: In these methods, the underlying codebooks for quantization are those generated with all training song unit instances, regardless of class.
Naive Bayes
The Naive Bayes (NB) classifier is a simple method shown to be quite effective in many practical applications [5]. Its core underlying assumption is that the observed symbols for a class are independent of each other.
A Naive Bayes model is defined by:

- $M$: number of attributes (observable symbols)
- $\{b(k)\}$: observation probability distribution

(Figure) Example of a model with 3 observable symbols
- NB Learning: A model for class $c$ is generated from the designated training observation sequences, using an $m$-estimate (pseudocounts) to determine the distribution [5]. Each element of the distribution is determined as:

  $$b(k) = \frac{n_k + 1}{n + M}$$

  where $n_k$ is the number of quantized symbols equal to $k$, and $n$ is the total number of symbols across all training sequences from class $c$.
- NB Classification: The probability of a sequence $O = O_1 O_2 \cdots O_T$ given model $\lambda$ is:

  $$P(O \mid \lambda) = \prod_{t=1}^{T} b(O_t)$$

  With this, as indicated above, we use a maximum likelihood criterion for classification (see the sketch below):

  $$c^* = \operatorname*{arg\,max}_c \; P(O \mid \lambda_c)$$
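For concreteness, a minimal sketch of both NB steps, assuming sequences of already-quantized symbols (integers in `0..M-1`) and a pseudocount of 1 per symbol as in the formula above; log-probabilities are used to avoid numerical underflow:

```python
import numpy as np

def nb_learn(sequences, M):
    """Estimate the symbol distribution b for one class from its
    training sequences: b[k] = (n_k + 1) / (n + M)."""
    counts = np.ones(M)                          # pseudocount of 1 per symbol
    for seq in sequences:
        counts += np.bincount(seq, minlength=M)  # adds the n_k counts
    return counts / counts.sum()                 # denominator is n + M

def nb_classify(seq, models):
    """Maximum-likelihood class: argmax_c P(O | lambda_c), with
    log P(O | lambda) = sum_t log b(O_t); models maps class -> b."""
    seq = np.asarray(seq)
    return max(models, key=lambda c: np.log(models[c][seq]).sum())
```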
Markov Chain
A first-order Markov chain model (MM) allows us to start incorporating a time dependency in the observed sequences [6]. In this model, for a given sequence, each observed symbol corresponds to a state of the system, and the probability of observing a particular symbol at time $t$ depends only on the symbol observed at time $t-1$.

A Markov chain model is defined by:

- $N$: number of states (observable symbols in our case)
- $\pi = \{\pi_i\}$: initial state probability distribution
- $A = \{a_{ij}\}$: state transition probability distributions

(Figure) Example of a model with 2 observable symbols
- MM Learning: Essentially done as with NB learning; here we also use pseudocounts to determine the $\pi$ and $A$ distributions based on the given training sequences.
- MM Classification: With $\lambda$ denoting a Markov chain model, the probability of a sequence $O = O_1 O_2 \cdots O_T$ given $\lambda$ is:

  $$P(O \mid \lambda) = \pi_{O_1} \prod_{t=2}^{T} a_{O_{t-1} O_t}$$

  which we use for our maximum likelihood classification (see the sketch below):

  $$c^* = \operatorname*{arg\,max}_c \; P(O \mid \lambda_c)$$
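The corresponding sketch for the Markov chain model, under the same assumptions (integer symbol sequences, pseudocount of 1); classification is again the argmax of the log-likelihood across class models, as in the NB sketch:

```python
import numpy as np

def mm_learn(sequences, N):
    """Estimate the initial (pi) and transition (A) distributions for
    one class with pseudocounts, from sequences of symbols in 0..N-1."""
    pi = np.ones(N)                      # pseudocounts for initial symbols
    A = np.ones((N, N))                  # pseudocounts for transitions
    for seq in sequences:
        pi[seq[0]] += 1
        for s, t in zip(seq[:-1], seq[1:]):
            A[s, t] += 1
    return pi / pi.sum(), A / A.sum(axis=1, keepdims=True)

def mm_log_likelihood(seq, pi, A):
    """log P(O | lambda) = log pi[O_1] + sum_t log A[O_{t-1}, O_t]."""
    return np.log(pi[seq[0]]) + sum(
        np.log(A[s, t]) for s, t in zip(seq[:-1], seq[1:]))
```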
Hidden Markov Modeling
In HMM, states no longer represent the observable elements of the underlying stochastic process as in the Markov chain case. They are now hidden random variables on which output symbols are defined as probabilistic functions [7-9]. In this study we use discrete observation distributions.
An HMM is defined by:

- $N$, $M$: number of states and number of observable symbols
- $\pi = \{\pi_i\}$: initial state probability distribution
- $A = \{a_{ij}\}$: state transition probability distributions
- $B = \{b_j(k)\}$: observation symbol distributions

(Figure) Example of a model
- HMM Learning: Implemented via the Baum-Welch algorithm [7].
- HMM Classification: With $\lambda$ denoting an HMM, the key operation is to compute $P(O \mid \lambda)$, the probability of a sequence $O$ given $\lambda$. For this we use the forward-backward algorithm [7]. Once again, we use a maximum likelihood decision rule (see the sketch below):

  $$c^* = \operatorname*{arg\,max}_c \; P(O \mid \lambda_c)$$
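Baum-Welch training is too long for a short sketch, but the scoring step can be illustrated with the forward algorithm, using a standard per-step scaling to avoid underflow; `pi`, `A`, and `B` are assumed to be NumPy arrays with the shapes noted below, and the function names are ours:

```python
import numpy as np

def hmm_log_likelihood(seq, pi, A, B):
    """log P(O | lambda) via the scaled forward algorithm.
    pi: (N,) initial state distribution
    A:  (N, N) state transition distributions
    B:  (N, M) discrete observation symbol distributions"""
    alpha = pi * B[:, seq[0]]          # forward variables at t = 1
    log_prob = np.log(alpha.sum())
    alpha /= alpha.sum()               # rescale so alpha sums to 1
    for o in seq[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate one step, absorb symbol o
        log_prob += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_prob

def hmm_classify(seq, models):
    """Maximum-likelihood class; models maps class -> (pi, A, B)."""
    return max(models, key=lambda c: hmm_log_likelihood(seq, *models[c]))
```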
- Arrow notation used here and in the following: dashed lines originating from a node represent probabilities defining a particular distribution; solid lines represent the occurrence of a particular event according to the distribution indicated in the label.