ABSTRACT

The block diagram of the proposed system is shown in Figure 1. The experiment is conducted in four phases. In the first phase, a baseline system with MFCC features is used. The subsequent phases use the frame slope feature, the modified group delay feature, and an early fusion of these features, respectively. GMM-based classifiers are used in all phases. A GMM with 64 mixture components is trained for each instrument model, using audio files recorded in an isolated environment. For the test audio file, the likelihood score is computed against each instrument model, and the model that reports the maximum log-likelihood is declared as the decision. This can be formulated mathematically as finding the target model λ_i for which the following criterion is satisfied.
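
The maximum log-likelihood decision described above can be illustrated with a minimal Python sketch. It assumes diagonal-covariance GMMs and assumes that the score of a test file is the sum of per-frame log-likelihoods, so that the chosen model is the λ_i maximizing the sum over frames of log p(x_t | λ_i); neither detail is stated in the text. The instrument names, the two-component toy models, and the function names are hypothetical and stand in for the trained 64-component models.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def frame_log_likelihood(x, gmm):
    """log p(x | lambda) for one frame; gmm is a list of (weight, mean, var)."""
    terms = [
        math.log(w) + log_gaussian_diag(x, mean, var)
        for w, mean, var in gmm
    ]
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify(frames, models):
    """Return (best_label, scores); scores maps label -> total log-likelihood."""
    scores = {
        label: sum(frame_log_likelihood(x, gmm) for x in frames)
        for label, gmm in models.items()
    }
    # Decision: the model with the maximum accumulated log-likelihood.
    return max(scores, key=scores.get), scores

# Hypothetical toy models (2 components, 2-dimensional features) for illustration only.
models = {
    "flute": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "violin": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
test_frames = [[5.2, 4.9], [5.8, 6.1], [5.5, 5.4]]
label, scores = classify(test_frames, models)
print(label)
```

In this sketch each feature frame would correspond to one MFCC, frame slope, modified group delay, or fused feature vector, depending on the phase; the decision rule itself is the same in all four phases.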