ABSTRACT

In the Indian context, the scarcity of digitally available language resources restricts the expansion of speech technology applications. This paper describes an experimental study of two low-resource tribal languages (LRTL) of India, Santali and Hrangkhawl, for language identification (LID). Two approaches are applied to 10 hours of speech data. In the first approach, LID performance is measured using the outcome of an acoustic analysis of the two LRTL. In the second approach, a 39-dimensional feature vector is used with Vector Quantization (VQ), Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Multilayer Perceptron (MLP) classifiers. We compare the two approaches on the collected speech data. Our analysis shows that the second approach outperforms the first and yields encouraging results for researchers working on LRTL. This experimental study also reveals important characteristics of the acoustic features of both languages.