Various Audio Classification Models for Automatic Speaker Verification System in Industry 4.0

doi:10.1201/9781003321149-8

Chapter

ABSTRACT

Speech processing is now widely used in industry to automate production tasks, and automatic speaker verification (ASV) systems are becoming increasingly popular among industrial biometric devices. Robots now perform many crucial industrial tasks, and an ASV system can check whether the commands given to a robot come from a valid user or from a spoofed one. Spoofing is a security attack in which a malicious attacker uses cloned audio of the original user to perform malicious activities. An ASV system consists of two major components: a front end that extracts features from the input audio samples, and a back-end classification model that classifies each audio sample as either cloned or original human audio. Features used for extraction include mel frequency cepstral coefficients (MFCCs), constant Q cepstral coefficients (CQCCs), and Gammatone cepstral coefficients (GTCCs). Classification models such as random forest (RF), Naïve Bayes (NB), long short-term memory (LSTM), and convolutional neural network (CNN) are used to classify the speech samples. This chapter discusses the steps needed to select the best hyperparameters for a classification model and surveys the classification models that have been used for the speaker verification task. It also explains why a validation procedure is needed, describes the types of validation used in machine learning, and emphasizes the use of different evaluation criteria to measure the performance of a classification model.
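As a rough illustration of the validation and evaluation ideas mentioned above, the following minimal sketch shows one common validation scheme (k-fold splitting) and a few common evaluation criteria (accuracy, precision, recall, F1) for a two-class spoof detector. It is not taken from the chapter: the function names `k_fold_indices` and `binary_metrics`, the label convention (1 = cloned audio, 0 = original human audio), and the toy labels are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) index pairs over contiguous folds.

    Hypothetical helper: samples are assumed to be already shuffled.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if i < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for a two-class task.

    Assumed convention: `positive` (1) marks cloned audio, 0 marks original audio.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy usage with made-up labels (not real ASV results).
folds = k_fold_indices(n_samples=10, k=3)
scores = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

In practice each candidate hyperparameter setting would be trained on the training indices of every fold and scored on the matching validation indices, with the averaged score used to choose among settings.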