ABSTRACT

Background noise degrades voice activity detectors (VADs) in many speech-processing applications, and speech-enhancement techniques can be used to mitigate this degradation. The most common VADs are those based on the zero-crossing rate (ZCR) and short-term energy (STE). Both perform well in noise-free conditions, but their performance degrades considerably in noisy environments. One possible remedy is to apply speech enhancement as a preprocessing step before the VAD. This paper evaluates whether such preprocessing, applied to speech corrupted by a variety of noise types and noise levels, improves the accuracy of a VAD that uses ZCR and STE together, and if so, by how much. Experimental results on the NOIZEUS dataset show that speech-enhancement preprocessing improves VAD accuracy, particularly at low signal-to-noise ratios (SNRs), which demonstrates the feasibility of the proposed approach.
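For readers unfamiliar with the two features, the following is a minimal illustrative sketch of frame-level STE and ZCR and of one simple way to combine them. It is not the detector evaluated in this paper: the function names, the frame length and hop (25 ms and 10 ms, assuming 8 kHz audio), the thresholds, and the decision rule (high energy and low ZCR means speech) are all assumptions chosen for illustration.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def simple_vad(samples, frame_len=200, hop=80, ste_thresh=0.01, zcr_thresh=0.25):
    """Hypothetical rule: a frame is speech if its energy is high and its ZCR is low.

    All default values are illustrative assumptions, not values from the paper.
    """
    return [short_term_energy(f) > ste_thresh and zero_crossing_rate(f) < zcr_thresh
            for f in frame_signal(samples, frame_len, hop)]


# Synthetic test signal: 50 ms of silence followed by 50 ms of a 200 Hz tone at 8 kHz.
samples = [0.0] * 400 + [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
decisions = simple_vad(samples)
print(decisions)
```

Fixed thresholds such as these are exactly what additive noise disrupts: noise raises the energy of non-speech frames and changes their zero-crossing behavior, which is the motivation for enhancing the signal before the features are computed.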