ABSTRACT
Voice activity detection (VAD) is used extensively throughout speech processing systems, but its accuracy can degrade significantly in noisy environments. In this study, we propose an approach for improving the accuracy of semi-supervised Gaussian mixture model-based VAD (SS-GMM-VAD) by integrating a speech enhancement module based on the spectral subtraction with time-frequency (SS-TF) filtering technique. We first assess the SS-TF filtering method in terms of speech quality and intelligibility metrics. We then show how the SS-TF filtering module can be integrated into an existing VAD system as a pre-processing stage for noisy speech signals. Finally, we compare the performance of the proposed VAD system across a variety of noise conditions and signal-to-noise ratios (SNRs). The experimental results show improved VAD performance, providing evidence that SS-TF filtering benefits VAD in adverse acoustic environments.
