Audio Features for Automatic Speech Recognition and Audio Analysis

ABSTRACT

CONTENTS
9.1 Speech Features: Introduction
9.2 Tools for Speech Analysis
9.3 Cepstrum Analysis
9.4 LPC Features
9.5 Feature Extraction for ASR
9.6 Perceptually-Based Features: PLP and MFCC Features
9.7 Practical Implementations of PLP and MFCC
9.8 Generic Audio Features
9.9 ASR Features: Summary
9.10 Exercises
Bibliography

9.1 Speech Features: Introduction

The input to any speech recognition system is the raw, sampled audio recording. Building a recognizer directly on these raw samples is neither practical nor desirable, for two reasons: typical speech signals contain a very large amount of redundancy, and the raw samples provide no invariance to the speaker or to the recording environment.
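To make the data-rate side of this argument concrete, the following Python sketch counts how many values remain when one second of audio is cut into short overlapping frames and each frame is summarized by a fixed number of coefficients. The sampling rate (16 kHz), frame length (25 ms), hop (10 ms), and 13 coefficients per frame are assumptions chosen for illustration. They are common choices in ASR front ends but are not taken from this section, and the helper names `num_frames` and `data_reduction` are hypothetical.

```python
from typing import Tuple


def num_frames(num_samples: int, frame_len: int, hop: int) -> int:
    """Number of complete frames that fit in the signal (no padding at the end)."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop


def data_reduction(sample_rate: int = 16000,
                   frame_ms: float = 25.0,
                   hop_ms: float = 10.0,
                   coeffs_per_frame: int = 13,
                   seconds: float = 1.0) -> Tuple[int, int, float]:
    """Compare the raw sample count with the number of per-frame feature values.

    All default parameter values are illustrative assumptions, not values
    prescribed by this chapter.
    """
    num_samples = int(sample_rate * seconds)
    frame_len = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)          # 160 samples at 16 kHz
    frames = num_frames(num_samples, frame_len, hop)
    feature_values = frames * coeffs_per_frame
    if feature_values == 0:
        raise ValueError("signal is shorter than one frame")
    return num_samples, feature_values, num_samples / feature_values


if __name__ == "__main__":
    samples, values, ratio = data_reduction()
    print(f"{samples} raw samples -> {values} feature values ({ratio:.1f}x fewer)")
```

Under these assumptions, 16000 raw samples shrink to 98 frames of 13 values (1274 numbers), roughly a 12.6-fold reduction. The count only illustrates how much less data a frame-based representation carries; it says nothing about speaker or environmental invariance, which depends on what the per-frame features actually measure.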