ABSTRACT

This chapter deals with short-segment analysis and pitch-synchronous analysis of speech. It discusses transform-domain speech processing, namely processing using the short-time Fourier transform, the wavelet transform (WT), and related transforms. Speech recognition and speaker verification systems extract certain features from a speech segment, for example, pitch frequency, formants, linear prediction coefficients, and mel-frequency cepstral coefficients. The discrete cosine transform can be used to change the sampling rate of the speech signal. Because speech is a time-varying signal, it is divided into segments of fixed size, say, 128 samples, and the length of the segment is chosen according to the time and frequency resolution required. The WT is a relatively new mathematical tool for the local representation of non-stationary signals, and it provides a time–frequency representation of the signal. Wavelet packets are used to find parameters of the signal that are hidden in specific frequency bands.
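The fixed-size segmentation mentioned above can be illustrated with a minimal sketch. It is not the chapter's own implementation: the function names `frame_signal` and `short_time_spectrum`, the 8 kHz sampling rate, the 50% overlap, the Hamming window, and the 1 kHz test tone are all assumptions chosen for illustration. Only the segment length of 128 comes from the text.

```python
import numpy as np

def frame_signal(x, frame_len=128, hop=64):
    """Split a 1-D signal into overlapping frames of frame_len samples.

    hop=64 (50% overlap) is an illustrative assumption.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples i*hop ... i*hop + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_spectrum(x, frame_len=128, hop=64):
    """Magnitude of the short-time Fourier transform, one row per frame."""
    frames = frame_signal(x, frame_len, hop)
    window = np.hamming(frame_len)  # taper each frame to reduce leakage
    return np.abs(np.fft.rfft(frames * window, axis=1))

# Assumed test signal: one second of a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)

spec = short_time_spectrum(x)
bin_spacing_hz = fs / 128          # frequency resolution of a 128-sample frame
frame_duration_ms = 1000 * 128 / fs  # time span covered by one frame
```

Under these assumptions a 128-sample segment spans 16 ms and gives a frequency bin spacing of 62.5 Hz. A longer segment would narrow the bin spacing at the cost of coarser time localization, which is the trade-off behind choosing the segment length according to the resolution required.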