A Nonlinear Squeezing of the Continuous Wavelet Transform Based on Auditory Nerve Models
This chapter incorporates the wavelet transform and auditory nerve-based models into a tool for speaker identification, in the hope that the results will be more robust to noise than those of standard methods. It uses the continuous wavelet transform to reliably extract the different components of the modulation model and the parameters that characterize them. The chapter presents the results of a first test of the synchrosqueezed representation for speaker identification, and illustrates the intermediate stages: the "untreated" wavelet transform of a speech segment, its squeezed and synchrosqueezed versions, and the extraction of the parameters used for speaker identification. The whole construction is based on a continuous wavelet transform; in practice this is, of course, a discrete but highly redundant transform, heavily oversampled in both time and scale. The chapter concludes with pointers to and comparisons with related work in the literature, and sketches possible future directions.
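To make the pipeline concrete, the sketch below illustrates the generic synchrosqueezing idea mentioned above: compute a heavily oversampled continuous wavelet transform, estimate an instantaneous frequency from the phase derivative of each coefficient, and reassign the coefficient energy to that frequency bin. This is a minimal illustration, not the chapter's actual algorithm; the Morlet wavelet, the function names, and all parameter choices (`w0`, the scale grid, the frequency bins) are our own assumptions.

```python
import numpy as np

def cwt_morlet(x, fs, scales, w0=6.0):
    """CWT with an analytic Morlet wavelet, computed in the Fourier domain.

    Returns W (n_scales x n_samples) and its exact time derivative dW,
    which is needed for the phase-based instantaneous-frequency estimate.
    (Illustrative choice of wavelet; not the chapter's model.)
    """
    n = len(x)
    xi = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # angular frequency grid
    X = np.fft.fft(x)
    W = np.empty((len(scales), n), dtype=complex)
    dW = np.empty_like(W)
    for j, a in enumerate(scales):
        # Analytic Morlet: Gaussian bump on the positive-frequency axis only.
        psi_hat = np.where(xi > 0, np.exp(-0.5 * (a * xi - w0) ** 2), 0.0)
        W[j] = np.fft.ifft(X * psi_hat)
        dW[j] = np.fft.ifft(X * psi_hat * 1j * xi)  # d/dt via multiplication by i*xi
    return W, dW

def synchrosqueeze(W, dW, freq_bins):
    """Reassign wavelet energy from (scale, time) to (frequency, time).

    For a locally sinusoidal signal, imag(dW/W)/(2*pi) recovers the
    instantaneous frequency in Hz; each coefficient's energy is moved
    to the nearest bin of `freq_bins`.
    """
    n_scales, n = W.shape
    T = np.zeros((len(freq_bins), n))
    eps = 1e-8 * np.max(np.abs(W))  # ignore near-zero coefficients
    for j in range(n_scales):
        for t in range(n):
            if np.abs(W[j, t]) < eps:
                continue
            f = np.imag(dW[j, t] / W[j, t]) / (2 * np.pi)
            k = np.searchsorted(freq_bins, f)
            if 0 < k < len(freq_bins):
                T[k, t] += np.abs(W[j, t]) ** 2
    return T
```

On a synthetic two-tone signal, the reassigned energy concentrates in narrow frequency bands around each tone, whereas the raw scalogram smears it across neighboring scales; this sharpening is what the squeezed and synchrosqueezed representations in the chapter exploit.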