ABSTRACT

The strategies used in human auditory scene analysis (ASA) have implications for the architecture of a computational ASA system. To model the stability of human ASA, the computational system must allow different cues to collaborate and compete, and must account for the propagation of constraints across the frequency-by-time field. A scalable architecture must be designed in which the number of cues can easily be extended and the difficulty of the problems increased without requiring a change in the basic design. Unless modelers are familiar with a wide range of psychological data, they may develop architectures that succeed only on "toy" problems, or that model only a few phenomena and do not lend themselves to being extended to a wider class. Finally, if speech recognition systems are to exploit the pre-processing done by ASA, they must be modified so that they can use the data linkages proposed by ASA.