ABSTRACT

When we listen to the radio, we can easily distinguish the music from the talking. But could a machine? Music and speech are both structured audio, and our ability to tell them apart stems from interpreting the audio signals at some stage of representation. If we could fully describe that interpreted representation to another human, we could perhaps describe it to a machine. Conversely, if we could instruct a machine to tell music and speech apart, using only those elementary processes we believe operate in humans, we would be close to accounting for how humans apparently do it.