ABSTRACT

To comprehend a spoken utterance, listeners must map a dynamic, variable, spectrotemporally complex continuous acoustic signal onto discrete linguistic representations in the brain, assemble these so as to recognize individual words, access the meanings of these words, and combine them to compute the overall meaning (Davis & Johnsrude, 2007). Words and their elements do not correspond to any invariant acoustic units in the speech signal: the speech stream does not usually contain silent gaps to demarcate word boundaries, and dramatic changes to the pronunciation of words in different contexts arise from variation both between and within talkers (e.g., coarticulation). Despite the continuous nature and variability of speech, native speakers of a language perceive a sequence of discrete, meaningful units. How does this happen? What are the linguistic representations in the brain, and how is the mapping between a continuous auditory signal and such representations achieved? Given that speaking is a sensorimotor skill, is speech perceived in terms of its motor or auditory features? Does processing occur on multiple linguistic levels simultaneously (e.g., phonemes, syllables, words), or is there a single canonical level of representation, with larger units (like words) being assembled from these elemental units? How is acoustic variability, both among talkers and within talkers across utterances, dealt with, such that acoustically different signals all contact the same representation? (In other words, how do you perceive that Brad and Ingrid both said “I’d love lunch!” despite marked variability in the acoustics of their productions?)

These questions are fundamental to an understanding of the human use of language and have intrigued psychologists, linguists, and others for at least 50 years. Recent advances in methods for stimulating and recording activity in the human brain permit these perennial questions to be addressed in new ways. Over the last 20 years, cognitive-neuroscience methods have yielded a wealth of data related to the organization of speech and language in the brain. The most important of these include functional magnetic resonance imaging (fMRI), a non-invasive method used to study brain activity in local regions and functional interactions among regions. Pattern-information analytic approaches to fMRI data, such as multi-voxel pattern analysis (Mur, Bandettini, & Kriegeskorte, 2009), permit researchers to examine the information that is represented in different brain regions. Another method is transcranial magnetic stimulation (TMS), which is used to stimulate small regions on the surface of the brain, thereby reducing neural firing thresholds or interrupting function.
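As a rough illustration of what a pattern-information (MVPA) analysis of the kind cited above involves, the sketch below trains a classifier on simulated voxel activity patterns and asks whether it can decode the stimulus condition from held-out trials; above-chance accuracy would be taken as evidence that the region's patterns carry condition information. All specifics (synthetic data, voxel and trial counts, scikit-learn as the toolkit) are illustrative assumptions, not details drawn from the chapter.

    # Toy multi-voxel pattern analysis (MVPA) sketch: decode which of two
    # conditions (e.g., two spoken words) produced each simulated voxel pattern.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(seed=0)

    n_trials = 40    # trials per condition (hypothetical)
    n_voxels = 100   # voxels in a hypothetical region of interest

    # Simulated voxel patterns: condition B differs from condition A by a
    # small, distributed shift in mean activity across voxels.
    condition_a = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_voxels))
    condition_b = rng.normal(loc=0.3, scale=1.0, size=(n_trials, n_voxels))

    X = np.vstack([condition_a, condition_b])         # trials x voxels
    y = np.array([0] * n_trials + [1] * n_trials)     # condition labels

    # Five-fold cross-validated decoding accuracy; chance level is 0.50.
    classifier = LogisticRegression(max_iter=1000)
    accuracy = cross_val_score(classifier, X, y, cv=5)
    print(f"Mean decoding accuracy: {accuracy.mean():.2f} (chance = 0.50)")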