The influence of the motor system on speech perception has been getting a lot of high-profile attention lately, and "sensorimotor theories" of speech perception are gaining popularity. For an interesting example of such a theory, check out Jean-Luc Schwartz et al.'s The Perception-for-Action-Control Theory (PACT): A perceptuo-motor theory of speech perception.
It is all well and good to understand the contribution of motor information to speech perception, but let's not forget that there is more to the brain and speech processing than the motor system. For example, there is a long history of research on lexical effects in speech perception. The Ganong (1980) effect is one: in a speech continuum like gift-kift, where one end of the continuum is a word, listeners' identification of ambiguous tokens is biased toward the lexical item, which shifts the category boundary. Another example comes from the phoneme restoration effect (Warren, 1970). If a speech segment is deleted in the middle of a word, you can easily hear the gap. However, if that gap is filled with noise, the missing segment can be heard quite clearly in some cases. This effect is enhanced by lexical information (Samuel, 1981): phonemic restoration is more robust in words than nonwords and in longer words (more lexical predictability) than shorter words.
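To make the Ganong effect concrete, here is a minimal sketch that models /g/-/k/ identification as a logistic function of voice onset time. Every number in it (the neutral boundary, the size of the lexical shift, the slope) is an assumption chosen for illustration, not a value from Ganong (1980) or any other study. The point is only that biasing responses toward the word end moves the 50% crossover toward the nonword end, so the same physical stimulus can sit on opposite sides of the boundary in two continua.

```python
import math

# All values below are illustrative assumptions, not measured data.
NEUTRAL_BOUNDARY_MS = 30.0   # assumed /g/-/k/ boundary with no lexical bias
LEXICAL_SHIFT_MS = 5.0       # assumed size of the lexically driven shift
SLOPE = 0.5                  # assumed steepness of the identification curve


def p_k_response(vot_ms, boundary_ms, slope=SLOPE):
    """Probability of a /k/ response at a given VOT under a logistic model."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


# In gift-kift the word is at the /g/ end, so more tokens are heard as /g/
# and the 50% crossover moves to longer VOTs. In giss-kiss the word is at
# the /k/ end, so the crossover moves to shorter VOTs.
boundaries = {
    "gift-kift": NEUTRAL_BOUNDARY_MS + LEXICAL_SHIFT_MS,
    "giss-kiss": NEUTRAL_BOUNDARY_MS - LEXICAL_SHIFT_MS,
}

for continuum, boundary in boundaries.items():
    p = p_k_response(NEUTRAL_BOUNDARY_MS, boundary)
    print(f"{continuum}: P(/k/) at {NEUTRAL_BOUNDARY_MS:.0f} ms VOT = {p:.2f}")
```

Under these assumed parameters, the same 30 ms token is mostly heard as /g/ in gift-kift and mostly as /k/ in giss-kiss.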
These are interesting effects that are typically interpreted as evidence for top-down modulation of lower-level auditory perception. (Note that motor effects can be explained in exactly the same way, top-down modulation; there is no need to resurrect the Motor Theory of Speech Perception.) Over the last decade or so, there has been increasing interest in identifying the neural basis of these effects. One study by Myers & Blumstein investigated the Ganong effect and another by Shahin, Bishop, & Miller investigated phonemic restoration. I love this line of work, but I'm not sure we have nailed down the best approach yet.
Myers & Blumstein used voice onset time (VOT) continua involving gift-kift and giss-kiss. They took advantage of the fact that the category boundaries for these two VOT-matched continua differ because they have lexical items at opposite ends. This allowed them to compare, in an fMRI study, the BOLD response to the same VOT-matched stimulus when it was at the boundary in one continuum and in a non-boundary position in the other continuum. They reported more activity for boundary items than non-boundary items in (i) bilateral STG, (ii) L cingulate, (iii) L precentral gyrus, (iv) L mid frontal gyrus, & (v) L precuneus. They interpreted the STG activation as evidence that lexical information influences early perceptual processes and the activations in frontal/midline regions as reflections of higher-order executive processes.
This conclusion seems reasonable, but I'm not sure I buy the logic that gets us there. I suppose the logic of the particular comparisons is that for ambiguous stimuli (those at the boundary) the lexical effect will be most prominent and therefore show up in the BOLD response for boundary stimuli relative to non-boundary stimuli. But one might also reason that the strongest lexical effect should be found at certain non-boundary items, namely those that are normally at the boundary but now are not at the boundary because of the lexical pull. I.e., a stimulus that used to be ambiguous is now non-ambiguous because of all the work lexical information has done to affect perception. Another possible explanation of their findings is that boundary items are more difficult to categorize and so require more executive resources (frontal/midline activations) and these executive systems in turn modulate auditory areas, e.g., by increasing attentional gain. In short, I don't think the conclusions are necessarily wrong, but I think there are some questions remaining.
Shahin et al. (2009) used a pretty slick design to assess phonemic restoration effects in physically very similar stimuli in an fMRI study. Following Samuel, they presented speech that either contained a gap (with noise filling the gap) or did not contain a gap (with noise superimposed over the speech segment). Subjects were asked to decide whether the stimulus was intact or contained a gap. One complication of studying phonemic restoration is that speech that contains a gap is physically different from speech that does not, so it is unclear whether any observed effects result from the illusion or the physical gap. To get around this, Shahin et al. manipulated the duration of the noise burst in the stimuli such that all stimuli were right at the threshold boundary for hearing the illusion or not. This resulted in a set of stimuli that overlapped heavily in their physical properties but produced a wobbly percept. They then used information about how the stimuli were actually perceived to probe the brain response.
The primary comparisons were
(1) items that elicited an illusion (items with gaps that were perceived as intact) minus items that were intact and perceived as intact -- so both stimuli were perceived as intact but one was illusory. This contrast was assumed to identify areas involved in phonemic repair.
(2) items that elicited an illusion minus items that failed to elicit an illusion (items with gaps that were perceived as items with gaps). This contrast was assumed to identify areas that correlated with the actual perception of the illusion.
Comparison #1 resulted in activation in Broca's area (~BA44), the anterior insula bilaterally, and the left pre-SMA.
Comparison #2 resulted in activation in left angular gyrus/STS, right STS, precuneus, and bilateral superior frontal sulcus.
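The logic of the two comparisons can be sketched as a sort of trials by physical stimulus and reported percept, followed by two subtractions. This is a toy illustration only: the response values are invented placeholders, the names (`trials`, `condition_mean`) are hypothetical, and a simple difference of means stands in for whatever statistical model the authors actually used.

```python
from statistics import mean

# Hypothetical trials: (physical stimulus, reported percept, response).
# The response values are invented placeholders, not data from the study.
trials = [
    ("gap", "intact", 1.2),     # illusion: gap present but heard as intact
    ("gap", "intact", 1.0),
    ("intact", "intact", 0.8),  # veridical: intact and heard as intact
    ("intact", "intact", 0.6),
    ("gap", "gap", 0.5),        # illusion failure: gap present and heard
    ("gap", "gap", 0.3),
]


def condition_mean(trials, stimulus, percept):
    """Mean response over trials matching a stimulus/percept pairing."""
    return mean(r for s, p, r in trials if s == stimulus and p == percept)


illusion = condition_mean(trials, "gap", "intact")
veridical_intact = condition_mean(trials, "intact", "intact")
illusion_failure = condition_mean(trials, "gap", "gap")

# Comparison 1: same percept (intact), different physical stimulus.
comparison_1 = illusion - veridical_intact
# Comparison 2: same physical stimulus (gap), different percept.
comparison_2 = illusion - illusion_failure

print(f"comparison 1 (illusion - veridical intact): {comparison_1:.2f}")
print(f"comparison 2 (illusion - illusion failure): {comparison_2:.2f}")
```

Laid out this way, comparison 1 holds the percept constant and varies the stimulus, while comparison 2 holds the stimulus constant and varies the percept.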
Both word and nonword stimuli were used and these effects were evaluated in ROI analyses. The left AG/STS showed an interaction between lexical status and perceptual condition, which the authors suggest reflects the use of a lexical template for filling in missing information. Broca's area and the insulae also showed an interaction and further seemed to respond most robustly to illusion-failure trials within the word condition (reflecting extra work trying, but failing, to repair?).
So unlike the Myers & Blumstein study, Shahin et al. do not report extensive activity in the bilateral STG (the STS activity is very posterior) but instead find "repair" activity in frontal areas and lexical effects ("template matching") in posterior STS/AG.
One complication with the AG/STS activations is that these are all sub-baseline effects (signal intensity < 0), so the differences are degrees of deactivation. One could appeal to the "default network" in explaining these patterns, but the authors argue against such an account. At the very least, the negative activation complicates the picture.
The real question here is what these subtractions reveal. In principle, I like the idea of correlating responses with perceptual experience. But at the same time, conscious perceptual experience is a fairly high-level phenomenon, whereas many of the processes we are interested in, those down in the trenches of the processing stream that ultimately lead to perception, may be unconscious and may share computations between stimuli that are ultimately perceived one way versus another. So in the end, it is hard to know what is actually being detected and, more importantly, what is not being detected.
In general, I think this is an important line of investigation and I'd like to see folks give it more attention. Who knows, it might even lead to a competitor to the sensorimotor models of speech perception: the sensory-lexical model of speech perception.
Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance, 6, 110-125.
Shahin, A., Bishop, C., & Miller, L. (2009). Neural mechanisms for illusory filling-in of degraded speech. NeuroImage, 44 (3), 1133-1143. DOI: 10.1016/j.neuroimage.2008.09.045
Myers, E. B., & Blumstein, S. E. (2008). The neural bases of the lexical effect: an fMRI investigation. Cerebral Cortex, 18 (2), 278-288. PMID: 17504782
Samuel, A. G. (1981). Phonemic restoration: Insights from a new methodology. Journal of Experimental Psychology: General, 110, 474-494.
Warren, R. M. (1970). Perceptual restoration of missing speech sounds. Science, 167, 392-393.