It has been suggested that auditory cortex is hierarchically organized with the highest levels of this hierarchy, for speech processing anyway, located in left anterior temporal cortex (Rauschecker & Scott, 2009; Scott et al., 2000). Evidence for this view comes from PET and fMRI studies which contrast intelligible speech with unintelligible speech and find a prominent focus of activity in the left anterior temporal lobe (Scott et al., 2000). Intelligible speech (typically sentences) has included clear speech and noise vocoded variants which are acoustically different but both intelligible, whereas unintelligible speech has included spectrally rotated versions of these stimuli. The idea is that regions that respond to the intelligible conditions are exhibiting acoustic invariance, i.e., responding to the higher-order categorical information (phonemes, words) and therefore reflect high levels in the auditory hierarchy.
However, the anterior focus of activation contradicts lesion evidence which shows that damage to posterior temporal lobe regions is most predictive of auditory comprehension deficits in aphasia. Consequently, we have argued that the anterior temporal lobe activity in these studies is more a reflection of the fact that subjects are comprehending sentences -- which are known to activate anterior temporal regions more than words alone -- than intelligibility of speech sounds and/or words (Hickok & Poeppel, 2004, 2007). Therefore, our claim has been that the top of the auditory hierarchy for speech (regions involved in phonemic level processes) is more posterior.
To assess this hypothesis we fully replicated previous intelligibility studies using two intelligible conditions, clear sentences and noise vocoded sentences, and two unintelligible conditions, rotated versions of these. But instead of using standard univariate methods to examine the neural response, we used multivariate pattern analysis (MVPA) to assess regional sensitivity to acoustic variation within and across intelligibility manipulations.
We did perform the usual general linear model subtraction, intelligible minus unintelligible [(clear + noise vocoded) - (rotated + rotated noise vocoded)], and found robust activity in the left anterior superior temporal sulcus (STS), but also in the left posterior STS and in the right anterior and posterior STS. This finding shows that intelligible speech activity is not restricted to anterior areas, or even to the left hemisphere. A broader bilateral network is involved.
Next we examined the pattern of response in various activated regions using MVPA. MVPA looks at the pattern of activity within a region rather than the pooled amplitude of the region as a whole. If different patterns of activity can be reliably demonstrated in a region, this is an indication that the manipulated features (e.g., acoustic variation in our case) are being coded or processed differently within the region.
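To make the logic concrete, here is a minimal sketch of the MVPA idea (an illustration only, not the study's actual pipeline; all numbers are made up): two conditions are simulated with matched mean amplitude but different multivoxel patterns, and a linear classifier can still tell them apart.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Simulated ROI data: 40 trials x 50 voxels per condition.
# Condition B gets a zero-mean multivoxel offset, so the spatial
# pattern differs even though the pooled amplitude is matched --
# exactly the case a univariate analysis would miss.
n_trials, n_voxels = 40, 50
offset = rng.normal(0, 0.5, n_voxels)
offset -= offset.mean()  # zero mean: no univariate difference
cond_a = rng.normal(0, 1, (n_trials, n_voxels))
cond_b = rng.normal(0, 1, (n_trials, n_voxels)) + offset

X = np.vstack([cond_a, cond_b])
y = np.array([0] * n_trials + [1] * n_trials)

# Cross-validated classification accuracy; chance is 50%.
acc = cross_val_score(LinearSVC(), X, y, cv=5).mean()
print(f"classification accuracy: {acc:.2f}")
```

Reliable above-chance accuracy here is the MVPA signature that the region codes the two conditions differently, despite identical average responses.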
The first thing we looked at was whether the pattern of activity in and immediately surrounding Heschl's gyrus was sensitive to intelligibility and/or acoustic variation. This is actually an important prerequisite for claiming acoustic invariance, and therefore higher-order processing, in downstream auditory areas: if you want to claim that invariance to acoustic features downstream reflects higher levels of processing in the cortical hierarchy, you need to show that earlier auditory areas are sensitive to those same acoustic features. So we defined early auditory cortex independently using a localizer scan: amplitude-modulated (AM) noise, modulated at 8 Hz, contrasted with scanner noise. The figure below shows the location of this ROI (roughly, that is, as this is a group image; for all MVPA analyses, ROIs were defined in individual subjects) and the average BOLD amplitude for the various speech conditions. Notice that we see similar levels of activity for all conditions, especially clear speech and rotated speech, which appear to yield identical responses in Heschl's gyrus. This seems to provide evidence that rotated speech is indeed a good acoustic control for speech.
However, using MVPA, we found that the pattern of activity in Heschl's gyrus (HG) could easily distinguish clear speech from rotated speech (it is responding to these conditions differently). In fact, HG could distinguish each condition from the other, including the within intelligibility contrasts such as clear vs. noise vocoded (both intelligible) and rotated vs. rotated noise vocoded (both unintelligible). It appears that HG is sensitive to the acoustic variation between our conditions. The figure below shows classification accuracy for the various MVPA contrasts in left and right HG. The dark black line indicates chance performance (50%) whereas the thinner line indicates the upper bound of the 95% confidence interval determined via a bootstrapping method.
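For reference, here is one simple way such a chance threshold can be derived (a hypothetical stand-in for the paper's actual bootstrapping procedure): simulate a classifier guessing at random on the same number of test trials, and take the 95th percentile of the simulated accuracies as the upper bound.

```python
import numpy as np

rng = np.random.default_rng(1)

def chance_ci_upper(n_trials, n_boot=10000, q=95):
    """Upper bound of the 95% CI around chance (50%) classification,
    estimated by simulating coin-flip guessing on n_trials test items.
    (Hypothetical stand-in for the paper's bootstrap method.)"""
    sims = rng.binomial(n_trials, 0.5, size=n_boot) / n_trials
    return np.percentile(sims, q)

# e.g., with 80 test trials per contrast (made-up number):
ub = chance_ci_upper(80)
print(f"accuracy must exceed {ub:.3f} to beat chance")
```

The intuition: with fewer test trials the binomial spread around 50% is wider, so the accuracy needed to count as "classifying" rises accordingly.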
Again this highlights the fact that standard GLM analyses obscure a lot of information that is contained in areas that appear to be insensitive to the manipulations we impose.
So what about the STS? Here we defined ROIs in each subject using the clear minus rotated contrast, i.e., the conditions that showed no difference in average amplitude in HG. ROIs were anatomically categorized in each subject as "anterior" (anterior to HG), "middle" (lateral to HG), or "posterior" (posterior to HG). In a majority of subjects, we found peaks in the anterior and posterior STS in the left hemisphere (but not in the mid STS), and peaks in the anterior, middle, and posterior STS in the right hemisphere. ROIs were defined using half of our data; MVPA was performed using the other half. This ensured complete statistical independence.
Here are the classification accuracy graphs for each of the ROIs. The left two bars in each graph show across-intelligibility contrasts (clear vs. rotated & noise vocoded vs. rotated NV). These comparisons should classify if the area is sensitive to the difference in intelligibility. The right two bars show within-intelligibility contrasts (clear vs. NV, both intell; rot vs. rotNV, both unintell). These comparisons should NOT classify if the ROI is acoustically invariant.
Looking first at the left hemisphere ROIs, notice that both anterior and posterior regions classify the across intelligibility contrasts (as expected). But the anterior ROI also classifies clear vs. noise vocoded, two intelligible conditions. The posterior ROI does not classify either of the within intelligibility contrasts. This suggests that the posterior ROI is the more acoustically invariant region.
The right hemisphere shows a different pattern in this analysis. The right anterior ROI shows a pattern that is acoustically invariant whereas the mid and posterior ROIs classify everything, every which way, more like HG.
If you look at the overall pattern within the graphs across areas you'll notice a problem with the above characterization of the data: it categorizes each contrast as classifying or not and doesn't take into account the magnitude of the effects. For example, notice that as one moves from aSTS to mSTS in the right hemisphere, classification accuracy for the across-intelligibility contrasts rises (as it does in the left hemisphere), and that in the right aSTS clear vs. NV just misses significance, whereas in the mSTS clear vs. NV barely passes significance. We may be dealing with thresholding effects. This suggests that we need a better way of characterizing acoustic invariance, one that uses all of the data.
So what we did is calculate an "acoustic invariance index" which basically measures the magnitude of the intelligibility effect (left two bars compared with right two bars). This difference should be large if an area is coding features relevant to intelligibility. This measure was then corrected by the "acoustic effect" (the sum of the absolute difference in classification accuracy within intelligibility conditions). When you do this, here is what you get (acoustic invariance = positive values, range -1 to 1):
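The exact formula is given in the paper; one plausible formulation consistent with the description above (and with the -1 to 1 range) is a normalized difference between the intelligibility effect and the acoustic effect. A hedged sketch:

```python
def invariance_index(across, within, chance=0.5):
    """Hypothetical acoustic invariance index -- the exact formula
    is in Okada et al. (2010); this is one plausible reconstruction.

    across: classification accuracies for across-intelligibility
            contrasts (the 'left two bars').
    within: accuracies for within-intelligibility contrasts
            (the 'right two bars').
    Positive values indicate acoustic invariance; range -1 to 1
    when both effects are non-negative.
    """
    intell = sum(a - chance for a in across)         # intelligibility effect
    acoustic = sum(abs(w - chance) for w in within)  # acoustic effect
    return (intell - acoustic) / (intell + acoustic)

# Acoustically invariant region: classifies only across intelligibility.
print(invariance_index([0.85, 0.85], [0.50, 0.50]))  # index of 1.0
# Acoustically sensitive region (HG-like): classifies everything.
print(invariance_index([0.85, 0.85], [0.85, 0.85]))  # index of 0.0
```

The point of the normalization is that a region only scores high if it classifies the intelligibility contrasts well *and* fails to classify the acoustic contrasts, using the full pattern of accuracies rather than a significant/not-significant cutoff.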
HG is the most sensitive to acoustic variation across conditions, and more posterior areas (pSTS in the left, mSTS in the right) are the least sensitive. The aSTS ROIs fall in between these extremes. So left pSTS and right mSTS, as we've defined them anatomically, appear to be functionally homologous and to represent the top of the auditory hierarchy for phoneme-level processing. I don't know what is going on in right pSTS.
What features are these areas sensitive to? My guess is that HG is sensitive to any number of acoustic features within the signals, aSTS is sensitive to suprasegmental prosodic features, and pSTS is sensitive to phoneme level features. Arguments for these ideas are provided in the manuscript.
Okada, K., Rong, F., Venezia, J., Matchin, W., Hsieh, I., Saberi, K., Serences, J., & Hickok, G. (2010). Hierarchical organization of human auditory cortex: Evidence from acoustic invariance in the response to intelligible speech. Cerebral Cortex. DOI: 10.1093/cercor/bhp318
Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition, 92, 67-99.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nat Rev Neurosci, 8(5), 393-402.
Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat Neurosci, 12(6), 718-724.
Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. S. (2000). Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406.