Tuesday, December 14, 2010

More on intelligibility: Guest post from Jonathan Peelle

Guest post from Jonathan Peelle:

There were certainly a lot of interesting topics that came up at the SfN nanosymposium, which I think goes to show that we should do this sort of thing more often.

The study of intelligible speech has a long history in neuroimaging. On the one hand, as Greg and others have emphasized, it is a tricky thing to study, because a number of linguistic (and often acoustic) factors are confounded when looking at intelligible > unintelligible contrasts. So once we identify intelligibility-responsive areas, we still have a lot of work to do in order to relate anatomy to cognitive operations involved in speech comprehension. That being said, it does seem like a good place to start, and a reasonable way to try to dissociate language-related processing from auditory/acoustic processing. Depending on the approach used, intelligibility studies can also tell us a great deal about speech comprehension under challenging conditions (e.g. background noise, cochlear implants, hearing loss) that have both theoretical and practical relevance.

One thing I suspect everyone agrees on is that, at the end of the day, we should be able to account for multiple sources of evidence: lesion, PET, fMRI, EEG/MEG, as well as various effects of stimuli and analysis approach. With that in mind, there are a few comments to add to this discussion.

Regarding Okada et al. (2010), I won’t repeat all the points we have made previously (Peelle et al., 2010a), but the influence of background noise (continuous scanning) shouldn’t be underestimated. If background noise simply increases global brain signal (i.e. an increase in gain), it shouldn’t have impacted the results. But background noise can interact with behavioral factors, and result in spatially constrained patterns of univariate signal increase (including left temporal cortex, e.g. Peelle et al., 2010b):

So, in the absence of data I am reluctant to assume that background noise and listening effort wouldn’t affect multivariate results. This goes along with the point that even if two types of stimuli are intelligible, they can differ in listening effort, which is going to impact the neural systems engaged in comprehension. In Okada et al. (2010), this means that a region that distinguishes between the clear and vocoded conditions might be showing acoustic sensitivity (the argument made by Okada et al.), or it may instead be indexing listening effort.
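To make the gain argument concrete, here is a toy sketch. It is not taken from any of the studies discussed: the voxel values are invented, and Pearson correlation is used only as a simple, scale-invariant stand-in for "pattern similarity" (an actual classifier may behave differently). The sketch compares a pure gain change, which leaves the pattern across voxels intact, with a spatially constrained increase, which changes it.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical responses of six voxels to one condition in quiet (arbitrary units).
quiet = [1.0, 2.0, 0.5, 1.5, 3.0, 2.5]

# Case 1: noise acts as a pure gain change, scaling every voxel by the same factor.
global_gain = [1.5 * v for v in quiet]

# Case 2: noise adds signal only in a subset of voxels
# (a spatially constrained increase; the affected voxels are chosen arbitrarily).
local_boost = [v + b for v, b in zip(quiet, [0.0, 0.0, 2.0, 2.0, 0.0, 0.0])]

r_gain = pearson(quiet, global_gain)
r_local = pearson(quiet, local_boost)
print(f"global gain: r = {r_gain:.3f}")
print(f"local boost: r = {r_local:.3f}")
```

In this toy case the gain-scaled pattern remains perfectly correlated with the original, whereas the locally boosted pattern does not. That is the sense in which a spatially constrained effect of noise or effort could, in principle, show up in a pattern-based analysis.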

Another point worth emphasizing is that although the materials introduced by Scott et al. (2000) have many advantages and have been used in a number of papers, there are a number of ways to investigate intelligibility responses, and we should be careful not to conclude too much from a single approach. As we have pointed out, Davis and Johnsrude (2003) parametrically varied intelligibility within three types of acoustic degradation, and found regions of acoustic insensitivity both posterior and anterior to primary auditory areas in the left hemisphere, and anterior to primary auditory cortex in the right hemisphere.

One advantage to this approach is that parametrically varying speech clarity may give a more sensitive way to assess intelligibility responses than a dichotomous “intelligible > unintelligible” contrast. The larger point is that multivariate analyses, although extremely useful, are not a magic bullet; we also need to carefully consider the particular stimuli and task used (which I would argue also includes background noise).

Incidentally, in Davis and Johnsrude (2003), responses that are increased when speech is distorted (aka listening effort) look like this (i.e. including regions of temporal cortex):

The role of inferotemporal cortex in speech comprehension

One side point which came up in discussion at the symposium was the role of posterior inferior temporal gyrus / fusiform, which appears in the Hickok & Poeppel model; I think the initial point was that this is not consistently seen in functional imaging studies, to which Greg replied that the primary support for that region was lesion data. It’s true that this region of inferotemporal cortex isn’t always discussed in functional imaging studies, but it actually occurs quite often—often enough that I would say the functional imaging evidence for its importance is rather strong. We review some of this evidence briefly in Peelle et al. (2010b; p. 1416, bottom), but it includes the following studies:

Speaking of inferotemporal cortex, there is a nice peak here in the Okada et al. results (Figure 2, Table 1):

Once you start looking for it, it crops up rather often. (Although it’s also worth noting that the lack of results in this region in fMRI studies may be due to susceptibility artifacts there, rather than a lack of neural engagement.)

Anterior vs. Posterior: Words vs. Sentences?

With respect to the discussion about posterior vs. anterior temporal regions being critical for speech comprehension, it strikes me that we all need to be careful about terminology: does “speech” refer to connected speech (sentences) or single words? The lesion data referred to, in which a patient with severe left anterior temporal damage performed well on “speech perception,” might be explained by the fact that the task was auditory word comprehension. How did this patient do on sentence comprehension measures? I think a compelling case could be made that auditory word comprehension is largely bilateral and more posterior, but that in connected speech more anterior (and perhaps left-lateralized) regions become more critical (e.g., Humphries et al., 2006):

As far as I know, no one has done functional imaging of intelligibility of single words in the way that many have done with sentences; nor have there been sentence comprehension measures on patients with left anterior temporal lobe damage. So, at this point I think more work needs to be done before we can directly compare these sources of evidence.

Broadly though, I don’t know how productive it will be to specify which area responds “most” to intelligible speech. Given the variety of challenges which our auditory and language systems need to deal with, surely it comes down to a network of regions that are dynamically called into action depending on (acoustic and cognitive) task demands. This is why I think that we need to include regions of prefrontal, premotor, and inferotemporal cortex in these discussions, even if they don’t appear in every imaging contrast.


Awad M, Warren JE, Scott SK, Turkheimer FE, Wise RJS (2007) A common system for the comprehension and production of narrative speech. Journal of Neuroscience 27:11455-11464. http://dx.doi.org/10.1523/JNEUROSCI.5257-06.2007

Davis MH, Johnsrude IS (2003) Hierarchical processing in spoken language comprehension. Journal of Neuroscience 23:3423-3431. http://www.jneurosci.org/cgi/content/abstract/23/8/3423

Humphries C, Binder JR, Medler DA, Liebenthal E (2006) Syntactic and semantic modulation of neural activity during auditory sentence comprehension. Journal of Cognitive Neuroscience 18:665-679. http://dx.doi.org/10.1162/jocn.2006.18.4.665

Okada K, Rong F, Venezia J, Matchin W, Hsieh I-H, Saberi K, Serences JT, Hickok G (2010) Hierarchical organization of human auditory cortex: Evidence from acoustic invariance in the response to intelligible speech. Cerebral Cortex 20:2486-2495. http://dx.doi.org/10.1093/cercor/bhp318

Orfanidou E, Marslen-Wilson WD, Davis MH (2006) Neural response suppression predicts repetition priming of spoken words and pseudowords. Journal of Cognitive Neuroscience 18:1237-1252. http://dx.doi.org/10.1162/jocn.2006.18.8.1237

Peelle JE, Johnsrude IS, Davis MH (2010a) Hierarchical processing for speech in human auditory cortex and beyond [Commentary on Okada et al. (2010)]. Frontiers in Human Neuroscience 4:51. http://frontiersin.org/Human_Neuroscience/10.3389/fnhum.2010.00051/full

Peelle JE, Eason RJ, Schmitter S, Schwarzbauer C, Davis MH (2010b) Evaluating an acoustically quiet EPI sequence for use in fMRI studies of speech and auditory processing. NeuroImage 52:1410-1419. http://dx.doi.org/10.1016/j.neuroimage.2010.05.015

Rodd JM, Davis MH, Johnsrude IS (2005) The neural mechanisms of speech comprehension: fMRI studies of semantic ambiguity. Cerebral Cortex 15:1261-1269. http://dx.doi.org/doi:10.1093/cercor/bhi009

Rodd JM, Longe OA, Randall B, Tyler LK (2010) The functional organisation of the fronto-temporal language system: Evidence from syntactic and semantic ambiguity. Neuropsychologia 48:1324-1335. http://dx.doi.org/10.1016/j.neuropsychologia.2009.12.035

Scott SK, Blank CC, Rosen S, Wise RJS (2000) Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123:2400-2406. http://dx.doi.org/10.1093/brain/123.12.2400


Greg Hickok said...

Thanks for this wonderful post Jonathan. Let's talk about some of the points you raise, starting with your opening assumption...

You state that identifying intelligibility-responsive areas is "a reasonable way to try to dissociate language-related processing from auditory/acoustic processing."

This is the main reason I'm not fond of intell vs. unintell contrasts: because I think they largely subtract out auditory/acoustic processing where a whole bunch of really important stuff is happening, especially for speech/phonological perception.

Here are some questions for discussion:

Is it possible that phonological information is processed/coded in terms of some high-level acoustic features?

If so, then is it possible that rotated speech, which contains many of the same features (by design), will also robustly activate phonological systems?

Note that this explains why anterior areas (the non-phonological regions) show the strongest response to intelligibility.

Doesn't it then follow that intelligibility contrasts miss a whole lot of what we are trying to understand in terms of speech perception?

I know you don't necessarily disagree re: my anterior-posterior claims, but I think it is worth clarifying exactly what processes *aren't* likely to be in the distribution of intelligibility responsive areas.

Jonathan Peelle said...

In general I'm sympathetic to this line of thought...I think we always have to be careful about what we are comparing, and that it could well be that the acoustic information preserved in unintelligible speech conditions activates speechy (phonological, say, but in principle other processes) regions of the brain. Intuitively I think this would be more of an issue for rotated normal speech (as opposed to rotated vocoded speech), but I also think that I would rather trust data than intuition. It's worth investigating.

At the risk of sounding like a broken record, I would also point out that there are multiple ways of constructing intelligibility manipulations. Of course, we need to be careful of potential confounds in all of them. But, for example, the intelligibility correlations in Davis & Johnsrude (2003) have rather robust activation in middle and posterior portions of temporal cortex (in addition to anterior regions). So at the risk of getting bogged down in specific definitions, I think there are different types of "intell vs. unintell" contrasts, and that the specifics of this probably matter.

That being said, I completely agree that it would help to clarify what processes are and aren't likely to be defined by comparing these conditions, and what types of acoustic/phonetic processes "unintelligible" (or less-intelligible) speech might engage.

Speaking of clarifying, regarding the anterior-posterior distinctions: do you think any of this can be resolved by specifying words/sentences? Any sense of how your patient with anterior temporal lobe damage does on sentence comprehension tasks?

Greg Hickok said...

So we seem to agree that despite the findings coming out of the Scott/Wise group, posterior regions are still likely to be an important site for phonological processing. I agree that not all forms of unintelligible speech are equal, and the Davis and Johnsrude finding of posterior activation (similar to Okada et al.) is consistent with this. Sophie's group would argue, though, that the posterior activation only emerges when comprehension gets effortful...

Yes, comparing words and sentences is a good thing to do to help resolve the issue. We've run this study. We find that ATL areas are activated more by sentences than by scrambled sentences, each compared to its rotated baseline. However, scrambled sentences compared with rotated scrambled sentences still activate the ATL, just not as much as normal sentences. So depending on your perspective, you could either interpret this result as evidence for the ATL's role in sentence processing (sentences > scrambled sentences) or for the ATL's role in intelligible speech (scrambled sentences > rotated scrambled sentences).

As you note, lesion data may be more helpful in sorting things out. Our patient with ATL damage performed at ceiling on the syllable discrimination and word comprehension tasks, as you know. Sentence comprehension assessed with sentence-to-picture matching using semantically reversible active, passive, subject- and object-relatives was more problematic for this patient, with performance in the 85% range.

Greg Hickok said...

Regarding the argument that listening effort might explain the Okada et al. multivariate results, you said:

"In Okada et al. (2010), this means that a region that distinguishes between the clear and vocoded conditions might be showing acoustic sensitivity (the argument made by Okada et al.), or it may instead be indexing listening effort."

I will admit that this is a possibility. However, this account has to make the assumption that listening effort modulates not only the amplitude of the activation but the pattern across voxels. Further, and more importantly, this explanation would only apply to *anterior* temporal regions which significantly classified clear vs. vocoded speech. The posterior STS did not classify clear vs. vocoded speech but did classify clear vs. rotated, so listening effort doesn't seem to explain the pattern of results.

marcj said...

Concerning the inferotemporal region you mention: it strikes me that this activity might correspond to the word-selective mid-fusiform activation one observes in reading studies (what some would call a 'visual wordform area'). Without delving into the intricacies of the VWFA debate, it does seem to be the case that this region can also activate for auditory words as a result of automatic orthographic activation in literate individuals.

So, for instance, a recent paper, Desroches et al. (2010), looked at auditory rhyme detection in children with dyslexia vs. typical readers. They found inferior temporal/fusiform activation for the control kids, but not the kids with dyslexia.

The idea of obligatory orthographic activation during speech processing is not new. It dates back at the very least to work by Seidenberg & Tanenhaus (1979), who found that auditory rhyme judgments are influenced by orthographic congruency.

All that to say: there's a long list of ways in which intelligible speech can be different from unintelligible speech. I would add to that list the fact that intelligible speech (including phonetically valid nonwords) tends to activate orthographic representations.

Desroches, A.S., Cone, N.E., Bolger, D.J., Bitan, T., Burman, D.D., & Booth, J.R. (2010) Children with reading difficulties show differences in brain regions associated with orthographic processing during spoken language processing. Brain Research, 1356, 73-84.

Seidenberg, M., Tanenhaus, M., 1979. Orthographic effects on rhyme monitoring. J. Exp. Psychol. Hum. Learn. Mem. 5 (6), 546–554.