Wednesday, February 11, 2009

Area Spt in the planum temporale region: Sensory-motor integration or auditory imagery?

Area Spt is one of my favorite brain locations. We've been working on characterizing its response properties ever since we reported its existence in two papers back in 2001 and 2003 (Buchsbaum, Hickok, & Humphries, 2001; Hickok, Buchsbaum, Humphries, & Muftuler, 2003). Spt is located in the left posterior Sylvian region at the parietal-temporal boundary. The defining feature of Spt is that it activates both during the perception of speech and during (covert) speech production. Subsequent work has found that Spt is not speech-specific, as it also responds during tonal melodic perception and production (humming), and that Spt is relatively selective for vocal tract gestures in that it responds more during perception and covert humming of tonal melodies than during perception and imagined piano playing of tonal melodies (Pa & Hickok, 2008). On the basis of evidence like this, I have argued that Spt is a sensory-motor integration area for the vocal tract motor effector, just like monkey area LIP is a sensory-motor integration area for the eyes, and the parietal reach region (or area AIP) is a sensory-motor area for the manual effectors.

One nagging objection that has been raised more than once is this: "Isn't your 'motor' activity just auditory imagery?" That is, maybe during covert rehearsal there is some kind of motor-to-sensory discharge that serves to keep active sensory representations of speech in auditory cortex (i.e., Spt). Another possible objection that is less often raised is that the "sensory" activity we see in Spt isn't really sensory but is really motor preparation.

Just yesterday we got a paper accepted in the Journal of Neurophysiology that I think rules out these kinds of objections (Hickok, Okada, & Serences, in press). Here's the logic. If Spt really is just like other sensory-motor integration areas (e.g., LIP, AIP), it will be composed of a population of sensory-weighted cells, motor-weighted cells, and truly sensory-motor cells. Two things follow: (i) the BOLD response to combined sensory-motor stimulation should be greater than the BOLD response to either sensory or motor activation alone (because sensory-motor stimulation activates a larger cohort of cells than either sensory or motor alone), and (ii) the pattern of activity within Spt may be different during sensory activation than during motor activation (on the assumption that sensory- and motor-weighted cells are not perfectly evenly distributed across the sampled voxels within Spt). If we can show that the responses to sensory stimulation and motor stimulation are different, then Spt activity can't be all sensory or all motor; it must be sensory-motor.
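For readers who like to see the logic in toy form, here is a minimal sketch of the two predictions. Everything in it is an assumption made for illustration: the number of voxels, the number of cells per voxel, the equal odds of each cell type, and the idea that a voxel's response is simply a count of active cells. It is not the analysis from the paper.

```python
import random

def simulate_voxels(n_voxels=50, cells_per_voxel=100, seed=0):
    """Hypothetical toy model: per-voxel counts of sensory-weighted (S),
    motor-weighted (M), and sensory-motor (SM) cells, assigned at random."""
    rng = random.Random(seed)
    voxels = []
    for _ in range(n_voxels):
        counts = {"S": 0, "M": 0, "SM": 0}
        for _ in range(cells_per_voxel):
            counts[rng.choice(["S", "M", "SM"])] += 1
        voxels.append(counts)
    return voxels

def response(voxels, sensory, motor):
    """Number of active cells in each voxel for a given kind of stimulation."""
    out = []
    for v in voxels:
        active = 0
        if sensory:
            active += v["S"]
        if motor:
            active += v["M"]
        if sensory or motor:
            active += v["SM"]  # sensory-motor cells respond to either input
        out.append(active)
    return out

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

voxels = simulate_voxels()
sens = response(voxels, sensory=True, motor=False)
mot = response(voxels, sensory=False, motor=True)
both = response(voxels, sensory=True, motor=True)

# Prediction (i): combined stimulation recruits the largest cohort of cells.
print("total active cells:", sum(sens), sum(mot), sum(both))
# Prediction (ii): the voxel-by-voxel patterns differ even if overall
# amplitudes are similar, because the cell mix varies across voxels.
print("sensory vs. motor pattern correlation:", pearson(sens, mot))
```

In this toy model the overall sensory and motor amplitudes come out about equal, yet the spatial patterns are far from identical, which is the situation where pattern classification can tell two signal sources apart.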

Here's how we tested these predictions using fMRI. Subjects either listened to a 15s block of continuous speech (continuous listen), listened to 3s of speech and then rested for 12s (listen+rest), or listened to 3s of speech and then covertly rehearsed that speech for 12s (listen+rehearse):

First the BOLD results. Spt was identified separately in each subject by the subtraction, listen+rehearse minus listen+rest. This picks out areas that are more active during rehearsal than rest. Here's the BOLD activation for each condition in each subject's Spt ROI:

In the listen+rehearse condition, we predict that the BOLD response will be dominated by sensory stimulation during the first phase of the trial, will be a mix of sensory and rehearsal response during the middle phase of the trial (because the sensory response hasn't yet decayed while the rehearse response starts kicking in), and then will be dominated by the rehearsal response during the final phase of the trial. If you look at the listen+rehearse response curve compared to the continuous listen curve you can see how this prediction is borne out: responses are equal in the first phase (because both conditions involve identical sensory stimulation), then activity in the continuous listen condition saturates and maintains roughly the same activity level until the end of the trial, whereas activity in listen+rehearse continues to rise (presumably because the sensory and motor-rehearsal responses are summing) and then falls back down toward the end of the trial (presumably because the sensory signal has decayed). So, the BOLD predictions pan out.

Next we used pattern classification analysis to see if the pattern of the response in Spt was different during sensory stimulation versus motor activation. Amplitude information was removed from the data for this analysis. We trained a Support Vector Machine to classify the two conditions (continuous listen vs. listen+rehearse) on data from all but one run then tested its classification accuracy in the remaining run. This hold-one-out procedure was repeated until all runs had been classified. In addition, we performed this classification in three different time bins within the trial: early, middle, and late. The prediction is that classification accuracy should be maximal when the two conditions are maximally dominated by different signal sources, i.e., in the final time bin, and should be no better than chance in the first time bin when both signals are predominantly sensory. Here's what we found (blue lines represent upper and lower 5% boundaries for classification accuracies determined via a permutation test):
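The train-on-all-but-one-run procedure and the permutation test can be sketched roughly as follows. Note the assumptions: to keep the example dependency-free it swaps in a simple nearest-centroid classifier in place of the Support Vector Machine we actually used, the data are synthetic, and all function names and parameters (`make_synthetic`, `signal`, `n_perm`, and so on) are hypothetical.

```python
import random

def centroid(patterns):
    """Mean pattern across a list of equal-length voxel vectors."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def leave_one_run_out_accuracy(samples):
    """samples: list of (run_id, label, pattern). Train on all runs but one,
    test on the held-out run, and repeat until every run has been tested."""
    runs = sorted({run for run, _, _ in samples})
    correct = 0
    for held_out in runs:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        labels = sorted({lab for _, lab, _ in train})
        cents = {lab: centroid([p for _, l, p in train if l == lab])
                 for lab in labels}
        for _, lab, pat in test:
            guess = min(labels, key=lambda l: sq_dist(pat, cents[l]))
            correct += guess == lab
    return correct / len(samples)

def permutation_null(samples, n_perm=200, seed=0):
    """Sorted accuracies obtained after shuffling labels within each run."""
    rng = random.Random(seed)
    runs = sorted({r for r, _, _ in samples})
    null = []
    for _ in range(n_perm):
        shuffled = []
        for run in runs:
            in_run = [s for s in samples if s[0] == run]
            labels = [lab for _, lab, _ in in_run]
            rng.shuffle(labels)
            shuffled += [(run, lab, pat)
                         for lab, (_, _, pat) in zip(labels, in_run)]
        null.append(leave_one_run_out_accuracy(shuffled))
    return sorted(null)

def make_synthetic(n_runs=6, trials_per_cond=4, n_voxels=20, signal=1.0, seed=1):
    """Synthetic data: each condition has its own voxel template plus noise."""
    rng = random.Random(seed)
    templates = {
        "listen": [rng.gauss(0, 1) for _ in range(n_voxels)],
        "rehearse": [rng.gauss(0, 1) for _ in range(n_voxels)],
    }
    samples = []
    for run in range(n_runs):
        for label, tmpl in templates.items():
            for _ in range(trials_per_cond):
                pat = [signal * t + rng.gauss(0, 1) for t in tmpl]
                samples.append((run, label, pat))
    return samples

samples = make_synthetic()
acc = leave_one_run_out_accuracy(samples)
null = permutation_null(samples)
upper = null[int(0.95 * len(null))]  # upper 5% boundary of the null
print("accuracy:", acc, "upper 5% boundary:", upper)
```

In the real analysis this would be run separately on the data from each time bin, and the observed accuracy in each bin compared against that bin's permutation boundaries.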

Classification accuracy for the continuous listen vs. listen+rehearse conditions was significantly above chance only in the last time bin, that is, when the BOLD signal was maximally dominated by distinct input sources, one sensory, the other motor. Notice too that at this time point the BOLD amplitudes of these two signals are the same, which provides additional evidence that classification accuracy has nothing to do with amplitude.

If the pattern of activity in Spt is different during sensory stimulation compared to during motor stimulation (and independent of amplitude), then Spt activity can't be all sensory or all motor. This, together with the range of supporting evidence, indicates that Spt is indeed a sensory-motor area.


Buchsbaum B, Hickok G, and Humphries C. Role of left posterior superior temporal gyrus in phonological processing for speech perception and production. Cognitive Science 25: 663-678, 2001.

Hickok G, Buchsbaum B, Humphries C, and Muftuler T. Auditory-motor interaction revealed by fMRI: Speech, music, and working memory in area Spt. Journal of Cognitive Neuroscience 15: 673-682, 2003.

Hickok G, Okada K, and Serences J. Area Spt in the human planum temporale supports sensory-motor integration for speech processing. Journal of Neurophysiology, in press.

Pa J, and Hickok G. A parietal-temporal sensory-motor integration area for the human vocal tract: Evidence from an fMRI study of skilled musicians. Neuropsychologia 46: 362-368, 2008.


Anonymous said...

Thanks for the posting. Very Interesting. But:

1) Could the higher BOLD during rehearse be explained by more effortful processing than during listening? What is the response profile in other areas?

2) How can you say that the classification accuracy is "independent of amplitude"? Did you actually remove the voxel-by-voxel condition-mean response before using SVM? How? Even if the average between conditions is below significance at univariate testing (and at 12 s this is not the same), SVMs also integrate weak but consistent differences in signal amplitude...

Greg Hickok said...

1) No. If it were just effort, you'd expect the amplitude in the listen+rehearse condition to remain elevated through the end of the trial, but it doesn't; it drops back down to equal the listen condition.

2) Yes. The data were normalized (z-transform) on a run-by-run basis before SVM. As I pointed out in the post, you can see that amplitude is not the determining factor: in the middle of the trial the amplitude difference is greatest, yet classification accuracy is not significantly above chance; accuracy is greater than chance only in the last time period, when there is no amplitude difference.
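For concreteness, one way to read "z-transform on a run-by-run basis" is sketched below. This is an assumption, since the reply above does not spell out the details: here each voxel is z-scored across the samples within a run, so that every voxel has zero mean and unit variance in every run. The function name `zscore_runs` and the toy numbers are hypothetical.

```python
from statistics import mean, pstdev

def zscore_runs(data):
    """data: {run_id: list of patterns}, each pattern a list of voxel values.
    Returns the same structure with every voxel z-scored within its run
    (assumed reading of 'run-by-run' normalization)."""
    out = {}
    for run, patterns in data.items():
        cols = list(zip(*patterns))              # one column per voxel
        mus = [mean(c) for c in cols]
        sds = [pstdev(c) or 1.0 for c in cols]   # guard against zero variance
        out[run] = [[(x - m) / s for x, m, s in zip(pat, mus, sds)]
                    for pat in patterns]
    return out

# Two toy runs with very different baselines and scales.
data = {
    "run1": [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]],
    "run2": [[101.0, 5.0], [102.0, 5.0], [103.0, 5.0]],
}
z = zscore_runs(data)
for run, patterns in z.items():
    print(run, patterns)
```

After this step, run-to-run differences in baseline and scale no longer contribute to the patterns the classifier sees.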

tom said...

This looks like a convincing paper. The plots you show are wonderful. I think it would be even better if you could show plots for a few other ROIs, ones that might be thought to be involved in auditory imagery, rehearsal or response preparation (e.g. AI, left STG, dlPFC/Insula/vPM). If they show the profiles that would be predicted by your line of argument it would be very impressive. Any chance of posting them here?

Greg Hickok said...

A reviewer suggested the same thing. It seems like a good idea at first, but in fact it's not that informative because you don't need pattern classification to show that, say, A1 shows differential responses to sensory and motor events: it's clear from the amplitude data! Where pattern classification is useful is when the amplitude in an area is the same in two conditions. Then you can see if the signal in the area is being driven by different sources.

Notice too that our analysis as a function of timepoint within the trial provides an internal, i.e., within-ROI, control. In addition, we looked at classification accuracy in Spt in the continuous listen vs. the listen+rest conditions (it's in the paper) -- it was not significantly better than chance at any timepoint.

tom said...

Sorry, Greg, I wasn't clear. I meant plots of the BOLD amplitude over time for these other regions.

Peter said...

Very interesting data. One thing confuses me a little though. The BOLD response at 9 seconds is identical for listen+rest and continuous listen. This implies that the BOLD response at 9 seconds still mainly reflects the neural activity from the first 3 seconds, when these two conditions are the same. But the maximal difference between listen+rehearse and continuous listen also happens at 9 seconds. Could it be that subjects in the listen+rehearse task are already rehearsing while they're listening in the first 3 seconds?

Greg Hickok said...

Yes, that is my guess. Subjects know they will be rehearsing the stims (the cue comes on at the beginning of the block), so they start before the speech stimulus ends.

Brad Buchsbaum said...

This is a creative use of pattern classification. Well done!

Two things to think about. First, using the listen+rehearse > listen-only contrast to localize Spt is biasing towards finding motor-weighted responses. Typically we have identified Spt with the conjunction listen AND rehearse, which is unbiased with respect to the sensory/motor dimension. Time courses of Spt activation, when isolated in this manner, usually show a higher peak during the sensory phase than during the "rehearsal" phase. This pattern is not evident in your data, where you see greater activity for rehearsal than continuous sensory stimulation. This may be because you have identified an ROI that is not strongly sensitive to sensory input due to your localizer contrast. I would be curious to see what the time courses (and spatial distribution of "Spt") would look like if you used the conjunction localizer instead.

Greg Hickok said...

Hi Brad,
Thanks for catching this. I actually wasn't accurate in my description of the ROI definition in the blog entry. Here's the quote from the paper describing what we actually did:

"in individual subjects, ROIs were defined by (i) activations reflecting the conjunction of continuous speech > null rest blocks, and speech+rehearse > speech+rest that (ii) were located within the left planum temporale region (within the Sylvian fissure posterior to Heschl’s gyrus), defined by coregistering each subject’s activation maps with their own structural MRIs."

So we did use the conjunction as you (and we) have always done. So why *more* activation for rehearse than listen-only? First, I don't think there is actually more activation for rehearsal than for listen. I think that Spt activates roughly equally to these two processes. It looks like more activation for rehearse than listen in the graph I posted because both processes are summing in the listen+rehearse condition in the middle phase of the trial; notice that rehearse and listen are equal by the end of the trial.

You pointed out that you usually see *greater* activity in the listen than the rehearse phase of a trial in Spt. I agree, but I don't think this is inconsistent with what we found in the present study. Have a look back at our Figure 3A from Hickok, Buchsbaum, et al. (2003). This graph shows Spt activity in a listen+rehearse condition as well as a listen+rest condition. Notice that the first peak "sensory" activation is greater in the listen+rehearse condition than listen+rest. I think this is because some of the rehearse response is already mixing in (summing) with the listen response. Notice too that the peak of the sensory response in the listen+rest condition is similar to the rehearse amplitude *at 12-15s post trial onset* (which is where the trial ended in the present experiment).

What do you think?