There are two new papers on the neurophysiological correlates of speech and language processing that are quite interesting. They are closely related to each other and are fun to read (and discuss) as a pair. Both compare the responses to intelligible versus unintelligible speech using neuronal oscillations as the metric. One group focuses on the gamma band, the other on the theta band. Both papers do a terrific job motivating their studies, and both show some nice analyses.
One paper is by Marcela Peña and Lucia Melloni and just appeared in the Journal of Cognitive Neuroscience: Brain Oscillations during Spoken Sentence Processing, May 2012, Vol. 24, No. 5, pp. 1149-1164.
Marcela and Lucia used high-density EEG and employed a cross-linguistic design. They recorded from Spanish and Italian participants while they were listening to Spanish, Italian, or Japanese. The study derives from the perspective of 'binding by synchrony,' a position that continues to receive a lot of attention in systems and cognitive neuroscience - but is not yet as widely investigated in speech/language studies. The assumption is that when listening to a language that the listener understands (i.e. there is comprehension at the sublexical, lexical, syntactic, and semantic levels), whatever neural signal reflects 'binding' across the populations that need to be coordinated will be enhanced in the intelligible conditions (i.e. Spanish for Spanish speakers, Italian for Italian speakers). What they observe is that the gamma band is selectively enhanced during the sentence when it is comprehended. (Their figures 1 and 5 tell the whole story.) They conclude that low-frequency theta activity tracks lower-level information, while the (lower) gamma band reflects what happens in intelligible speech, i.e. the binding of higher-level representations. Overall, this supports a binding-by-synchrony style view for language processing.
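To make the metric concrete, here is a toy sketch of band-limited power (theta vs. lower gamma) computed with a bandpass filter plus Hilbert envelope in Python/SciPy. This is only an illustration of the general idea, not Peña and Melloni's actual time-frequency pipeline; the sampling rate, band edges, and the synthetic eeg array are assumptions for the example.

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((40, 2 * fs))    # stand-in data: 40 trials x 2 s, one channel

def band_power(x, fs, band):
    """Mean instantaneous power in a frequency band, per trial."""
    lo, hi = band
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
    filtered = filtfilt(b, a, x, axis=-1)          # band-limited signal
    envelope = np.abs(hilbert(filtered, axis=-1))  # analytic amplitude
    return (envelope ** 2).mean(axis=-1)           # power per trial

theta_power = band_power(eeg, fs, (4, 7))      # low-frequency 'tracking' band
gamma_power = band_power(eeg, fs, (30, 50))    # lower gamma band
# Contrasting gamma_power for intelligible vs. unintelligible sentences is the
# kind of comparison the paper reports (using proper time-frequency methods).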
And a slightly different perspective/conclusion ...
The other paper is by Jonathan Peelle, Joachim Gross, and Matt Davis and is in Cerebral Cortex: Phase-Locked Responses to Speech in Human Auditory Cortex are Enhanced During Comprehension, doi: 10.1093/cercor/bhs118.
Jonathan, Joachim, and Matt used MEG and presented listeners with vocoded speech that was either intelligible (16 channels), partially intelligible (4 channels), or unintelligible (1 channel). They also presented a 4-channel unintelligible condition (spectrally rotated). They calculate a quantity they call 'cerebro-acoustic coherence', which quantifies the relation between the envelopes of the stimuli and the low-frequency (4-7 Hz) neural response. They show that when a sentence is intelligible, the coherence is systematically higher. (Their figures 1 and 4 pretty much tell the story.) Of special interest is their observation of an MTG-centered, left-lateralized activation when comparing 4-channel intelligible versus unintelligible stimuli. This adds further support to the key role MTG plays for (lexically mediated) intelligibility. Moreover, their data challenge what some of my collaborators and I have argued (that theta tracking is acoustic; e.g. Howard & Poeppel 2010, 2012, etc.).
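For readers who want a feel for what such a measure involves, here is a minimal sketch of envelope-to-brain coherence on synthetic signals, averaged over 4-7 Hz. This is an illustration of the general idea, not the authors' MEG pipeline (they work with source-localized data and their own coherence estimator); the sampling rate, filter settings, and the stand-in audio and meg arrays are assumptions.

import numpy as np
from scipy.signal import butter, filtfilt, hilbert, coherence

fs = 250                                  # assumed common sampling rate (Hz)
rng = np.random.default_rng(0)
audio = rng.standard_normal(fs * 10)      # stand-in for a 10 s sentence waveform
meg = rng.standard_normal(fs * 10)        # stand-in for one MEG sensor/source signal

# 1. Broadband amplitude envelope of the stimulus, low-passed below ~10 Hz.
envelope = np.abs(hilbert(audio))
b, a = butter(3, 10 / (fs / 2), btype="lowpass")
envelope = filtfilt(b, a, envelope)

# 2. Spectral coherence between the envelope and the neural signal.
freqs, coh = coherence(envelope, meg, fs=fs, nperseg=fs * 2)

# 3. Average coherence in the theta range (4-7 Hz).
theta = (freqs >= 4) & (freqs <= 7)
cerebro_acoustic_coherence = coh[theta].mean()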
A little whining, some small regrets ... There are three things I would like to hear about from Marcela and Lucia. (i) Why not analyze the low-frequency response components in more detail? (ii) Why focus solely on power and not look at phase? (iii) Why did the gamma band response in the intelligible conditions not start until about 1000 ms after sentence onset? Presumably the first second of a sentence is also understood ... And from Jonathan, Joachim, and Matt, I would have liked to know: (i) Why no analyses of the higher frequencies, e.g. the low gamma band? (ii) Why no analyses of power? (iii) Why are the behavioral data for 4 channels (Fig. 1E) so different from the rest of the literature using such materials (Shannon, Drullman, etc.)?
Notwithstanding a little complaining, these are very cool papers! So, if we could have these two articles date and generate a paper-offspring, a baby paper, I could imagine seeing some interesting alignments between theta and gamma that reflect intelligibility. Maybe we need both regimes of neuronal oscillation to generate usable representations ...
3 comments:
hi David,
All of your comments are very well taken! Some quick responses:
1) Regarding the frequencies in which we looked for cerebro-acoustic coherence, our goal was to look specifically at theta oscillations associated with the dominant acoustic components of the speech signal, which lined up well with the largest overall phase-locking we saw in the MEG data (our Figure 2). Because we didn't see any hints of significant phase locking at higher frequencies, we didn't explore these in detail, but I agree it is a good idea.
2) Power analyses to complement our phase analyses have been on our list for quite some time now but did not make it into the paper. These seemed less urgent to me because of the nice work from other groups (cough Luo & Poeppel 2007 cough) which was fairly convincing in showing the importance of phase (not power) in the responses we were looking at, along with some of the really nice work in nonhuman primates (Schroeder, Lakatos, et al.). Nevertheless, I completely agree that these are sensible things to look at and would be helpful.
3) In fact, I don't think the behavioral data for the 4-channel vocoded condition are odd at all. As you know, the intelligibility of vocoded speech depends on numerous factors including the number of channels, their spacing, the frequency range, SNR, amount of exposure, etc., not to mention various digital signal processing minutiae (envelope filter frequency, parameters of the filters, etc.). In our hands we found ~30% correct word report for 4-channel vocoded sentences. This is not all that different from a little under 20% correct in Davis & Johnsrude (2003). Shannon et al. (1995) show much higher performance (Fig. 2), but their upper frequency was 4 kHz (as opposed to our 8 kHz), meaning that there was significantly more spectral detail under 4 kHz. (Not to mention the large amount of training listeners received.) I could go on, but I think we're actually not all that different from the rest of the literature.
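For anyone unfamiliar with noise-vocoding, the parameters listed above map directly onto the steps of a basic vocoder: split the signal into frequency channels, extract and low-pass each channel's envelope, and use the envelopes to modulate band-limited noise. The sketch below is a toy illustration with assumed parameter values (channel count, frequency range, envelope cutoff), not the actual stimulus-generation code used in either study.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels=4, f_lo=100, f_hi=8000, env_cutoff=30):
    """Toy noise vocoder: envelope-modulated noise in log-spaced channels."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # channel boundaries (Hz)
    carrier = np.random.default_rng(0).standard_normal(len(speech))  # noise carrier
    sos_env = butter(3, env_cutoff / (fs / 2), btype="lowpass", output="sos")
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass", output="sos")
        band = sosfiltfilt(sos, speech)                        # channel-limited speech
        env = sosfiltfilt(sos_env, np.abs(hilbert(band)))      # smoothed channel envelope
        out += sosfiltfilt(sos, carrier) * np.clip(env, 0, None)  # envelope-modulated noise
    return out

fs = 22050
speech = np.random.default_rng(1).standard_normal(fs)  # stand-in for a 1 s recording
vocoded = noise_vocode(speech, fs)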
For what it's worth I'm pretty sure our paper is single and open to dating. I agree that multiple frequencies of oscillations getting together seem like a good idea.
References:
Davis MH, Johnsrude IS (2003) Hierarchical processing in spoken language comprehension. J Neurosci 23:3423-3431.
Shannon RV, Zeng F-G, Kamath V, Wygonski J, Ekelid M (1995) Speech recognition with primarily temporal cues. Science 270:303-304.
Thanks for the summary of these papers David. Now I don't have to read them!
You say "binding-by-synchrony". Do you really think synchrony is causing the binding? Or is it a consequence of the binding? Does it even matter?
I normally just think of oscillation synchrony as a reflection of the fact that networks are talking to each other, just like, say, a negative deflection is an ERP reflection of auditory cortex activation in response to an acoustic event. We don't say "auditory activation by negative deflection". Is there more to synchrony? What's the evidence?
Greg: your question is fair (if not new), but I am not the right person to answer it. Actually Lucia Melloni, the coauthor of one of the papers, has thought and written about this a great deal - and maybe we could persuade her to summarize some of the main issues for us. (Lucia: what do you say?)
There are a bunch of interesting papers and reviews, and of these I am especially partial to the work of Pascal Fries.
If you *really* want to dig into this stuff, a good starting point is a special issue of Neuron from 1999 called, creatively, The Binding Problem.
I think you are right to wonder - I certainly wonder about the logic of the issue myself - but the neural responses we can measure in the context of these studies (often oscillations) are useful, regardless of their ultimate interpretation.