Thursday, June 28, 2007

Speech intelligibility, syllables, and phase of cortical rhythms: New paper in Neuron

Run, don't walk, to your nearest library or computer terminal!

A new paper by Huan Luo and me just appeared in Neuron. Huan was a graduate student in the University of Maryland Neuroscience and Cognitive Science program and worked principally with me and Jonathan Simon. She finished her Ph.D. in 2006 and is now living in Beijing with her husband and two children. Yes, she was ridiculously productive in graduate school ...

This paper shows (IMHO) compelling evidence (based on single-trial MEG data) that speech is analyzed using a ~200 ms window.

Luo, H. & Poeppel, D. (2007). Phase Patterns of Neuronal Responses Reliably Discriminate Speech in Human Auditory Cortex. Neuron 54, 1001-1010.

How natural speech is represented in the auditory cortex constitutes a major challenge for cognitive neuroscience. Although many single-unit and neuroimaging studies have yielded valuable insights about the processing of speech and matched complex sounds, the mechanisms underlying the analysis of speech dynamics in human auditory cortex remain largely unknown. Here, we show that the phase pattern of theta band (4–8 Hz) responses recorded from human auditory cortex with magnetoencephalography (MEG) reliably tracks and discriminates spoken sentences and that this discrimination ability is correlated with speech intelligibility. The findings suggest that an ∼200 ms temporal window (period of theta oscillation) segments the incoming speech signal, resetting and sliding to track speech dynamics. This hypothesized mechanism for cortical speech analysis is based on the stimulus-induced modulation of inherent cortical rhythms and provides further evidence implicating the syllable as a computational primitive for the representation of spoken language.
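To give a concrete feel for the logic (this is a sketch, not the paper's actual analysis code), here is a minimal Python example of the idea: band-pass each single trial into the theta range, extract its instantaneous phase, and assign the trial to whichever sentence's average phase pattern it matches best. The sampling rate, array shapes, and nearest-template rule are illustrative assumptions.

    # Sketch only: classify single MEG trials by how well their theta-band
    # (4-8 Hz) phase pattern matches a per-sentence phase template.
    # FS, array shapes, and the nearest-template rule are assumptions.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    FS = 200.0  # assumed sampling rate in Hz

    def theta_phase(x, fs=FS, band=(4.0, 8.0)):
        """Band-pass a 1-D signal into the theta range; return instantaneous phase."""
        b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        return np.angle(hilbert(filtfilt(b, a, x)))

    def phase_dissimilarity(phi1, phi2):
        """Mean circular distance between two phase time series (0 = identical)."""
        return 1.0 - np.abs(np.mean(np.exp(1j * (phi1 - phi2))))

    def classify_trial(trial, templates):
        """Label a trial with the sentence whose template phase it matches best."""
        phi = theta_phase(trial)
        return int(np.argmin([phase_dissimilarity(phi, t) for t in templates]))

    # templates[k] might be the circular-mean phase across training trials of
    # sentence k; classify_trial() then labels held-out single trials.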

3 comments:

Greg Hickok said...

Nice job, David. I LOVE syllables. In fact, I think that a syllable-scale analysis of the speech signal is doing most of the work in normal comprehension. So we now have decent evidence that syllables are an important unit of analysis in speech recognition. Can someone lay out a convincing argument that sub-syllabic units are also critically involved in speech recognition (i.e., comprehension)? Remember: no fair using data from tasks that require explicit attention to phonemes.

Anonymous said...

Good to finally see the paper connected to the talk I heard so many times. I did have one question, though. You mention that correct classification of sentence category by the theta-band response begins to emerge around 2000 ms after sentence onset. Do you have any ideas about what that might mean? Or even not the specific time, but the fact that it emerges fairly gradually?

David Poeppel said...

geez, greg, no fair, why do you have to ask such a hard question? on a weekend?? in the summer???

i'm glad you love syllables. i love my wife and children, barbecue pork, and the poetry of ringelnatz -- but i am also very fond of syllables. and i think that position aligns us very much with steve greenberg (http://www.silicon-speech.com/), who has been advocating for a more explicitly syllable-centric perspective for years. it looks like syllabic-level parsing or processing occupies some position of primacy in comprehension of ecological speech. i believe that christophe pallier argued for this a long time ago (his thesis maybe?), and jacques mehler and his colleagues have argued for the epistemological priority of syllables in acquisition, presumably because a language's rhythm class (a key concept for mehler) is so strongly conditioned by syllable structure.

but does that mean we can do without sub-syllabic (i.e. segmental or featural) processes in recognition? i think not. first, there are many online effects (prospective and retrospective) that build on segmental or featural information, say assimilation effects, coarticulatory effects. if we *only* look at syllabic-level information, we would be forced to account for all of those effects as 'late'. [by the way, in this context, we had a good visit last week here at "TalkingBrains East" by david gow from harvard/mgh/salem state, who has done important work on issues such as feature parsing and is now using MEG to test hypotheses in this domain. stay on the lookout for gow-stuff coming out.] second, since we know that lexical representation is subsyllabic, we need to get to the information sooner or later. so is it going to be later (parse syllables first, fill in with segments later), earlier (the standard model: parse small things first, build bigger things like syllables, keep on building), or concurrent (my position, i.e. simultaneous analysis at two temporal granularities)? i think the last hypothesis -- stolen from engineering as well as vision research -- is the most tasty.

phonological generalizations occur, for the most part, over features and segments, and so you must have the information available eventually. why not make it available right away? having access to (sub)segmental evidence *and* syllabic evidence might account for why recognition is so damn fast. a multi-time-resolution model predicts that gamma-band activity and theta-band activity in the auditory cortices should be tightly co-modulated during spoken language comprehension. let's test ...
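to make that prediction concrete, here's a rough python sketch (band edges, filter settings, and the coupling index are placeholders, not a claim about how we would actually run it) of one way to quantify theta-gamma co-modulation as phase-amplitude coupling:

    # Sketch only: quantify theta-gamma co-modulation as phase-amplitude
    # coupling (how strongly gamma amplitude is organized by theta phase).
    # Band edges, filter order, and sampling rate are placeholders.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def bandpass(x, fs, lo, hi, order=4):
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, x)

    def theta_gamma_coupling(x, fs, theta=(4.0, 8.0), gamma=(30.0, 60.0)):
        """Mean-vector-length coupling between theta phase and gamma amplitude."""
        theta_phi = np.angle(hilbert(bandpass(x, fs, *theta)))
        gamma_amp = np.abs(hilbert(bandpass(x, fs, *gamma)))
        # Project gamma amplitude onto the theta-phase circle; a large modulus
        # means gamma power clusters at a preferred theta phase.
        return np.abs(np.mean(gamma_amp * np.exp(1j * theta_phi))) / np.mean(gamma_amp)

comparing an index like that for intelligible vs. unintelligible speech would be one way to put the multi-time-resolution story on the line.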

OK, i'm running out of steam and need to take my youngest son swimming ... more on this eventually.