Friday, March 30, 2012

How does visual speech modulate auditory speech perception?

There are two current views.  One is that visual speech provides cues as to the motor gestures that generated the speech sounds and this motor information generates an efference copy that modulates auditory speech.  

AV speech elicits in the listener a motor plan for the production of the phoneme that the speaker might have been attempting to produce, and that feedback in the form of efference copy from the motor system ultimately influences the phonetic interpretation.  -Skipper et al. 2007

The other is that AV integration is achieved without the motor system, via cross-sensory integration in the STS (Nath & Beauchamp, 2012).

I recently came across a 15-year-old study that makes a pretty strong case against the motor-based account.  Rosenblum et al. (1997) decided to assess whether individuals who do not know how to produce speech nonetheless show a McGurk effect.  Their study population?  5-month-old infants.  The paradigm? Habituation of looking time (present the same thing over and over and see how long it takes the kid to get bored and stop looking).  Basic result from four experiments?  Habituation to auditory syllables was modulated by visual speech information: prelingual infants show a McGurk effect.  

AV integration seems to be primarily sensory-, not motor-driven.

Nath, A.R. and M.S. Beauchamp, A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion. Neuroimage, 2012. 59(1): p. 781-7.

Rosenblum, L.D., M.A. Schmuckler, and J.A. Johnson, The McGurk effect in infants. Percept Psychophys, 1997. 59(3): p. 347-57.

Skipper, J.I., et al., Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception. Cereb Cortex, 2007. 17(10): p. 2387-99.


VilemKodytek said...

Greg, if we take rule #1 into account, both theories may be correct.

Greg Hickok said...

Yes, in fact that is what Kai Okada and I proposed a couple of years ago. I've come to think, however, that we were wrong, that the motor-based mechanism isn't doing much.

Okada, K. and G. Hickok, Two cortical mechanisms support the integration of visual and auditory speech: A hypothesis and preliminary data. Neurosci Lett, 2009. 452(3): p. 219-23.

VilemKodytek said...

Surely not in infants. Note, however, that Rosenblum et al.’s finding doesn’t necessarily mean they worked with future McGurk-effect perceivers. Incongruent audio-visual syllables look strange regardless of whether you perceive /da/ or not, I suppose.

In my view it’s next to impossible that the hypothesis in Okada & Hickok (2009) is wrong. Under which conditions one or the other pathway, or both, are critical is another question.

Greg Hickok said...

What's the evidence that the motor system is involved?

VilemKodytek said...

Good question. I don't have evidence, just a guess. Even in reading, which is "less natural" than visual speech, there are two pathways and the choice may be strategic in experienced readers.

Is there any counterevidence?

Greg Hickok said...

Motor area (e.g., Broca's) activity doesn't seem to correlate with AV fusion, but STS does. This paper is one example:

Miller, L.M. and M. D'Esposito, Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. J Neurosci, 2005. 25(25): p. 5884-93.

Infants seem to be sensitive to AV fusion without much of a motor speech system.

It's a bit of a mixed bag if you look at a range of studies, but the weight of the evidence suggests a very minimal role for the motor system. What we are seeing in our own experiments seems to bear this out.

VilemKodytek said...

As far as I can understand, both STS and IFG seem to correlate with AV fusion in Miller & D'Esposito.

William Matchin said...

The Miller & D'Esposito study showed that activity in the IFG is negatively correlated with fusion - that is, more activity for trials in which subjects reported that stimuli were not fused. The reverse pattern held true for the STS.

Anonymous said...

Yes, that’s true. However, I didn’t argue against the claim that “AV integration seems to be primarily sensory-, not motor-driven.” Let’s recall Okada and Hickok’s (2009) conclusion:

“We suggest that visual speech activates both a sensory-sensory integration network in the STS and a sensory-motor network including frontal motor-speech regions… We further propose that both of these networks provide independent sources of constraint on the analysis of acoustic speech input, although the cross-sensory system appears to be the most influential.”

If it’s false, then the sensory-motor network adds nothing; at most it rehearses what’s already available in STS. Are there grounds for such a strong rejection?


Greg Hickok said...

The view I'm promoting now is based on the following:

1. The lack of a hemodynamic response in motor speech areas that reflects audiovisual integration.

2. The fact that you don't seem to need a motor speech system to get AV fusion (e.g., prelingual infants).

3. Several unpublished results in my lab showing that articulatory rehearsal does not modulate the McGurk effect and that damage to motor speech areas does not preclude AV fusion/McGurk effects.

I think the burden is on the opposing viewpoint now. What evidence IS there that AV integration involves motor speech areas?

Anonymous said...

I agree the burden is on the other side. I’m just interested.

1 The hemodynamic response is slow and may represent a superposition of a couple of neural processes. Moreover, there is a response in IFG in some papers, e.g., Ojanen et al. 2005, NeuroImage 25, 333. Even in Miller & D’Esposito, there seems to be some trend toward an increase in the first few seconds.

2 OK.

3 I'm looking forward to reading it.