Thursday, April 17, 2008

The Motor Theory of Speech Perception: Discussion summary

Quick summary of our in-class discussion of "The Motor Theory of Speech Perception Reviewed" by Galantucci et al. (2006), Psychonomic Bull. & Rev., 13: 361-77.

1. Very worthwhile and thorough article. Everyone needs to read it.

2. The authors make an excellent case that there is a very tight connection between speech perception and speech production.

3. Contrary to the authors' conclusions, however, none of the arguments make a case for the central claim of the Motor Theory, that "Perceiving speech is perceiving phonetic gestures" (p. 367). Instead, I would argue that there is much better evidence for the reverse claim: that speech production is producing auditory targets. Call it the Perceptual Theory of Speech Production. (See Frank Guenther's work for computational implementations of this viewpoint.) This view has all the advantages of the Motor Theory in terms of maintaining parity between perception and production and explaining the tight association between sensory and motor systems, AND, unlike the Motor Theory, it is consistent with the empirical facts from aphasia: damage to frontal speech production systems does not lead to a concomitant impairment in speech recognition, whereas damage to posterior auditory-related brain regions does produce production deficits.

4. Relatedly, and as pointed out in a previous post, evidence from aphasia is glaringly absent from the otherwise very thorough review.

I would love to have a discussion about #3, in particular, whether anyone can think of any evidence to support a Motor Theory account rather than a Sensory Theory account. I didn't see it in the Galantucci et al. paper, which is the best review I've seen, so speak up if you disagree! I would love to know why I'm wrong.

2 comments:

Bill Idsardi said...

Re: #3, some thought needs to be given to speech production in the post-lingually deaf. Perkell, Lane, Guenther, Matthies & co. have a few JASA articles about this (comparing patients before and after cochlear implantation). What's striking in this context is how good, for instance, VOT (voice onset time) is PRE-implantation. The patients still have a contrast in VOT, but the boundary has drifted. So actual audition of one's own speech doesn't seem all that necessary online, but it does serve as a kind of "tune-up" to keep the parameters near their correct values. And indeed, the patients seem to move toward more usual values post-implantation.

Kenny said...

That was a fun class! What I took from CLASS (not the review) was that these are modular systems with complex domains that, under certain circumstances (e.g., the McGurk effect vs. altered acoustic feedback), can influence each other.

Motor information clearly can influence our phonological conclusions about acoustic input. While aphasia undermines the MT claim that phonological representations are *completely* motoric/gestural, the survival of speech perception without production does not mean that motoric information is unavailable to speech perception. On the other hand, the opposite extreme is incompatible with phonology and a good deal of behavioral data.

Some phonological processes involving features argue against an Acoustic Theory (AT: articulatory representations are *completely* acoustic). Two examples of articulation not constrained by acoustic information:

1. Phonetic Underspecification:
Some languages use morphemes that specify some, but not all, crucial aspects of articulating a sound, so the sound has no specific acoustic target to match. These sounds borrow their Manner or Place of Articulation from a neighboring sound, which is clearly a motoric computation.
** If your underspecified [+nasal] sound exists in a language with /n, m, ng, etc./, then these distinctions are normally perceived, and AT cannot explain how they are ignored in other cases.
** More importantly, AT cannot directly explain how a Place or Manner of Articulation is copied from a neighboring sound. The acoustic target would have to borrow a motor property, rather than relying on a simpler scheme in which the motor feature just continues.

2. Contrast neutralization: ladder vs. latter, rider vs. writer
Words can converge phonetically because phonological processes (here, American English flapping of /t/ and /d/) convert distinct underlying phonemes into identical phonetic forms. During motoric preparation, two totally distinct underlying forms get turned into identical acoustic twins.
** If AT is true and articulatory representations are acoustic in nature, wouldn't we predict that this never happens?

There are strong correlations between acoustic and motor speech events, so Hebbian learning would predict functional connections between them. Unlike either AT or MT, bidirectional flow is consistent with aphasia, phonology, and the experimental evidence presented in the MT review:
MOTOR <---> ACOUSTIC
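The bidirectional Hebbian idea can be illustrated with a toy sketch. It assumes a plain outer-product Hebbian rule and made-up binary patterns for two hypothetical "gestures" and their sounds; none of these patterns, names, or parameters come from the discussion or from the Galantucci et al. review, and this is not a model of real speech. It only shows that a single set of weights, learned from co-occurring motor and acoustic activity, can be read forward (motor to acoustic) or backward through its transpose (acoustic to motor).

```python
# Toy sketch of plain Hebbian (outer-product) learning between a motor layer
# and an acoustic layer. All patterns are HYPOTHETICAL, chosen only to show
# that one set of co-activation weights can be read in both directions
# (MOTOR <---> ACOUSTIC). Not a model of real speech.


def train_hebbian(pairs, n_motor, n_acoustic, rate=0.1, epochs=10):
    """Strengthen w[i][j] in proportion to acoustic[i] * motor[j] for each co-occurring pair."""
    weights = [[0.0] * n_motor for _ in range(n_acoustic)]
    for _ in range(epochs):
        for motor, acoustic in pairs:
            for i, a in enumerate(acoustic):
                for j, m in enumerate(motor):
                    weights[i][j] += rate * a * m
    return weights


def motor_to_acoustic(weights, motor):
    """Forward direction: predicted acoustic activity for a motor pattern."""
    return [sum(w * m for w, m in zip(row, motor)) for row in weights]


def acoustic_to_motor(weights, acoustic):
    """Reverse direction: the same weights read through their transpose."""
    n_motor = len(weights[0])
    return [sum(weights[i][j] * acoustic[i] for i in range(len(acoustic)))
            for j in range(n_motor)]


# Hypothetical "gestures": each motor pattern reliably co-occurs with one acoustic pattern.
PAIRS = [
    ([1, 0, 1], [1, 0]),  # gesture A <-> sound A
    ([0, 1, 0], [0, 1]),  # gesture B <-> sound B
]

if __name__ == "__main__":
    w = train_hebbian(PAIRS, n_motor=3, n_acoustic=2)
    print(motor_to_acoustic(w, [1, 0, 1]))  # acoustic unit 0 is most active
    print(acoustic_to_motor(w, [0, 1]))     # motor unit 1 is most active
```

Using one weight matrix for both directions is what makes the link bidirectional in this sketch; learning a separate matrix per direction would be an equally reasonable, but different, assumption.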