I just finished reading an excellent review article by Joseph Perkell, titled "Movement goals and feedback and feedforward control mechanisms in speech production". If you want a nice survey of behavioral speech production research from the motor control perspective (as opposed to the psycholinguistic perspective), this should definitely be on your reading list.
In the review, Perkell argues a few different points. One is that the goals, or targets, of speech production are sensory. I agree completely. Another is that there are two kinds of sensory targets, auditory and somatosensory. Again, I agree completely. He makes some interesting observations regarding differences between vowels and consonants, suggesting that the targets for vowels are predominantly auditory whereas the targets for most consonants are largely somatosensory. I kind of agree with this one. An alternative way of stating this generalization might be that the auditory system is more interested in syllables, vowels being syllabic units all on their own, and the somatosensory system is more interested in sub-syllabic units, i.e., consonants, particularly stops. I'm working on a version of this general idea in a forthcoming publication. Stay tuned...
Related to the auditory goal point, Perkell reviews an interesting body of data suggesting that one's auditory acuity for a particular phonemic contrast is correlated with the sharpness of one's own articulation for that contrast. Cool stuff.
Here's the one thing I disagree with in the whole paper. Perkell writes, "it is widely believed that once speech is acquired and has matured, it operates almost entirely under feedforward control". This is an assumption of the DIVA model promoted by the Guenther/Perkell group. I like the DIVA model, but I think it is wrong in this (and a couple of other) respects. The idea of feedforward control is that the system learns, via overt feedback and correction mechanisms, the motor routines necessary for hitting sensory targets. Once learned, speech production involves activating these motor programs. If something goes wrong, the only way to catch it is via overt feedback, that is, after the error has already been produced. In other words, there is no internal forward prediction/correction mechanism.
There is a simple argument against this position: conduction aphasia. Conduction aphasics have nothing wrong with their auditory targets. They have normal speech perception and can readily detect errors in their own speech. They do not have a motor articulatory problem either. Much of their speech is fluent and accurate. However, they make phonemic errors more often than control subjects do. A natural explanation of this is that conduction aphasics have a damaged internal correction mechanism (Hickok et al. 2011). They can activate the learned motor programs and they can activate the auditory targets, but if something goes wrong in the motor programming, they can't generate an internal forward prediction and correct the error before it is spoken. As a result, their speech error rate goes up relative to individuals with an intact system. This is one aspect of the DIVA model that needs to be updated.
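To make the logic of that argument concrete, here is a toy simulation. It is only a sketch under made-up assumptions, not the DIVA model and not the model in Hickok et al. (2011). The function name `spoken_error_rate` and the parameters `p_slip` (chance that motor programming goes wrong on a given phoneme) and `p_internal_catch` (chance that an internal forward prediction catches the slip before it is spoken) are hypothetical, as are all the numbers.

```python
import random


def spoken_error_rate(n_phonemes, p_slip, p_internal_catch, seed=0):
    """Fraction of phonemes spoken incorrectly in a toy model.

    p_slip and p_internal_catch are illustrative values, not estimates
    from any data.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_phonemes):
        # Motor programming occasionally selects the wrong phoneme.
        slipped = rng.random() < p_slip
        # An internal forward prediction may catch the slip before it
        # is articulated; only uncaught slips become overt errors.
        caught = slipped and rng.random() < p_internal_catch
        if slipped and not caught:
            errors += 1
    return errors / n_phonemes


# Same slip rate in all three cases; only the internal check differs.
intact = spoken_error_rate(100_000, p_slip=0.05, p_internal_catch=0.9)
damaged = spoken_error_rate(100_000, p_slip=0.05, p_internal_catch=0.1)
feedforward_only = spoken_error_rate(100_000, p_slip=0.05, p_internal_catch=0.0)

print(f"intact internal correction:  {intact:.4f}")
print(f"damaged internal correction: {damaged:.4f}")
print(f"pure feedforward control:    {feedforward_only:.4f}")
```

The point of the sketch is that the motor programs and the targets are identical in all three runs. The only thing that changes is whether slips get caught internally, and that alone is enough to raise the overt error rate.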
Perkell, J. (2010). Movement goals and feedback and feedforward control mechanisms in speech production. Journal of Neurolinguistics. DOI: 10.1016/j.jneuroling.2010.02.011
Hickok, G., Houde, J., & Rong, F. (2011). Sensorimotor integration in speech processing: computational basis and neural organization. Neuron, 69(3), 407-22. PMID: 21315253