Friday, June 10, 2011

Movement goals and feedback control in speech production

I just finished reading an excellent review article by Joseph Perkell, titled Movement goals and feedback and feedforward control mechanisms in speech production. If you want a nice survey of behavioral speech production research from the motor control perspective (as opposed to the psycholinguistic perspective), this should definitely be on your reading list.

In the review, Perkell argues a few different points. One is that the goals, or targets, of speech production are sensory. I agree completely. Another is that there are two kinds of sensory targets, auditory and somatosensory. Again, I agree completely. He makes some interesting observations regarding differences between vowels and consonants, suggesting that the targets for vowels are predominantly auditory whereas the targets for most consonants are largely somatosensory. I kind of agree with this one. An alternative way of stating this generalization might be that the auditory system is more interested in syllables, vowels being syllabic units all on their own, and the somato system is more interested in sub-syllabic units, i.e., consonants, particularly stops. I'm working on a version of this general idea in a forthcoming pub. Stay tuned...

Related to the auditory goal point, Perkell reviews an interesting body of data suggesting that one's auditory acuity for a particular phonemic contrast is correlated with the sharpness of one's own articulation for that contrast. Cool stuff.

Here's the one thing I disagree with in the whole paper. Perkell writes, "it is widely believed that once speech is acquired and has matured, it operates almost entirely under feedforward control". This is an assumption of the DIVA model promoted by the Guenther/Perkell group. I like the DIVA model, but I think it is wrong in this (and a couple other) respects. The idea of feedforward control is that the system learns, via overt feedback and correction mechanisms, the motor routines necessary for hitting sensory targets. Once learned, speech production involves activating these motor programs. If something goes wrong, the only way to catch it is via overt feedback. In other words, there is no internal forward prediction/correction mechanism.

There is a simple argument against this position: conduction aphasia. Conduction aphasics have nothing wrong with their auditory targets. They have normal speech perception and can readily detect errors in their own speech. They do not have a motor articulatory problem either. Much of their speech is fluent and accurate. However, they make phonemic errors more often than control subjects do. A natural explanation of this is that conduction aphasics have a damaged internal correction mechanism (Hickok et al. 2011). They can activate the learned motor programs, they can activate the auditory targets, but if something goes wrong in the motor programming, they can't generate an internal forward prediction and correct the error before it is spoken, thus their speech error rate goes up relative to individuals with an intact system. This is one aspect of the DIVA model that needs to be updated.

Perkell, J. (2010). Movement goals and feedback and feedforward control mechanisms in speech production. Journal of Neurolinguistics. DOI: 10.1016/j.jneuroling.2010.02.011

Hickok, G., Houde, J., & Rong, F. (2011). Sensorimotor integration in speech processing: computational basis and neural organization. Neuron, 69 (3), 407-22. PMID: 21315253

14 comments:

Frank Guenther said...

Greg,

I think a straightforward explanation for the conduction aphasia errors is that feedforward motor programs are not entirely impervious to occasional errors even in adults. Evidence for imperfections in feedforward motor programs includes the fact that speech output under loud masking noise is good but not perfect, and the speech of postlingually deafened individuals can be good but again is not perfect. In these cases, the speaker relies more heavily on feedforward control because auditory feedback control is unavailable, and this leads to occasional phonemic errors.

A neurologically normal individual will be constantly correcting for these small errors while they are still subphonemic. At any given instant these will generally be tiny corrections; i.e., very little feedback control is needed, as Perkell asserts in the target article, as do I in other publications. However in the conduction aphasic these small errors will accumulate due to disruption of auditory feedback control (from the conduction aphasia), occasionally leading to phonemic errors.

To summarize (in rough terms), the cause of occasional phonemic errors in conduction aphasics may be the same as in postlingually deaf individuals who presumably have intact internal forward model prediction circuitry: the feedforward motor programs for speech are not completely perfect and thus errors will occur without sensory feedback control. I thus don't see this as definitive evidence for the existence of forward model predictions that lead to internal corrections prior to speech output, at least at the speech motor control level.

In earlier versions of DIVA (e.g. Guenther et al. 1998) we explicitly used the sort of internal forward model prediction you are describing for speech motor control. Such a mechanism is probably best described as a type of feedforward control since it doesn't involve the use of sensory feedback for correction of ongoing speech. In more recent versions of DIVA we've omitted explicit description of this mechanism for a few reasons I outline below.

First, it significantly complicates description of the model and adds very little in terms of functionality (i.e., the model works fine without it, instead using a more straightforward type of feedforward control). Omitting it has led to more people understanding and using the model to study speech.

Second, it cannot be the *only* form of feedforward control, since we know that damage to the sensory cortical areas (which are presumably the targets of these forward model predictions) does not dramatically disrupt speech motor output (as in sensory aphasia), which it would be expected to do if this sort of control was the main type of feedforward control for speech. Instead it seems that an intact premotor/motor cortex and associated subcortical circuitry is sufficient for generating feedforward motor programs for speech.

All that said, my jury is still out as to whether the type of internal prediction process you refer to is used for speech motor control, which I differentiate from phonological planning of speech.
(Note that we have forward model predictions in the form of auditory targets in the DIVA model, but these are part of the sensory feedback control subsystem as opposed to the role you are describing.) I know that Levelt and colleagues, as well as you and others, believe that this sort of prediction and internal correction occurs at the phonological level, but I don't know of convincing evidence that it occurs at the speech motor control level (i.e., as part of the sensorimotor interactions underlying syllable production that are treated in DIVA). This is a third reason for not describing it in the more recent DIVA model publications -- I am not sure it is fully supported by the data.

Cheers,
Frank

Guenther, F.H., Hampson, M., and Johnson, D. (1998). A theoretical investigation of reference frames for the planning of speech movements. Psychological Review, 105, pp. 611-633.

Frank Guenther said...

BTW, as a testament to your blog, three separate people mentioned this post to me today.

And I'd be curious to hear about the other things you think are wrong in DIVA :-).

Greg Hickok said...

Hi Frank,

Good to hear from you! As you know, I've been a longtime fan of your work, including DIVA. I think it is fair to say that you and your group are unparalleled in terms of the explicitness of your computational model and its relation to the neural circuits that support it. There is lots that is RIGHT about DIVA.

Things I don't like:

1. As noted, I think the feedforward control assumption is wrong for reasons noted. An internal feedback control circuit is needed.

2. I don't like the idea of a "speech sound" map living in left inferior frontal gyrus/PMC. Shouldn't a "sound" map be in auditory cortex? Maybe this is just a terminological objection as you suggest that this sound map is really more like Levelt's mental syllabary (a more motor concept). But if that is true then,

3. I don't like the idea of a motor act starting with the activation of a motor unit. We both agree that the targets are auditory. To me it makes more sense to first define the target or goal of an action, i.e., activate the sensory system as the starting point for an action. However,

4. I think for most words that are produced, activation of the production network involves parallel inputs to both motor (~mental syllabary) and auditory representations.

Finally,

5. I think that there is a hierarchy hidden in the system that DIVA (or my published ideas) don't deal with very well.

I'll respond to your suggestions re: conduction aphasia soon...

Greg Hickok said...

Thanks for your thoughts on conduction aphasia and DIVA. I'm happy to be having this discussion.

If I correctly understand what you are arguing, conduction aphasics make more speech errors than controls because small (sub-phonemic) feedforward errors (which we all make) are not corrected because auditory feedback control is disrupted. This leads to a buildup of small sub-phonemic errors which eventually cross a category boundary and become a phonemic error. You suggest that functionally this is similar to the situation in post lingual deafness.

The main problem I have with this account is that the pattern of speech errors in conduction aphasia and post lingual deafness seems very different, with CA errors being very phonological and variable and PL deafness being more phonetic and consistent. (I'm no expert on PL deafness so please advise if I'm wrong.) This suggests a different mechanism for the two conditions.

Relatedly, if it were the case that phonemic errors were a result of a relatively slow accumulation of sub-phonemic errors, you would not expect acute disruption to cause any problems. However, conduction aphasia is, if anything, more evident in acute stages of stroke. And direct stimulation of cortex in the vicinity of Spt produces an immediate effect on paraphasic errors. Here's a quote from Anderson et al. (1999 Brain and Language 70, 1–12): "During the electrical stimulation trials she demonstrated frequent and multiple phonemic paraphasic errors during verbal picture naming and the repetition of words and nonwords" (p. 7). So it's not an error accumulation problem.

Greg Hickok said...

I agree with you that sensory-based control is not the only source of feedforward control (as you put it). The motor system has to be able to control speech reasonably well without much input from the sensory system (in a mature, trained system of course). The evidence for this, as you point out, is that damage to the sensory system does not halt speech production. This is why we've proposed both a motor and sensory phonological system that are activated in parallel via conceptual-semantic systems during speech production.

It makes sense to me that DIVA works just about as well without an internal feedback loop. For highly learned syllables, the "motor syllabary" is good enough to get the job done on its own most of the time. Where you really need the internal feedback loop is for lower frequency or more complex phonological forms or for nonwords where the sensory system has to truly guide the motor phonological selection. I believe conduction aphasia clearly uncovers the existence of this internal network.

VilemKodytek said...

Elliott Ross (Neuroscientist 16, 2010, 222-243) mentions a patient with a Broca's area lesion who, after recovery, looked excellent:

"On formal examination of his language, I could not detect any aphasic deficits, including problems with spelling, writing, praxis, or comprehension of sentences with complex syntax." (Ross, p. 226).

The point is that the patient could do well an hour a day only:

“If he did any more than this, he began making substantial articulatory and graphemic errors with a decline in his ability to communicate that required increasing mental effort.” (ibid.)

It might’ve had something to do with his overall fitness, but there still must’ve been something else. Maybe the recovery from stroke proceeds in a similar way to late literacy acquisition, which is supported by both new gray and new white matter (Carreiras et al., Nature 461, 2009, 983-986). What if, in chronic conduction aphasia, the new (or spare) matter, no matter whether gray or white, is simply not efficient enough, causing more errors in the feedforward process?
Vilem

Greg Hickok said...

Hi Vilem,
If the DIVA functional-anatomical assumptions are correct, the driving feedforward process is in the frontal lobe (IFG/vPM; the "speech sound map"). Lesions in conduction aphasia, however, are posterior temporal-parietal, overlapping functional auditory-motor area Spt as far as we can tell. So decreased efficiency in DIVA's feedforward process isn't going to cut it as an explanation because it's in the wrong part of the brain.

If we include the auditory system in the "feedforward" system, i.e., the system that drives motor selection/programming, then yes, I think this is where the problem is. It is the interface between the auditory targets and the stored motor programs that can hit those targets with vocal tract gestures.

VilemKodytek said...

Hi Greg,
By “feedback” I was referring to the first section of the green line in Fig. 4 in Hickok - Houde - Rong (2011), leading from “articulatory control” to “motor phonological system” (and then on, but the continuation is OK). Since were it there in conduction aphasic patients, they probably wouldn’t start speaking at all.

I don’t know DIVA so I can’t comment on it. I have only had a quick look at their 1998 paper. Anyway, I love quantitative models like this and my general opinion is that they need not be 100% correct to be useful.
Vilem

Greg Hickok said...

Ah, the token "efference copy" in Fig. 4. You know, if you read that paper carefully you'll notice that this really doesn't do anything in the model. It is more of a theoretical vestige from motor control models. In a forthcoming variant, that green line is gone...

In any case, lesion location doesn't match up with damage to any part of the frontal motor system in my account or DIVA.

You should check out the DIVA model. It is very useful and goes well beyond anything I've said regarding the speech production system in terms of coverage and computational explicitness.

VilemKodytek said...

You're right, Greg. I confused it quite a bit.
Vilem

Frank Guenther said...

Greg,

To the degree that they are truly phonological, the conduction aphasia errors you describe are not within the purview of the DIVA model, which is a model of speech motor control, i.e. the execution of syllabic motor programs after they have been selected. CA phonological errors presumably are the result of higher-level phonological processes, more akin to the processes we address in our GODIVA model of speech sequencing (Bohland et al., 2010, J Cog Neurosci), though at this point we do not have an internal monitoring system in that model.

A minor point: when I say accumulation of small sub-phonemic errors can eventually lead to a phonemic error, I mean over milliseconds, not days (as you seem to imply in your response about acute effects of CA).

I'll respond to your other points about the DIVA model in subsequent posts.

Best,
Frank

Frank Guenther said...

Greg,

Regarding your second and third issues with the DIVA model (a speech sound map in left ventral premotor cortex and the idea of starting motor acts by activating motor cells), you appear to be overlooking the fact that there are "auditory" representations in premotor cortex (just as visual space representations exist in hand/arm areas of dorsal premotor cortex; see for example Alexander & Crutcher, 1990, J Neurophys). In fact we've shown in electrophysiologic recordings from the left ventral premotor cortex of a human patient suffering from locked-in syndrome that you can decode intended formant frequencies of speech from neural activity in this area (Guenther et al., 2009, PLoS ONE).

Given that there is an auditory representation in premotor cortex, I don't see why it would be necessary to activate sensory cortex prior to premotor cortex for producing spontaneous speech output. (Note that, if DIVA were performing a repetition task rather than "spontaneous" speech, auditory cortex would be active prior to premotor cortex since the incoming auditory signal would be driving the premotor cortex motor program selection.)

You could argue that we should call the speech sound map a "motor program map", but the reason we didn't use a term like "phoneme map" or "syllable map" is that the size of the motor programs can vary in this map. The most typical size is probably the syllable, but there are also phonemic motor programs and multi-syllabic motor programs (e.g., for your name or other commonly produced utterances).

Regarding your 4th issue (activation of production involves parallel inputs to both motor and auditory representations), this is exactly what happens in DIVA: Activation of the premotor cortex cells leads immediately to activation of corresponding cells in auditory and somatosensory cortex (referred to in the model as the auditory and somatosensory targets) as well as activation of cells in primary motor cortex.

Regarding your 5th issue (hierarchies hidden in the model), this is no doubt true of almost any useful neural model. In DIVA we don't worry about inverse dynamics, for example. If we did, the model would become way too complicated to be useful for other scientists. I have a feeling you are more concerned with higher levels of the hierarchy, related to phonology, etc. Our GODIVA model (Bohland et al., 2010, J Cog Neurosci) treats some of those issues. I'd be interested to hear your thoughts on that model some time.

Greg Hickok said...

Hi Frank,
Thanks for the clarifications and further details. I'm a little surprised to hear that DIVA is (i) intended to be a model involving "syllabic motor programs" (ii) where the targets are auditory (presumably also up to the syllabic level?), (iii) with auditory cortices being implicated, but (iv) at a lower level of phonological processing than that implicated in conduction aphasia (which implicates some of the same cortex DIVA lays claim to). This would seem to suggest that there is a higher level of phonological processing that involves syllables and phonemes but is not involved in motor control, and another lower level that involves syllables and phonemes but is fundamentally a motor control system. Why the distinction? Is there a principled reason? I guess I just don't see any compelling reason to separate motor control circuits from phonological selection circuits.

Error accumulation over milliseconds certainly makes more sense! I did misunderstand that point...

Greg Hickok said...

You suggest that it is a fact that there are "auditory" representations in premotor cortex. Why the quotes on "auditory"? Are they "auditory" or auditory? And if they are not auditory in the normal sensory sense, could they be "auditory" in the motor sense? I.e., motor programs that can hit auditory targets? I don't believe that there are auditory representations in motor cortex.

So given that there is not an auditory representation in motor cortex :-) you do need to activate a sensory target. This provides a natural explanation for why disruption of the STG can lead to speech errors.

Re: parallel inputs. Isn't it a sequential thing in DIVA? Premotor activation is first and then drives sensory activation? And doesn't this lead to a problem? How do you know whether you've hit your target in sensory cortex if the thing that is defining the target is the signal coming from the motor system? It seems to me that you need an independent sensory target activation.

Fun discussion! Thanks much and keep it coming!