Monday, April 6, 2009

Speech: Not enough distinctions are being made

Not enough distinctions are being made. For better or for worse -- and probably for worse -- let me reiterate a few points that have been raised, because they point to the need for much greater 'conceptual hygiene.' I forget who keeps using the "not enough distinctions" phrase (it sounds like the philosopher Jerry Fodor), but I think this point is critical in our current back and forth. Not enough distinctions are being made. Consequently, the ongoing discussion about speech is not sufficiently granular.

1. The 'moving parts' (or atoms, or lego blocks, or primitives, or whatever) at the basis of spoken language processing -- both from the input and output sides -- are of course more complex, and larger in number, than we ever discuss here, and consequently the discussion could get hijacked by underspecified concepts. And sometimes is ...

For example ... When we are discussing the so-called motor aspects -- which ones?? To pick up on yesterday's posts, in the Liberman revised motor theory, the objects of perception are intended articulatory gestures. As was rightly pointed out, how close to actual motor output are such objects? That is itself a topic of inquiry, and a complicated one at that. The motor system is not monolithic, and it matters a great deal whether we are working on neuronal populations that form the immediate substrate of motor output or populations that are richly connected to sensory areas but are distal to, say, M1 neurons. Incidentally, the literature on eye movements is worth looking to for some inspiration in this regard. More on that eventually.

Similarly, we should, I think, be very very careful about distinguishing forward models (that rely on a strong predictive element) from the motor generation of output. A forward model is associated with output -- but is not the same as the motor program that generates the output. And, crucially, a forward model does not have to be instantiated in motor cortex. That's an entirely different question, as again was pointed out.

If we think of the term "motor" as referring to the neural circuitry that underlies output generation (i.e. the part of motor cortex that is required for speaking), then a motor theory is, I think, wrong, and if Luciano Fadiga (Hi Luciano -- thanks for participating!) is intending this view of a motor theory then it won't work.

2. Greg keeps harping on this, and let me also emphasize: what you use as a task matters a great deal. There is, from a perceptual, computational, and neurophysiological point of view, a huge difference between, say, syllable discrimination in an experimental task setting, on the one hand, and comprehending spoken sentences, on the other. There are OBVIOUSLY some overlapping component processes, but can we please please please move on from this point? This issue has been rehearsed and discussed since the late 1990s ...

All else being equal, I am persuaded by an auditory view of speech perception, in which internal forward models (but not motor output models) play an important role (for example incorporating algorithms such as analysis-by-synthesis). I am happy to see and admit to a modulatory role of the type demonstrated by Luciano Fadiga and colleagues, but that activity is not epistemologically prior or causally necessary.


yisroel said...

Well put, if somewhat terse. Perhaps you could elaborate a bit more on forward models?

Anonymous said...

Also, what precisely do you mean by "internal forward models (but not motor output models) play an important role"? What is a motor model exactly, and how does it differ from an internal forward model?

David Poeppel said...

I would like to draw a distinction between the following two aspects of the system:

Internal forward models (and I adopt the definition from the work of Kawato, Wolpert, and others in the motor control literature) PREDICT the sensory consequences of to-be-executed movements by way of "efference copies" of the (potential) motor commands. Notice that this requires translating the 'code' carried in the efference copy (a motor code) into a coordinate system that is suitable to compare with the information derived from the input system. For example, if the spoken language internal forward model is providing an efference copy to the auditory system (cf Frank Guenther's work), the comparison operations executed at the auditory system need to be in those coordinates. Or some other code -- but some common code that permits comparing the predicted output to the output that the perceptual system is registering.

I think this stage/subsystem is worth distinguishing from that part of the motor system responsible for generating the output to the musculature. There a continuous motor vector has to be generated from discrete action commands. It seems to me that those are two aspects of motor control -- including speech motor control -- that could be discussed separately.
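To make the distinction concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the function names, the formant values, the single "articulator" dimension, and the linear interpolation are hypothetical placeholders, not anything taken from the work of Kawato, Wolpert, or Guenther. It only shows the shape of the idea: one stage translates a motor code into auditory coordinates so it can be compared with the input, while a separate stage turns discrete commands into a continuous output.

```python
# Toy illustration only. All tables and numbers below are hypothetical.

# Assumed mapping from a discrete action command to the auditory
# consequence it is expected to produce (first two formants, in Hz).
PREDICTED_FORMANTS = {
    "a": (700.0, 1200.0),
    "i": (300.0, 2300.0),
}

# Assumed mapping from the same commands to a target on a single
# made-up articulator dimension (say, jaw opening, arbitrary units).
ARTICULATOR_TARGETS = {"a": 1.0, "i": 0.2}


def forward_model(efference_copy):
    """Translate a motor code into auditory coordinates (predicted F1, F2)."""
    return PREDICTED_FORMANTS[efference_copy]


def prediction_error(predicted, observed):
    """Compare prediction and input in the shared (auditory) coordinates."""
    return tuple(o - p for p, o in zip(predicted, observed))


def motor_output(commands, steps=4):
    """Separate stage: build a continuous trajectory from discrete commands
    by linear interpolation between successive articulator targets."""
    targets = [ARTICULATOR_TARGETS[c] for c in commands]
    if not targets:
        return []
    trajectory = []
    for start, end in zip(targets, targets[1:]):
        for k in range(steps):
            trajectory.append(start + (end - start) * k / steps)
    trajectory.append(targets[-1])
    return trajectory


# The forward model never touches the musculature: it only predicts
# what the auditory system should register, and the mismatch is computed there.
error = prediction_error(forward_model("a"), (720.0, 1150.0))

# The output stage never makes a sensory prediction: it only produces movement.
trajectory = motor_output(["a", "i"])
```

Note that the two functions share nothing except the command vocabulary, which is the point: one could in principle be involved in perception without the other.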

I don't work on motor control, so I could be completely misguided about this. But if these two aspects of motor control are distinct then their role in speech perception could be distinct, no?

Finally, I think that the concept of internal forward model can be fruitfully adapted to other contexts in cognition, and especially language processing. In particular, in precisely those contexts in which precise, step-by-step predictions are being made (e.g. parsing), this concept holds promise.

Is this making any sense? Maybe not ... I'm tired, stressed out, and in a hurry ...

yisroel said...

Actually, this makes a lot of intuitive sense to me (I'm not well versed enough in the lit to say if it jibes with current research), particularly your point about predictive processes.
A question to you on the language processing angle. Do you think that the efferent 'code' sent to match potential incoming information contains dissociable phonological and semantic information? That is, whereas we seem to be able to break down the incoming speech we hear into separate phonological and semantic information, perhaps from a 'predictive' perspective we only expect words as a whole.
Now I can ask if I'm making sense?