Their theoretical trick is to link up action control circuits for object-oriented actions with action control circuits for articulating words related to those actions. Motor programs for drinking are linked to motor programs for saying "drink". Then, when you hear the word "drink", you activate the motor program for saying the word; this in turn activates the motor programs for actual drinking, and that, on their account, is what allows you to understand the word.
The overlap ... between the speech articulation and action control is meant to imply that the act of articulation primes the associated motor actions and that performing the actions primes the articulation. That is, we tend to do what we say, and we tend to say (or at least covertly verbalize) what we do. Furthermore, when listening to speech, bottom-up processing activates the speech controller (Fadiga et al., 2002; Galantucci et al., 2006; Guenther et al., 2006), which in turn activates the action controller, thereby grounding the meaning of the speech signal in action.
So as I reach for and drink from my coffee cup, what words will I covertly verbalize? Drink, consume, enjoy, hydrate, caffeinate? Fixate, look at, gaze towards, reach, extend, open, close, grasp, grab, envelop, grip, hold, lift, elevate, bring-towards, draw-near, transport, purse (the lips), tip, tilt, turn, rotate, supinate, sip, slurp, sniff, taste, swallow, draw-away, place, put, set, release, let go? No wonder I can't chat with someone while drinking coffee. My motor speech system is REALLY busy!
By the way, what might the action controller for the action "drink" code? It can't be a specific movement because it has to generalize across drinking from mugs, wine glasses, lidded cups, espresso cups, straws, water bottles with and without sport lids, drinking by leaning down to the container or by lifting it up, drinking from a sink faucet, drinking from a water fountain, drinking morning dew adhering to leaves, drinking rain by opening your mouth to the sky, drinking by asking someone else to pour water into your mouth. And if you walked outside right now, opened your mouth to a cloudless sky, and then swallowed, would you be drinking? Why not? If the meaning of "drink" is grounded in actions, why should it matter whether it is raining or not?
Because it's not the movements themselves that define the meaning.
But, you might argue, as do Glenberg and Gallese, the motor system can generate predictions about the consequences of an action, and that is where the meaning comes from:
part of the knowledge of what “drink” means consists of expected consequences of drinking
And what are those consequences? Glenberg and Gallese get it (mostly) right:
...predictions are driven by activity in the motor system (cf. Fiebach and Schubotz, 2006), however, the predictions themselves reside in activity across the brain. For example, predictions of how the body will change on the basis of action result from activity in somatosensory cortices, predictions of changes in spatial layout result from activity in visual and parietal cortices, and predictions of what will be heard result from activity in temporal areas.
So where do we stand? Meanings are dependent on consequences and consequences "reside in activity across the brain" (i.e., sensory areas). Therefore, the meanings of actions are not coded in the motor system. All the motor system does according to Glenberg and Gallese (if you read between the lines) is generate predictions. In other words, the motor system is nothing more than a way of accessing the meanings (stored elsewhere) via associations.
So, just to spell it out for the readers at home, here is their model of language comprehension:
hear a word --> activate motor program for saying word --> activate motor program for actions related to word --> generate predicted consequences of the action in sensory systems --> understanding.
Why not just go from the word to the sensory system directly? Is the brain not capable of forming such associations? In other words, if all the motor system is doing is providing an associative link, why can't you get there via non-motor associative links?
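To make the redundancy concrete, here is a toy sketch in Python (entirely my own illustration; the node names and mappings are invented for exposition, not drawn from Glenberg and Gallese). If every stage in their chain is just a learned association, then a single direct word-to-sensory association reaches the same endpoint:

```python
# Toy associative-chain sketch (my own illustration, not Glenberg & Gallese's
# implementation). Each "node" simply forwards activation to an associated code.
# The point: if every link is a plain association, the motor hops add nothing
# that a direct word->sensory association couldn't provide.

from typing import Callable, Dict

# A node maps an incoming label to an associated outgoing label.
Node = Callable[[str], str]

def make_node(name: str, mapping: Dict[str, str]) -> Node:
    def node(signal: str) -> str:
        out = mapping.get(signal, "<no association>")
        print(f"{name}: {signal} -> {out}")
        return out
    return node

# The Glenberg & Gallese chain, as I read it (labels are hypothetical):
speech_controller = make_node("speech controller", {"/drink/": "say-'drink'"})
action_controller = make_node("action controller", {"say-'drink'": "drink-action"})
forward_model     = make_node("forward model",     {"drink-action": "sensory-consequences-of-drinking"})

def comprehend_via_motor(word: str) -> str:
    # Three hops: articulation -> action -> predicted sensory consequences.
    return forward_model(action_controller(speech_controller(word)))

# The shortcut I'm asking about: one learned association, same endpoint.
direct_route = make_node("direct route", {"/drink/": "sensory-consequences-of-drinking"})

print(comprehend_via_motor("/drink/"))  # three hops to the sensory code
print(direct_route("/drink/"))          # one hop to the same sensory code
```

Both routes terminate in the same sensory code. On this reading, the motor detour adds hops, not meaning.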
More to the point: if the *particular* actions don't matter, as even the mirror neuron crowd now acknowledges, and if what matters is the higher level goals or consequences, and if these goals or consequences are coded in sensory systems (which they are), then there is little role for the motor system in conceptual knowledge of actions.
Glenberg and Gallese correctly point out a strong empirical prediction of their model:
The ABL theory makes a novel and strong prediction: adapting an action controller will produce an effect on language comprehension
They cite Bak's work on ALS and some use-induced plasticity effects. Again, let me suggest, quite unscientifically, that Stephen Hawking, whose ALS left him almost no motor function, would have a hard time functioning if he didn't understand verbs. Further, use-induced plasticity is known to modulate response bias -- a likely source of these effects. In short, the evidence for the strong prediction is weak at best.
But rather than adapting an action controller, let's remove it as a means to test their prediction head on. Given their model, in which perceived words activate motor programs for articulating those words, which activate motor programs for generating actions, which generate predictions, and so on, if you don't have the motor programs for articulating words you shouldn't be able to comprehend speech, or should at least show some impairment. Yet there is an abundance of evidence that language comprehension is not dependent on the motor system. I reviewed much of it in my "Mirror Neuron Forum" contribution, which Glenberg edited and Gallese contributed to. NONE OF THIS WORK IS EVEN MENTIONED in Glenberg and Gallese's piece. This is rather unscholarly in my opinion.
Toward the end of the paper they include a section on non-motor processes. In it they write,
We have focused on motor processes for two related reasons. First, we believe that the basic function of cognition is control of action. From an evolutionary perspective, it is hard to imagine any other story. That is, systems evolve because they contribute to the ability to survive and reproduce, and those activities demand action. As Rodolfo Llinas puts it, “The nervous system is only necessary for multicellular creatures that can orchestrate and express active movement.” Thus, although brains have impressive capacities for perception, emotion, and more, those capacities are in the service of action.
I agree. But action for action's sake is useless. The reason WHY brains have impressive capacities for perception, emotion, and more is to give action purpose, meaning. Without these non-motor systems, the action system is literally and figuratively blind, and therefore completely useless.
Why the unhealthy obsession with the motor system and complete disregard for the mountain of evidence against their ideas? Because the starting point for all the theoretical fumbling is a single assumption that has gained the status of an axiom in the minds of researchers like Glenberg and Gallese: that cognition revolves around embodiment, with mirror neurons/the motor system at the core. (Glenberg's lab name even assumes his hypothesis: "Laboratory for Embodied Cognition".) Once you commit to such an idea, you have no choice but to build a convoluted story to uphold your assumption and ignore contradictory evidence.
I don't think there is a ghost of a chance that Glenberg and Gallese will ever change their views in light of empirical fact. Skinner, for example, was a diehard defender of behaviorism long after people like Chomsky, Miller, Broadbent and others clearly demonstrated that the approach was theoretically bankrupt. Today the cognitive approach to explaining behavior dominates both psychology and neuroscience, including embodied approaches like Glenberg and Gallese's. My hope is that by pointing out the inadequacies of proposals like these, the next generation of scientists, who aren't saddled with tired assumptions, will ultimately move the field forward and consider the function of mirror neurons and the motor system in a more balanced light.