This is standard embodied cognition speak. I haven't read his book, but this view seems to be the central topic of Bergen's monograph, Louder Than Words: The New Science of How the Mind Makes Meaning. I'm sure the book is much more careful and nuanced than the interview, but the interview is what more people will hear, and so it deserves a response, particularly because the interview discussion goes beyond word meanings, claiming to reset our understanding of language itself:
Just a few decades ago, many linguists thought the human brain had evolved a special module for language. It seemed plausible that our brains have some unique structure or system. After all, no animal can use language the way people can.
But in the 1990s, scientists began testing the language-module theory using "functional" MRI technology that let them watch the brain respond to words. And what they saw didn't look like a module, says Benjamin Bergen.
"They found something totally surprising," Bergen says. "It's not just certain specific little regions in the brain, regions dedicated to language, that were lighting up. It was kind of a whole-brain type of process."
He then goes on to explain how we understand language via simulation, as in the baseball example. This generalization to language is troubling, reckless even. There are so many problems with the claim that it's hard to know where to start, but I'll try:
1. A theory of word meaning is not a theory of language, it's a theory of word meaning. Let's translate the argument to the visual domain to reveal how ridiculous this generalization is. "Just a few decades ago, many visual scientists thought that the human brain had evolved special modules for vision, like systems for wavelength detection, motion detection, and analysis of object form. But in the 1990s MRI technology let them watch the brain respond to visual scenes. And what they saw didn't look like a module, but involved activation all over the brain." Do we conclude that decades of research on vision got it all wrong just because lots of brain tissue lights up when we look at things? Of course not! Bergen's comment is nothing more than a misguided interpretation of functional MRI and its relation to computational systems in the brain.
2. To push the point, it's not even clear to me that Bergen's theory has anything to do with language. It is a theory of conceptual representation, not a theory of how the brain takes an acoustic signal and extracts and transforms the relevant bits to make contact with that conceptual system. The latter issue is what occupies most linguists' time and theoretical focus. Does Bergen claim that his theory explains cochlear filtering of the acoustic signal? No. Does he claim that his theory explains how that signal is elaborated in the frequency and time domains to yield a spectro-temporal representation of the signal? No. Does he claim that the theory explains how that spectro-temporal signal makes contact with representations of word forms in the listener? No. Does visual simulation of the events described in the sentence explain the word order in the sentence? Or the position and use of words like "the" and "to" in that sentence? No. Those are the kinds of things that perceptual scientists and linguists worry about: the transformation of the acoustic signal into some format that allows contact with meaning. Bergen's simulation theory has nothing to say about any of this, which means that it has nothing to say about the "module for language" that many linguists used to believe in. Moral: don't claim to have solved puzzle A when you are fiddling with the pieces of puzzle B.
3. Simulation theories of conceptual representation don't solve any problems. Let's consider Bergen's theory: we understand the sentence "the shortstop threw the ball to first base" by simulating what it would be like to see the action and by simulating what it would be like to do the action. And, he argues elsewhere, we understand things we have never seen or done by combining or generalizing from things we have seen and done. So "flying pig" is understood by combining the experienced concept of FLY (as seen with birds) with that of PIG. The result is the visual activation of the image of a pig with wings, which is the neural basis of our understanding. But wait, Bergen said that the way we understand an action (flying is an action) is by simulating it in our visual system and by doing it with our motor system. It's not clear how we can simulate FLYING PIG in our motor system, so the motor part must not be critical in this case, which makes us question whether it is critical in the shortstop-throwing-a-ball case. (Good thing we have a reason to question the motor part, because then we have an explanation for why quadriplegic individuals can enjoy baseball as much as the rest of us.) So, we must conclude, simulation of the perceptual bit is where our understanding of "flying pig" comes from. But now I'm confused. How do we know which perceptual experience to simulate? Do I combine my experience with pigs and birds and give the hybrid creature wings? Or do I combine pig with Superman and give it a cape (a possibility noted in the interview)? Or maybe I combine pig with my experience flying on 737s and imagine a pig sitting in coach ordering a Diet Coke. Or should I combine pig with my baseball experiences and picture a mini pig being used as a baseball and getting smacked out to center field? Maybe, an embodied theorist might claim, that's the cool part: depending on which experiences I combine, I get a different meaning. Fine, but let's flip it around.
How do I know that a pig with wings, a pig in coach, and a ball-shaped pig, one flapping, one sitting and sipping, and one hurtling through space, are all examples of flying pigs? What is telling me that each of those simulations is linked? You might say that they are linked due to similarity of experience. By what metric? My perceptual experiences with each of these kinds of FLYING are wildly different. How does the brain know to associate them? Something must be telling the system that those instances are "similar" in the relevant respects. Now we need a theory of that! Here's the point: simulating a specific experience of, say, FLY can't be enough, because it doesn't capture our ability to generalize the meaning to birds, planes, and baseballs. We have to be "simulating" something more abstract such that it captures those generalizations; and if we are "simulating" an abstract something, we might as well call it an abstract representation, just like in "classical" theories. Saying that we understand by "simulation" merely relabels the problem; it doesn't solve anything.
I'm sure we could go on, but I think I'll just conclude instead: Bergen's theory is not about language, so whatever claims are made on that front are just hyperbole. And in the domain to which the theory actually applies, it doesn't improve our understanding.