Tuesday, October 13, 2015

The Embodied Cognition Challenge

Typical embodied cognition experiments ask whether low-level sensory or motor information affects performance on this task or that.  The journals are filled with these kinds of experiments.  Some of these effects might even be real.  Assuming some of these effects are indeed real, let's now move on to the next questions: How much of the variance in performance does embodied cognition explain? And can embodied models improve on standard models?

I've pointed out previously that embodied effects are small at best. Here's an example--a statistically significant crossover interaction--from a rather high-profile TMS study that investigated the role of motor cortex in the recognition of lip- versus hand-related movements during stimulation of lip versus hand motor areas:


Effect size = ~1-2%. This is typical of these sorts of studies and begs for a theory of the remaining 98-99% of the variance.
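To make the arithmetic concrete, here's a minimal sketch (in Python, with made-up numbers rather than the actual study's data) of how a statistically significant crossover interaction in a 2 x 2 design can translate into only a few percent of variance explained, measured as partial eta-squared:

```python
# Hypothetical 2 x 2 crossover (stimulation site x stimulus type); the cell
# means, noise level, and sample size are invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 20  # hypothetical participants per cell
cell_means = {("lip_site", "lip_stim"): 80.0, ("lip_site", "hand_stim"): 82.0,
              ("hand_site", "lip_stim"): 82.0, ("hand_site", "hand_stim"): 80.0}
data = {cell: m + rng.normal(0, 5.0, n) for cell, m in cell_means.items()}

grand = np.mean([x for v in data.values() for x in v])
site_means = {s: np.mean(np.concatenate([data[(s, t)] for t in ("lip_stim", "hand_stim")]))
              for s in ("lip_site", "hand_site")}
stim_means = {t: np.mean(np.concatenate([data[(s, t)] for s in ("lip_site", "hand_site")]))
              for t in ("lip_stim", "hand_stim")}

# Balanced two-way ANOVA: interaction and within-cell (error) sums of squares
ss_inter = sum(n * (np.mean(v) - site_means[s] - stim_means[t] + grand) ** 2
               for (s, t), v in data.items())
ss_error = sum(((v - np.mean(v)) ** 2).sum() for v in data.values())

# Partial eta-squared: the interaction's share of variance (a few percent here)
print(f"partial eta^2 = {ss_inter / (ss_inter + ss_error):.3f}")
```

A perfectly reliable crossover of a couple of percentage points still leaves the error term, and hence the bulk of the variance, untouched.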

A Challenge

So, let me throw out a challenge to the embodied cognition crowd in the context of well-worked-out non-embodied models of speech production.  Let's take a common set of data, build our embodied and non-embodied computational models, and see how much of the data is accounted for by the standard versus the embodied model (or, more likely, the embodied component of a more standard model).

Here is a database that contains naming data from a large sample of aphasic individuals.  The aim is to build a model that accounts for the distribution of naming errors.
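To be explicit about the scoring, here is a minimal sketch of the kind of head-to-head comparison I'm proposing (the error categories follow the usual naming taxonomy, but the patient proportions and both "models" below are invented for illustration; nothing here is drawn from the database):

```python
# Compare two candidate models against one hypothetical patient's error-type
# distribution; all numbers are invented for illustration.
import numpy as np

categories = ["correct", "semantic", "formal", "mixed", "unrelated", "nonword"]
observed = np.array([0.70, 0.05, 0.10, 0.02, 0.03, 0.10])  # hypothetical patient
model_a  = np.array([0.68, 0.06, 0.11, 0.02, 0.03, 0.10])  # e.g., a SLAM-style fit
model_b  = np.array([0.60, 0.10, 0.12, 0.05, 0.05, 0.08])  # e.g., an embodied rival

def variance_accounted_for(obs, pred):
    """R^2 of the model's predicted proportions against the observed ones."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

for name, pred in [("model A", model_a), ("model B", model_b)]:
    print(f"{name}: VAF = {variance_accounted_for(observed, pred):.3f}")
```

Whichever model leaves less residual variance wins; run that over the whole patient sample and you have the contest.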

Here is a standard, non-embodied model that we have called SLAM for Semantic-Lexical-Auditory-Motor.  (No, the "auditory-motor" part isn't embodied in the sense implied by embodied theorists, i.e., the level of representation in this part of the network is phonological and abstract.)  Here's a picture of the structure of the model:


This model accounts for about 98% of the variance in patient naming error-type distributions.  Here is an example fit for a single patient (figure from Walker & Hickok, in press, PB&R), which shows the percent response for various categories of response types (correct, semantic error, formal error, etc.) for the patient (dotted line) and the model (solid line):


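For readers who haven't worked with this class of model, here is a deliberately crude toy of the general idea (generic spreading activation over weighted connections; the words, weights, and noise level are arbitrary, and this is emphatically not the SLAM implementation).  Lowering a connection weight plays the role of a lesion, and the resulting shift in the error distribution is what gets fit to each patient:

```python
# Toy spreading-activation naming model: a target ("cat"), a semantic
# competitor ("dog"), and a formal competitor ("mat"); all parameters
# are arbitrary choices for the sketch.
import numpy as np

rng = np.random.default_rng(1)

def simulate_naming(w_sem, w_phon, noise=0.05, n_trials=10_000):
    words = ["cat", "dog", "mat"]
    sem_overlap = np.array([1.0, 0.6, 0.0])   # shared semantic features
    phon_overlap = np.array([1.0, 0.0, 0.6])  # shared phonemes
    counts = {"correct": 0, "semantic": 0, "formal": 0}
    for _ in range(n_trials):
        act = w_sem * sem_overlap + w_phon * phon_overlap
        act += rng.normal(0.0, noise, size=3)  # intrinsic noise produces errors
        winner = words[int(np.argmax(act))]
        label = {"cat": "correct", "dog": "semantic", "mat": "formal"}[winner]
        counts[label] += 1
    return {k: v / n_trials for k, v in counts.items()}

print(simulate_naming(w_sem=0.10, w_phon=0.10))  # mildly noisy baseline
print(simulate_naming(w_sem=0.02, w_phon=0.10))  # 'lesioned' semantic weight
```

In the real model the weights are fit to each patient, and the predicted distribution is compared with the observed one, category by category.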
Incidentally, Matt Goldrick argued in a forthcoming reply to the SLAM model paper that this fit represents a complete model failure because the patient had zero semantic errors whereas the model predicted some.  This is an interesting claim that we had to take seriously and evaluate quantitatively, which we did.  But I digress.

The point is that if you believe that embodied cognition is the new paradigm, you need to start comparing embodied models to non-embodied models to test your claim.  Here we have an ideal testing ground: established models that use abstract linguistic representations to account for a large dataset.

My challenge: build an embodied model that beats SLAM.  You've got about 2% room for improvement.



4 comments:

Dan Mirman said...

As a practical matter, that database site is currently down for repairs. We hope to have it back up and running soon.

Matt Goldrick said...

When considering these data, I think it's important that any researcher carefully consider whether the variance in the dataset should be attributed to one's lexical access model. I think it's highly likely that these data include individuals with deficits to processes that are "prior" to lexical access (lexical semantics, conceptual processing) as well as individuals with deficits that are "subsequent" to lexical access (articulatory planning and execution). Given that any model is necessarily limited in scope, it's important that the model only receive credit (or demerits) for data that are relevant to the model.

I also believe that not all deviations are created equal; if a model says some error type should never occur, and it occurs, that seems to me to be a serious problem. The predicted occurrence of semantic errors in patients that never produce them, as well as the predicted occurrence of phonological errors in patients that only produce semantic errors, are not simply "rounding errors" that represent inaccuracies in the details of the simulations. They represent a qualitative failure to match patient error patterns.
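To put that numerically (with invented counts, purely for illustration): under a likelihood-based measure of fit, a model that assigns zero probability to an error type a patient actually produces is infinitely penalized, whereas the opposite mismatch, predicting a few errors the patient never makes, costs almost nothing:

```python
import numpy as np

def log_likelihood(counts, predicted_probs):
    """Multinomial log-likelihood of observed response counts (up to a constant)."""
    counts = np.asarray(counts, dtype=float)
    p = np.asarray(predicted_probs, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(counts > 0, counts * np.log(p), 0.0)
    return float(np.sum(terms))

# Response categories: [correct, semantic error, formal error]
print(log_likelihood([95, 0, 5], [0.90, 0.05, 0.05]))  # model predicts errors the
                                                       # patient never makes: finite
print(log_likelihood([90, 5, 5], [0.95, 0.00, 0.05]))  # model says "never" but the
                                                       # patient produces them: -inf
```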

These points are discussed in a bit more detail in my comment on the SLAM proposal (available here; Greg, I'm not sure if your reply is up?)

Greg Hickok said...

Fair enough. The point remains: we have data and we have a quantitative model. If one believes that there is a better account of the data, then let's be explicit and compare the models quantitatively.

moaxbrain said...

Hi Greg,

Thanks for the interesting post.

Independent of this model or others, your point about the potential scientific advancement offered by models that account for 1-2% of variance **within any dependent measure** is something that should be addressed across the board. This is not an issue of 'whatabout'ery, but just consider the case of representational similarity analysis (RSA) in fMRI. From talks I've seen and papers I've reviewed (and this is often not reported), at the single-participant level these correlations often account for less than 1% of the variance in the BOLD signal's spatial distribution (they turn out significant at the group level, of course). What do we take from that? And the same concern holds for functional connectivity analyses, where minuscule effects (Pearson's r's of 0.1 or less) are significant when propagated to the group level.
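A quick simulation of the pattern I mean (arbitrary parameters, not real fMRI data): per-participant correlations that explain well under 1% of the variance still come out highly significant once you test the distribution of r's at the group level:

```python
# Tiny per-subject effect, decisive group-level statistic; all parameters
# here are arbitrary and chosen only to illustrate the point.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subjects, n_observations = 30, 5000
true_r = 0.05  # true per-subject effect: ~0.25% of variance

subject_rs = []
for _ in range(n_subjects):
    x = rng.normal(size=n_observations)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n_observations)
    subject_rs.append(stats.pearsonr(x, y)[0])

subject_rs = np.array(subject_rs)
t, p = stats.ttest_1samp(subject_rs, 0.0)  # group-level test on the r's
print(f"mean r = {subject_rs.mean():.3f}  (R^2 = {subject_rs.mean()**2:.4f})")
print(f"group t({n_subjects - 1}) = {t:.2f}, p = {p:.1e}")
```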

Here’s one possibility: that effect sizes are not the be-all and end-all determinants of importance. A now-classic (it was quite new when I read it!) paper by Prentice & Miller (http://faculty-gsb.stanford.edu/millerd/docs/1992psycbull.htm) talked about the utility of showing that even a bit of variance is affected by small manipulations, or of showing that one can have any impact at all on DVs that are difficult to move around (what they called “statistical versus methodological routes to an impressive demonstration”). Perhaps hit rate on a task like the one you mention could be considered a DV that’s difficult to push around? I realize this doesn't address the issue of aphasia and the specific model, but you led with the issue of effect sizes, which caught my attention.

Best

Uri.