"Integration" is a major operation in language processing (and other domains). We have to integrate bits of sounds to extract words, integrate morphemic bits to derive word meanings, integrate lexical-semantic information with syntactic information, sensory with motor information, audio with visual information, and all of this with the contextual background.
Some theorists talk specifically about regions of the brain that perform such integrations. I've got my favorite sensory-motor integration site, Hagoort has a theory about phonological, semantic, and syntactic integration in (different portions of) Broca's area, and, more broadly, Damasio has been talking about "convergence zones" (a.k.a. integration sites) for years.
Two thoughts come to mind. One, is there any part of the brain that isn't doing integration, i.e., how useful is the concept? And two, if the concept does have some value, how do we identify integration areas?
I don't know the answer to the first question, and I have some concerns about the way some in the field approach the second. W.r.t. the latter, a typical approach is to look for regions that increase in activity as a function of "integration load". The idea is that by making integration harder, we will drive integration areas more strongly, and this will cause them to pop out in our brain scans. This seems logical enough. But is it true?
Suppose Broca's area -- the region that always seems to get involved when the going gets tough -- activates more in an audiovisual speech condition in which the audio and visual signals mismatch compared to when they match (an actual result). Let's consider the possible interpretations.
1. Broca's area does AV integration. It is less active when integration is easy (i.e., when A and V match) than when integration is hard (i.e., when they mismatch), because it has to work harder to integrate mismatched signals.
2. Broca's area doesn't do AV integration. It is less active when integration is actually happening (i.e., when A and V match), reflecting its non-involvement, than when integration isn't working (i.e., when there is an AV mismatch). Of course, this explanation requires an alternative account of why Broca's area activates more in mismatch situations. There are plenty of possibilities: ambiguity resolution, response selection, error detection, or just a WTF response (given the response properties of Broca's area, I sometimes wonder if we should re-label it area WTF).
Both possibilities seem perfectly consistent with the facts. Similar possibilities exist for other forms of integration, making me question whether the "load" logic is really telling us what we think it is.
There is another approach to identifying integration zones, namely to look for areas that respond to both types of information independently but respond better when they appear together. In our example, AV integration zones would be those areas that respond to auditory speech or visual speech, but respond best to AV speech. I tend to like this approach a bit better.
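To make this selection logic concrete, here is a toy sketch in Python (not an analysis of real data; the response values, thresholds, and voxel counts are all invented). It contrasts the criterion described above — responds to each modality alone, best to both — with the stricter "superadditive" criterion sometimes applied in fMRI, where the AV response must exceed the *sum* of the unimodal responses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented per-voxel response estimates (e.g., GLM betas) under
# auditory-only, visual-only, and audiovisual speech conditions.
n_vox = 1000
a = rng.normal(0.0, 1.0, n_vox)
v = rng.normal(0.0, 1.0, n_vox)
av = rng.normal(0.0, 1.0, n_vox)

# Criterion 1 ("max"): responds to each modality alone (here, beta > 0),
# and responds more strongly to AV than to either unimodal condition.
max_criterion = (a > 0) & (v > 0) & (av > np.maximum(a, v))

# Criterion 2 ("superadditive"): AV response exceeds the SUM of the
# unimodal responses -- strictly harder to satisfy when a, v > 0.
superadditive = (a > 0) & (v > 0) & (av > a + v)

print(max_criterion.sum(), superadditive.sum())
```

Note that with positive unimodal responses, every superadditive voxel automatically passes the max criterion but not vice versa, which is one reason the two criteria can pick out quite different sets of candidate "integration" regions.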
What are your thoughts?
In general I'm also more partial to the latter approach you mention. I think this is particularly true where the manipulation is dichotomous (e.g., match vs. mismatch) instead of parametric, because there seems to be even more room for alternate interpretations.
But I think there's also a more general question about these approaches to integration. It's often assumed that multimodal regions show enhanced, or even superadditive, responses: as you say, stronger responses to audiovisual speech than to either modality alone (or, on the stricter superadditive criterion, than to the sum of the two unimodal responses). But in some sense this seems counterintuitive to me, at least in the 'normal' situation where the information is congruent. It is often the case that loss of information (or greater ambiguity) is associated with an increased neural response; this is true for phonological and semantic ambiguities, as well as for cases of perceptual degradation. So that might suggest that in the case of congruent AV input, where the most information is available, less processing would be required, because the speech is fairly easy to process.
Although the complete mismatch conditions seem pretty unnatural (and perhaps drive more of a WTF response), being able to parametrically vary the amount of information from each modality might be a way to get at this. There were some nice posters using this approach at NLC last year from a couple of different groups, so maybe we'll get some more data soon.
I think it’s really about time (sic) that the time domain is taken into account more when hunting “integration”. Blobs without any time information will surely bring us closer to the grandmother neuron, so this is good news, if you believe in the grandmother neuron, which probably nobody does anymore. Integration, of information, of modalities, etc., happens over time, and the synching and de-synching of brain regions would deserve a whole lot more attention from the language folks, I would say.
I am just reiterating what many smart colleagues have spelled out for us much more precisely before. But hunting for areas that integrate, rather than for mechanisms by which areas integrate together, appears increasingly futile to me.
I am of course all for taking into account timing information and trying to get at mechanisms of integration and not just regions, although surely at the end of the day we want to know both. I think we can still place useful anatomical constraints on theories of speech comprehension without getting to grandmother cells. ;-)
In any case, in thinking about M/EEG data, for example, what might be the measures of 'integrative' responses? Might some of these be similar to fMRI, with multimodal > unimodal responses in terms of power or peak signal? Synchronization/coupling? Latency of response? All of these? I am also curious as to what direction you predict these would go with increasing integration "load".
Although timing information will undoubtedly be crucial, I think that the main points raised by Greg are sort of methodology-independent...i.e., what experimental paradigms can we use to best test for integration, and how do we measure it?
I think that it is a very difficult problem to solve. Briefly, I am dubious of methods that draw their conclusions from anything that could be strongly correlated with load or task difficulty. "Harder" integration means a harder task, and it could simply be that our WTF-area response has to do with working memory load (or executive function demands) and nothing to do with A/V integration per se. That is difficult to disambiguate.
I am with you on your last paragraph: an area that activates to audio and to video, but best to both, is a good starting point, but no more than that. A quick example is V1, though there both dimensions are visual. V1 responds well to oriented lines, well to colors, and best to both (on an fMRI scale). However, this does not differentiate between the possibility that the neurons are actually performing integration and the possibility that two separate computations are being performed, with the increased response to presenting both simply reflecting both populations of cells responding (people still argue about this at the level of "blobs" doing color and "interblobs" doing orientation, which is all based on staining methods and inaccessible to fMRI). That last part is the big problem with identifying integration areas--it's entirely possible that a potential A/V integration site is really just performing the same or a similar type of computation on both auditory and visual information, even if they aren't integrated.
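The V1 worry above can be sketched numerically. In this toy model (all cell counts and firing rates invented), a single voxel contains two independent populations, one driven by oriented lines and one by color, and the fMRI-scale signal simply sums their activity. The voxel responds well to lines, well to color, and best to both, yet nothing in it integrates the two features:

```python
# Toy voxel with two independent populations: "interblob" cells driven
# by oriented lines and "blob" cells driven by color. All numbers are
# invented for illustration.
n_orient_cells, n_color_cells = 60, 40

def voxel_signal(lines_present, color_present):
    # Each population fires at 1.0 when its preferred feature is
    # present, 0.1 at baseline; the voxel signal is just the sum.
    orient = n_orient_cells * (1.0 if lines_present else 0.1)
    color = n_color_cells * (1.0 if color_present else 0.1)
    return orient + color

r_lines = voxel_signal(True, False)
r_color = voxel_signal(False, True)
r_both = voxel_signal(True, True)

print(r_lines, r_color, r_both)
```

The combined response beats either single-feature response, so the voxel would pass the "responds to each, best to both" screen, but relative to baseline it is exactly additive: there is no superadditive signature, and no single cell ever sees both features.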