Wednesday, March 22, 2017

Misunderstandings of the Hickok & Poeppel Dual Stream framework: Comments on Dial & Martin 2017

A recent paper by Dial & Martin (DM) presents some interesting data on the relations among performance on a range of speech perception tasks, including some that have been the topic of discussion on this blog and in many of my papers. These include syllable discrimination and auditory comprehension, among others. I have argued in several papers with David Poeppel and others that these two tasks differentially engage the dorsal stream (syllable discrimination) and the ventral stream (comprehension). DM sought to test this claim by examining how well these tasks hang together or dissociate in a group of 13 aphasic patients. Their primary claim is that performance on sublexical and comprehension tasks largely hangs together, in contrast to previous reports of double dissociations. They suggest the discrepancy is due to better-controlled stimuli in their experiment compared to past studies. DM's experiments are really nicely done and generated some fantastic data. I don't think their conclusions about the dual stream model follow, however, because they get the dual stream model wrong.

First, a comment on their data, focusing on the syllable discrimination and word-picture matching tasks (their Experiment 2a), as these are the poster-child cases. DM report a strong correlation between performance on these tasks. It indeed looks quite strong. But they also report that two patients (18%) performed significantly better on the auditory comprehension task than on the discrimination task. The control group showed the same pattern: significantly better on comprehension than discrimination. So this is consistent with claims that these tasks tap into partially shared, partially different processes, as Hickok & Poeppel (HP) have claimed.

Do these findings warrant rejecting part of the HP dual stream framework? DM say yes. Here are a couple of quotes from their concluding remarks:
5.2. Concluding comments on implications for dual route models 
Though dual route models with a specific neuroanatomical basis like that of Hickok and Poeppel have been proposed relatively recently (Hickok and Poeppel, 2000), cognitive models of language processes with a dual route framework (though typically without a specified neural basis) are common in the neuropsychological literature, particularly for reading and repetition (e.g., Coltheart et al., 2001; Dell et al., 2007; Hanley et al., 2004; Hanley and Kay, 1997; Hillis and Caramazza, 1991; McCarthy and Warrington, 1984; Nozari et al., 2010). Critically, many of these models assume that sublexical processing is shared between the two routes and the routes do not become activated until after sublexical processing occurs. A similar approach could be applied in the speech perception domain. That is, one might assume that there are separable routes for translation to speech output and for accessing meaning, but assume that sublexical processing is shared by the two routes and must be accomplished before processing branches into the separate routes. 
... In summary, the current study provides support for models of speech perception where processing of sublexical information is a prerequisite for processing of lexical information, as is the case in TRACE (McClelland and Elman, 1986), NAM (Luce and Pisoni, 1998) and Shortlist/MERGE (Norris, 1994; Norris et al., 2000). On the other hand, we failed to find support for models that do not require passage through sublexical levels to reach lexical levels, such as the episodic theory of speech perception (e.g., Goldinger, 1998) or dual route models of speech perception (Hickok and Poeppel, 2000, 2004, 2007; Hickok, 2014; Poeppel and Hickok, 2004; Majerus, 2013; Scott and Wise, 2004; Wise et al., 2001).  [emphasis added]
The problem with these conclusions is that this characterization of the HP dual route framework is inaccurate. We do not claim that the system does not require passage through sublexical levels. Rather, we specifically propose a phonological (not lexical) level of processing/representation that is shared between the two routes, as is clear in our figure from HP 2007 (yellow shading).

[Figure: dual stream model from Hickok & Poeppel 2007, with the shared phonological network highlighted in yellow]

This is not a new feature of the HP framework.  Our claim of a shared level of representation between dorsal and ventral streams goes back to our 2000 paper.  From the abstract:
In this review, we argue that cortical fields in the posterior–superior temporal lobe, bilaterally, constitute the primary substrate for constructing sound-based representations of speech, and that these sound-based representations interface with different supramodal systems in a task-dependent manner. [emphasis added]
To restate: we proposed one sound-based (not lexically based) speech network, located in the STG region, that interfaces with two systems in a task-dependent manner. This clearly predicts associations between tasks if the functional damage is in the shared region, and dissociations if the functional damage is in one stream or the other. Both patterns should be found, and DM's study confirms this.

So where did DM get the idea that HP propose that speech recognition/comprehension can skip the sublexical level? They quote one of my papers with my former student Kenny Vaden as support for this assumption:
For example, Vaden et al. (2011) state that: [sublexical] information is only represented on the motor side of speech processing and…[is] not explicitly extracted or represented as a part of spoken word recognition (p. 2672). 
But this is misleading, especially when you look at the term that DM replaced with their bracketed [sublexical]. Here's the full quote from that paper:
Our findings are more in line with the view that segment level information is only represented explicitly on the motor side of speech processing and that segments are not explicitly extracted or represented as a part of spoken word recognition as some authors have proposed (Massaro, 1972).  -Vaden et al. 2011
Two things to note here. One is that Vaden et al. were pointing out that the findings we reported were more in line with theories that do not specifically implicate segmental representations in speech recognition; we were not stating the position of the HP dual stream model. Second, and more importantly, there is a difference between sublexical and segmental. Sublexical refers to anything below the level of the word, which includes segments but also syllables or pieces of syllables. In recent years I have leaned more and more toward the view that segmental units are not represented on the perceptual/recognition side of speech processing, as the Vaden et al. quote suggests. (David's position is different, by the way, I think.) But this view of mine does not imply that sublexical information isn't processed in the STG and shared between the two streams. I believe it is! And DM's findings are perfectly compatible with this view.

Moreover, the HP claim has little to do with the nature of the representation and more to do with the process. Notice that we don't say that the dorsal stream is more involved in sublexical representations; we say that it is more involved in sublexical tasks. It is about the task-driven cognitive/metalinguistic/ecologically invalid processes that are invoked most strongly by sublexical tasks, what we called "explicit attention" in HP 2000:
Tasks that regularly involve these extra-auditory left hemisphere structures [i.e., the dorsal stream] all seem to require explicit attention to segmental information.  Note that such tasks are fundamentally different from tasks that involve auditory comprehension: when one listens to an utterance in normal conversation, there is no conscious knowledge of the occurrence of specific phonemic segments, the content of the message is consciously retained. 
So the reason the dorsal stream gets involved in syllable discrimination is that the task requires attentional mechanisms that aren't typically engaged in normal speech recognition, and the network over which these attentional mechanisms can best operate is the sensorimotor dorsal stream network.

The problem with tasks like syllable discrimination is NOT that they can't assess the integrity of the perceptual analysis/representational system in the STG/STS; it is that you can't tell whether deficits on the task are coming from perceptual problems or from metalinguistic attentional (or working memory) problems. It's interesting to see how various speechy tasks hang together or not (stay tuned for my own foray into this area, with evidence for both associations and dissociations consistent with HP), but honestly, if you want to unambiguously map the circuits and computations involved in speech recognition as it is used in the wild, dump syllable discrimination and stick to auditory comprehension.
