Friday, March 31, 2017

Guest blog post from Dial & Martin on Dual Stream models

The following is a guest post from Heather Dial and Randi Martin. I (Greg) have provided my comments on their post interspersed in italics and set off by "++++" symbols. 
++++++++++++++++++++++++++++++++++++++++++++++++++

Greg Hickok has noted several positive aspects of our recent paper but also claims that we have misunderstood the dual stream model and that our findings are, in fact, consistent with this model. We wish to respond to his arguments focusing on two main points: claims of the dual stream model and implications of our data.

Regarding the dual stream model
In this blog post, Hickok argues that we have misunderstood the dual stream model claims regarding sublexical processing. However, we argue that there is a lack of clarity in Hickok and Poeppel’s claims regarding the dual stream model, with statements across different articles seeming to imply different functions that are shared prior to divergence into the dorsal and ventral routes. We will highlight this point with a series of examples. In the 2007 paper, on p. 394, Hickok and Poeppel note that “there is overlap...in the computational operations leading up to and including the generation of sublexical representations.” If this were the entirety of the claims made by Hickok and Poeppel, then we agree that our findings are perfectly compatible with the dual stream model. However, in referencing the 2000 paper, it is not clear that the claim has always been that sublexical processing is shared. In fact, even the quote he takes from the abstract of the 2000 paper does not argue for sublexical processing as a requirement, but rather for shared auditory processing. In the 2000 paper (p. 131), Hickok and Poeppel expand by stating that:

“auditory related cortical fields in the posterior half of the superior temporal lobe, bilaterally, constitute the primary substrate for constructing sound-based representations of speech. From this point, however, we will argue that there are at least two distinct pathways that participate in speech perception in a task-dependent manner”

In our paper, we emphasize that sublexical processing refers to the processing of abstract, speech-specific representations. The “sound-based representations” that Hickok is referring to in the abstract for the 2000 paper are not necessarily speech-specific.

++++++++++++++++++++++++++++++++++++++++++++++++++
GH: I agree, but we were vague on purpose 17 years ago because we simply didn't know what the relation was between brain areas and acoustic/linguistic levels of representation.  This lack of knowledge is clearly indicated (but kind of buried) in the "Outstanding Questions" box in our 2000 article, where we write, "We have proposed that superior temporal lobe structures play an important role in constructing ‘sound-based representations of speech.’ This process is complex, probably involving multiple levels of representation. How does this general notion of sound-based representations map onto the different linguistic levels of representation (e.g. phonetic features, syllabic structure, etc.)? Are there neuroanatomical subdivisions within auditory cortex that correspond to these levels of representation?"

We deal with this issue in more depth in Hickok & Poeppel 2004 as well as the issue of speech-specificity.  Regarding the latter, we have always taken a rather agnostic position, arguing that specificity is for the most part an independent question in mapping the neurocomputational steps from sound to meaning or sound to articulation.  In short, we (or at least I) do not subscribe to the view that specificity is a prerequisite for identifying a linguistic level of processing, which undermines DM's point. Here is the relevant paragraph from our 2004 paper (p. 69):

++++++++++++++++++++++++++++++++++++++++++++++++++

 Moving past the 2000 paper to more recent instantiations of the dual stream model, it remains unclear what processing is shared. The figure that is placed in the blog post comes from the 2007 paper, and Hickok refers to the phonological processing portion (shaded yellow) as evidence that the dorsal and ventral streams share a sublexical processing level. However, on p. 398, Hickok and Poeppel note in a discussion of the superior temporal sulcus, the proposed site of the phonological network, that “STS activation can be modulated by the manipulation of psycholinguistic variables that tap phonological networks, such as phonological neighborhood density.” We would note that phonological neighborhood density is considered a lexical-level variable (e.g., Luce & Pisoni, 1998; Vitevitch & Luce, 1998; Vitevitch & Luce, 1999; Vitevitch, Luce, Charles-Luce & Kemmerer, 1997), which makes it sound like processing in this area is lexical rather than sublexical. 

++++++++++++++++++++++++++++++++++++++++++++++++++
GH: These are good points and I both appreciate DM's frustration with our lack of clarity regarding the level of processing we are talking about and laud their interest in being more precise.  However, as before, we are being purposefully vague because (i) we weren't confident given available evidence that we could nail down a particular level of representation to the STS, (ii) we/I aren't convinced that linguistic levels of representation will map neatly onto individual brain regions, and (iii) we/I don't believe that a *region* is going to be linguistic or level specific (although embedded networks may be). For me, the observation that the STS is modulated by factors like phonological neighborhood density tells me that something in the representational vicinity of phonology is happening in the STS.  And while I appreciate arguments that density effects are thought to reflect lexical processing, I'm not willing to use that conclusion to anchor my interpretation of what's happening in the STS for the reasons listed above.
++++++++++++++++++++++++++++++++++++++++++++++++++

Yet another perspective seems to come from the Vaden, Piquado, & Hickok (2011) paper that Hickok references in the blog.  Vaden et al. state on p. 2665, “The current study aimed to functionally identify sublexical phonological activity during spoken word recognition.”  This was done by examining the brain regions sensitive to phonotactic frequency in spoken word recognition. They found that this manipulation modulated activity in Broca’s area and not in superior temporal lobe regions. They conclude on p. 2672, “This finding… is more consistent with speech perception models in which segmental information is not explicitly accessed during word recognition.” We had interpreted this perhaps too broadly as implying that sublexical processing in general is not explicitly accessed, though as Hickok notes, it is possible that sublexical units other than phonemes (e.g., syllables) might necessarily be involved in word recognition.  However, such a possibility appears not yet to have been tested in a way that distinguishes segmental from syllable-level representations.  It should be emphasized, however, that other researchers have found that activation in the superior temporal lobe does respond to sublexical manipulations (see Dial & Martin, p. 205).

++++++++++++++++++++++++++++++++++++++++++++++++++
GH: I agree that there is not anything near consensus on whether or not segments are represented in the superior temporal lobe.  I don't think they are, but this is still a very open question. Admittedly, we were not clear in Vaden, et al. what level of representation is being processed in the STS. Although we indicated that our results are in line with proposals by Massaro, who argues for demi-syllables (CV/VC) as the unit of perceptual analysis, we did not come out and say that we believe this is the unit of analysis in the superior temporal lobe.
++++++++++++++++++++++++++++++++++++++++++++++++++

We wish to reiterate that if the dual route model’s claim is that the phonological network instantiates a sublexical processing level, then our findings are wholly compatible with this claim. If the current debate serves to clarify the issue of what phonological processes are argued to be shared prior to divergence into the two streams, then that will be a step forward.

++++++++++++++++++++++++++++++++++++++++++++++++++
GH: Indeed, as I've tried to show, the claim of the dual route model is and always has been that the phonological network in the superior temporal lobe does involve sublexical levels of processing. I absolutely agree that this mini debate has served to clarify this point and moved us forward.
++++++++++++++++++++++++++++++++++++++++++++++++++

Regarding our data and the use of syllable discrimination to tap sublexical processing
In addition to the theoretical issue of shared sublexical processing, our paper raised an important methodological issue regarding whether syllable discrimination predicts lexical processing, as might be expected if lexical processing depends on sublexical processing (particularly so if syllables serve as the unit of sublexical phonological coding). Hickok and Poeppel have largely discredited the use of this task in assessing speech perception. For example, Hickok and Poeppel (2004, p.74) state:
“Sub-lexical tasks (syllable discrimination/identification) presumably represent an attempt to isolate and study the early stages in this normal comprehension process, that is, the acoustic–phonetic analysis and/or the sub-lexical processing stage. The paradox, of course, stems from the fact that patients exist who cannot accurately perform syllable discrimination/identification tasks, yet have normal word comprehension: if sub-lexical tasks isolate and measure early stages of the word comprehension process, deficits on sub-lexical tasks should be highly predictive of auditory comprehension deficits, yet they are not. What we suggest is that performance on sub-lexical tasks involves neural circuits beyond (i.e. a superset of) those involved in the normal comprehension process. This is an important observation because there are many studies of the functional anatomy of ‘speech perception’ that utilize sub-lexical tasks. Because sub-lexical tasks recruit neural circuits beyond those involved in word comprehension, the outcome of such studies may paint a misleading picture of the neural organization of speech perception, as it is used under more normal listening conditions.” [emphasis added]

And, Hickok and Poeppel (2007, p. 394) note that:

“the use of sublexical tasks would seem to be a logical choice for assessing these sublexical processes, except for the empirical observation that speech perception and speech recognition doubly dissociate.”

However, we would argue that tasks like syllable discrimination are valid assessments of sublexical processing that can be highly predictive of lexical processing. The behavioral double dissociations that Hickok and Poeppel refer to have most often been derived from studies in which the perceptual discriminations required in the sublexical tasks were much more difficult than those in the lexical tasks (e.g., single distinctive features in the sublexical discrimination task, and no phonological overlap with distractors in a picture-word matching task, as in the WAB word recognition subtest).

++++++++++++++++++++++++++++++++++++++++++++++++++
GH: "most often" is a fair statement but one that ignores the fact that not all of the studies showing this dissociation were unmatched.  In our 2004 paper we leaned heavily on one study in particular (Miceli et al., 1980) that used a picture-word matching task with both semantic and phonemic distractors. It is because this study used a phonemically better matched comprehension test that we reproduced the data in Table 1 in Hickok & Poeppel 2004 to make the point.  Although we didn't cite it in our review papers, I would also point you to Bishop et al., who found task effects in closely matched discrimination versus lexical status tasks.  See this blog post.
++++++++++++++++++++++++++++++++++++++++++++++++++

When sublexical and lexical tasks are matched in the required discriminations, then performance on the two is highly related (e.g., our correlation between syllable discrimination and word discrimination for matched stimuli was .96).

++++++++++++++++++++++++++++++++++++++++++++++++++
GH: Here's the key point that you are missing.  It's not so much about the stimuli (word vs. syllable), it's about the *task* (discrimination). It does not surprise me at all that these two tasks are highly correlated: they are the same task. In anticipation of a rebuttal I'll note that it is possible to perform this task over different representations, phonological versus semantic, but that doesn't mean (i) that patients actually do it that way (they may be doing both phonologically) or (ii) that there isn't still some shared process like cognitive control or working memory that is driving the correlation.  
++++++++++++++++++++++++++++++++++++++++++++++++++

 We did find, however, as pointed out in this blog post, that even though performance on picture-word matching and syllable discrimination were highly correlated (r=.86),

++++++++++++++++++++++++++++++++++++++++++++++++++
GH: Still highly correlated which would argue against my point above, but a closer look at the data reveals a different picture. Here's a plot of correlation between "consonant discrimination" and "single PWM phonological foils":



  An outlier is apparent, which happens to be the only Wernicke's patient in the sample, i.e., the case where we would expect the most severe auditory comprehension deficit. Furthermore, this case had bilateral lesions involving the superior temporal lobe!  According to the dual stream model, we would expect a significant generalized speech perception deficit. It is no surprise at all, then, that this case was poor on both syllable discrimination and single word auditory comprehension.  If we remove this case from the analysis, the correlation between discrimination and comprehension disappears (r = 0.349, p = 0.266; BF = 0.62):



DM's data, therefore, provide further evidence for the dissociability of discrimination and comprehension tasks, contrary to DM's claim.  Continuing on...
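
To illustrate how a single extreme case can drive a correlation in a small sample, here is a minimal sketch. The scores are made up for illustration and are NOT the data from Dial & Martin; the function name `pearson_r` and the variable names are likewise hypothetical. The last case is deliberately poor on both tasks, and the correlation is computed with and without it.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical proportion-correct scores (illustrative only, not real data).
# The final case is poor on both tasks, like a severe bilateral case would be.
discrimination = [0.90, 0.95, 0.85, 0.92, 0.88, 0.97, 0.91, 0.50]
comprehension = [0.96, 0.93, 0.95, 0.90, 0.97, 0.94, 0.92, 0.55]

# Correlation over the full sample, then with the final case dropped.
r_all = pearson_r(discrimination, comprehension)
r_trimmed = pearson_r(discrimination[:-1], comprehension[:-1])

print(f"with extreme case:    r = {r_all:.2f}")
print(f"without extreme case: r = {r_trimmed:.2f}")
```

With numbers like these, the full-sample correlation is high while the remaining cases show no positive relation at all. Whether dropping a case is justified is a separate question that the arithmetic cannot settle.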
++++++++++++++++++++++++++++++++++++++++++++++++++

two patients performed significantly better on our picture-word matching (PWM) task than on our syllable discrimination task and that our control group performed better on the PWM task than on the syllable discrimination task. Hickok argues that this is consistent with claims regarding the tasks tapping partially shared and partially different processes. On this point, we do not disagree. Our finding that controls performed better on the PWM task than on the syllable discrimination task suggests that the PWM task is easier than the syllable discrimination task. In other words, the PWM and syllable discrimination tasks were not appropriately matched. We believe that one important difference lies in the fact that the PWM task allows the participant to generate an internal phonological code for the picture, which they can then compare to the auditory input. We thus created the auditory-written syllable matching (AWSM) task, which allows for this same internal generation of a phonological code, thus matching task demands between the sublexical and lexical processing tasks.

++++++++++++++++++++++++++++++++++++++++++++++++++
GH: We have to ask why comprehension tasks are easier. I would argue they are easier because they rely on the processes used in everyday speech processing in the wild.  Discrimination is not a task we ever perform except in laboratories.  I argue that it involves cognitive operations that are not normally used in speech processing, hence it is harder.  True, they aren't matched, but that's the point! You suggest that pictures allow the generation of an internal phonological code that can be compared to the auditory input. That actually seems harder in some ways than having that code given to you overtly in a discrimination task.  How does the subject know which internal phonological code to generate from the array of pictures (e.g., Miceli et al. used 6 pictures)? But more specifically, you suggest that PWM is easier because you don't have to maintain two items in memory for comparison. I agree! That has been my argument all along. You have to bring to bear additional processes beyond those normally used in speech recognition in order to perform the syllable discrimination task *because of the task*.
++++++++++++++++++++++++++++++++++++++++++++++++++

In addition, AWSM and PWM require maintenance of a single auditory percept to compare to a single picture or written syllable, whereas in the syllable discrimination task two items must be maintained. The AWSM was indeed easier than the syllable discrimination task, and almost all of the patients and the control group performed better on the AWSM than on the syllable discrimination task.

++++++++++++++++++++++++++++++++++++++++++++++++++
GH: "Almost all of the patients" is in fact 6 of 8, meaning that 25% of your now rather small sample failed to improve.  Two more cases, those with d' ~ 2.5, are darn close to negligible improvement. So at best half of your sample improved noticeably.  Given the sample size, I don't put a lot of weight on the result.
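
One way to see why a 6-of-8 split carries limited weight is an exact sign test. This is only an illustration and not an analysis reported by either party. It assumes each patient is simply counted as improved or not improved, ignores the size of each change, and uses a 50/50 null. The function name `sign_test_two_sided` is hypothetical.

```python
from math import comb

def sign_test_two_sided(successes, n):
    """Exact two-sided binomial sign test p-value under a 50/50 null."""
    # Use the larger of the two counts so the tail is always the upper one.
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * upper_tail)

# 6 of 8 patients improved, as stated in the comment above.
p_value = sign_test_two_sided(6, 8)
print(f"p = {p_value:.3f}")
```

Under these assumptions, a split this lopsided would arise by chance fairly often with only eight cases, which is in line with the caution expressed above.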
++++++++++++++++++++++++++++++++++++++++++++++++++


Thus, our findings argue that as long as you match task demands and perceptual discriminability across sublexical and lexical tasks, syllable discrimination is a perfectly reasonable measure of sublexical processing, which can predict performance on a lexical task to a high degree.  It should be the burden of the researcher to design carefully matched tasks to isolate processes of interest. Specifically regarding the findings at hand, our results support the use of a standard syllable discrimination task as a predictor of lexical processing.


++++++++++++++++++++++++++++++++++++++++++++++++++
GH: The way you make syllable discrimination predict auditory comprehension performance is to impose the same kinds of artificial task demands on auditory comprehension. You argued this yourself in pointing out that discrimination imposes an additional demand over comprehension: the requirement of holding two items in memory while making a decision. If you are interested in understanding the cognitive and neural basis of performing a difficult metalinguistic speech task, then by all means use syllable discrimination.  If, on the other hand, you want to understand how speech is analyzed in the real world under ecologically valid conditions, then syllable discrimination can lead you astray.

++++++++++++++++++++++++++++++++++++++++++++++++++
