The following is a guest post from Heather Dial and Randi Martin. I (Greg) have provided my comments on their post interspersed in italics and set off by "++++" symbols.
Greg Hickok has noted several positive aspects of our recent
paper but also claims that we have misunderstood the dual stream model and that
our findings are, in fact, consistent with this model. We wish to respond to
his arguments, focusing on two main points: the claims of the dual stream model and the implications of our data.
Regarding the dual stream model
In this blog post, Hickok argues that we have misunderstood
the dual stream model's claims regarding sublexical processing. However, we argue
that there is a lack of clarity in Hickok and Poeppel’s claims regarding the
dual stream model, with statements across different articles seeming to imply
different functions that are shared prior to divergence into the dorsal and
ventral routes. We will highlight this point with a series of examples. In the
2007 paper, on p. 394, Hickok and Poeppel note that “there is overlap...in the
computational operations leading up to and including the generation of
sublexical representations.” If this were the entirety of the claims made by
Hickok and Poeppel, then we agree that our findings are perfectly compatible with the dual stream model. However, with respect to the 2000 paper, it is not clear that the claim has always been that
sublexical processing is shared. In fact, even the quote he takes from the
abstract of the 2000 paper does not argue for sublexical processing as a
requirement, but rather for shared auditory processing. In the 2000 paper (p.
131), Hickok and Poeppel expand by stating that:
“auditory related cortical fields
in the posterior half of the superior temporal lobe, bilaterally, constitute
the primary substrate for constructing sound-based representations of speech.
From this point, however, we will argue that there are at least two distinct
pathways that participate in speech perception in a task-dependent manner”
In our paper, we emphasize that sublexical processing refers
to the processing of abstract, speech-specific
representations. The “sound-based representations” that Hickok is referring to
in the abstract for the 2000 paper are not necessarily speech-specific.
++++++++++++++++++++++++++++++++++++++++++++++++++
GH: I agree but we were vague on purpose 17 years ago because we simply didn't know what the relation was between brain areas and acoustic/linguistic levels of representation. This lack of knowledge is clearly indicated (but kind of buried) in the "Outstanding Questions" box in our 2000 article, where we write, "We have proposed that superior temporal lobe structures play an
important role in constructing ‘sound-based representations of speech.' This process is complex, probably involving multiple levels of representation. How does this general notion of sound-based representations map onto the different linguistic levels of representation (e.g. phonetic features, syllabic structure, etc.)? Are there neuroanatomical subdivisions within auditory cortex that correspond to these levels of representation?"
We deal with this issue in more depth in Hickok & Poeppel (2004), as well as the issue of speech specificity. Regarding the latter, we have always taken a rather agnostic position, arguing that specificity is for the most part an independent question in mapping the neurocomputational steps from sound to meaning or sound to articulation. In short, we (or at least I) do not subscribe to the view that specificity is a prerequisite for identifying a linguistic level of processing, which undermines DM's point. Here is the relevant paragraph from our 2004 paper (p. 69):
++++++++++++++++++++++++++++++++++++++++++++++++++
Moving past the 2000 paper to more
recent instantiations of the dual stream model, it remains unclear what
processing is shared. The figure included in the blog post comes from the
2007 paper, and Hickok refers to the phonological processing portion (shaded
yellow) as evidence that the dorsal and ventral streams share a sublexical
processing level. However, on p. 398, Hickok and Poeppel note in a discussion
of the superior temporal sulcus, the proposed site of the phonological network,
that “STS activation can be modulated by the manipulation of psycholinguistic
variables that tap phonological networks, such as phonological neighborhood
density.” We would note that phonological neighborhood density is considered a
lexical-level variable (e.g., Luce & Pisoni, 1998; Vitevitch & Luce,
1998; Vitevitch & Luce, 1999; Vitevitch, Luce, Charles-Luce & Kemmerer,
1997), which makes it sound like processing in this area is lexical rather than
sublexical.
++++++++++++++++++++++++++++++++++++++++++++++++++
GH: These are good points, and I both appreciate DM's frustration with our lack of clarity regarding the level of processing we are talking about and laud their interest in being more precise. However, as before, we are being purposefully vague because (i) we weren't confident, given the available evidence, that we could nail down a particular level of representation to the STS, (ii) we/I aren't convinced that linguistic levels of representation will map neatly onto individual brain regions, and (iii) we/I don't believe that a *region* is going to be linguistic- or level-specific (although embedded networks may be). For me, the observation that the STS is modulated by factors like phonological neighborhood density tells me that something in the representational vicinity of phonology is happening in the STS. And while I appreciate arguments that density effects are thought to reflect lexical processing, I'm not willing to use that conclusion to anchor my interpretation of what's happening in the STS for the reasons listed above.
++++++++++++++++++++++++++++++++++++++++++++++++++
Yet another perspective seems to come from the Vaden, Piquado, & Hickok (2011) paper, which Hickok references in the blog. Vaden et al. state on p. 2665, “The current study aimed to functionally identify sublexical phonological activity during spoken word recognition.” This was done by examining the brain regions sensitive to phonotactic frequency in spoken word recognition. They found that this manipulation modulated activity in Broca’s area but not in superior temporal lobe regions. They conclude on p. 2672, “This finding… is more consistent with speech perception models in which segmental information is not explicitly accessed during word recognition.” We had interpreted this, perhaps too broadly, as implying that sublexical processing in general is not explicitly accessed, though as Hickok notes, it is possible that sublexical units other than phonemes (e.g., syllables) are necessarily involved in word recognition. However, such a possibility has apparently yet to be tested in a fashion that distinguishes segmental from syllable-level representations. It should be emphasized, however, that other researchers have found that activation in the superior temporal lobe does respond to sublexical manipulations (see Dial & Martin, p. 205).
++++++++++++++++++++++++++++++++++++++++++++++++++
GH: I agree that there is not anything near consensus on whether or not segments are represented in the superior temporal lobe. I don't think they are, but this is still a very open question. Admittedly, we were not clear in Vaden et al. about what level of representation is being processed in the STS. Although we indicated that our results are in line with proposals by Massaro, who argues for demi-syllables (CV/VC) as the unit of perceptual analysis, we did not come out and say that we believe this is the unit of analysis in the superior temporal lobe.
++++++++++++++++++++++++++++++++++++++++++++++++++
We wish to reiterate that if the dual route model’s claim is that the phonological network instantiates a sublexical processing level, then our findings are wholly compatible with this claim. If the current debate serves to clarify the issue of what phonological processes are argued to be shared prior to divergence into the two streams, then that will be a step forward.
++++++++++++++++++++++++++++++++++++++++++++++++++
GH: Indeed, as I've tried to show, the claim of the dual route model is and always has been that the phonological network in the superior temporal lobe does involve sublexical levels of processing. I absolutely agree that this mini debate has served to clarify this point and moved us forward.
++++++++++++++++++++++++++++++++++++++++++++++++++
Regarding our data and the use of syllable discrimination to tap sublexical processing
In addition to the theoretical
issue of shared sublexical processing, our paper raised an important
methodological issue regarding whether syllable discrimination predicts lexical
processing, as might be expected if lexical processing depends on sublexical
processing (particularly so if syllables serve as the unit of sublexical
phonological coding). Hickok and Poeppel have largely discredited the use of
this task in assessing speech perception. For example, Hickok and Poeppel
(2004, p.74) state:
“Sub-lexical tasks (syllable
discrimination/identification) presumably represent an attempt to isolate and
study the early stages in this normal comprehension process, that is, the
acoustic– phonetic analysis and/or the sub-lexical processing stage. The
paradox, of course, stems from the fact that patients exist who cannot
accurately perform syllable discrimination/identification tasks, yet have
normal word comprehension: if sub-lexical tasks isolate and measure early
stages of the word comprehension process, deficits on sub-lexical tasks should
be highly predictive of auditory comprehension deficits, yet they are not. What
we suggest is that performance on sub-lexical tasks involves neural circuits
beyond (i.e. a superset of) those involved in the normal comprehension process.
This is an important observation because there are many studies of the
functional anatomy of ‘speech perception’ that utilize sub-lexical tasks.
Because sub-lexical tasks recruit neural circuits beyond those involved in word
comprehension, the outcome of such studies may paint a misleading picture of
the neural organization of speech perception, as it is used under more normal
listening conditions.” [emphasis added]
And, Hickok and Poeppel (2007, p. 394) note that:
“the use of sublexical tasks would
seem to be a logical choice for assessing these sublexical processes, except
for the empirical observation that speech perception and speech recognition
doubly dissociate.”
However, we would argue that tasks like syllable
discrimination are valid assessments of sublexical processing that can be
highly predictive of lexical processing. The behavioral double dissociations
that Hickok and Poeppel refer to have most often been derived from studies in
which the perceptual discriminations required in the sublexical tasks were much
more difficult than those in the lexical tasks (e.g., single distinctive
features in the sublexical discrimination task, and no phonological overlap
with distractors in a picture-word matching task, as in the WAB word
recognition subtest).
++++++++++++++++++++++++++++++++++++++++++++++++++
GH: "most often" is a fair statement but one that ignores the fact that not all of the studies showing this dissociation were unmatched. In our 2004 paper we leaned heavily on one study in particular (Miceli, et al, 1980) that used a picture-word matching task with both semantic and phonemic distractors. It is because this study used a phonemically better matched comprehension test that we reproduced the data in Table 1 in Hickok & Poeppel 2004 to make the point. Although we didn't cite it in our review papers, I would also point you to Bishop et al. who found task effects in closely matched discrimination versus lexical status tasks. See this blog post.
++++++++++++++++++++++++++++++++++++++++++++++++++
When sublexical and lexical tasks are matched in the required discriminations, then performance on the two is highly related (e.g., our correlation between syllable discrimination and word discrimination for matched stimuli was .96).
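As a concrete illustration of what such an across-patient comparison involves, here is a minimal sketch in Python of correlating scores on matched sublexical and lexical discrimination tasks; the scores and sample size are hypothetical placeholders, not the values from the study.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical proportion-correct scores for a small patient sample on two tasks
# built over the same phonological contrasts (placeholder values, not real data).
syllable_discrimination = np.array([0.95, 0.62, 0.80, 0.71, 0.99, 0.55, 0.88, 0.67])
word_discrimination = np.array([0.97, 0.66, 0.78, 0.74, 0.98, 0.58, 0.90, 0.70])

# Pearson correlation across patients; closely matched tasks should yield a high r.
r, p = pearsonr(syllable_discrimination, word_discrimination)
print(f"r = {r:.2f}, p = {p:.4f}")
```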
++++++++++++++++++++++++++++++++++++++++++++++++++
GH: Here's the key point that you are missing. It's not so much about the stimuli (word vs. syllable); it's about the *task* (discrimination). It does not surprise me at all that these two tasks are highly correlated: they are the same task. In anticipation of a rebuttal, I'll note that it is possible to perform this task over different representations, phonological versus semantic, but that doesn't mean (i) that patients actually do it that way (they may be doing both phonologically) or (ii) that there isn't still some shared process, like cognitive control or working memory, that is driving the correlation.
++++++++++++++++++++++++++++++++++++++++++++++++++
We did find, however, as pointed out in this blog post, that even though performance on the picture-word matching and syllable discrimination tasks was highly correlated (r = .86), two patients performed significantly better on our picture-word matching (PWM) task than on our syllable discrimination task and that our control group performed better on the PWM task than on the syllable discrimination task.
++++++++++++++++++++++++++++++++++++++++++++++++++
GH: Still highly correlated, which would argue against my point above, but a closer look at the data reveals a different picture. Plotting “consonant discrimination” against “single PWM phonological foils” shows an apparent outlier, which happens to be the only Wernicke's patient in the sample, i.e., the case where we would expect the most severe auditory comprehension deficit. Furthermore, this case had bilateral lesions involving the superior temporal lobe! According to the dual stream model, we would expect a significant generalized speech perception deficit. It is no surprise at all, then, that this case was poor on both syllable discrimination and single-word auditory comprehension. If we remove this case from the analysis, the correlation between discrimination and comprehension disappears (r = 0.349, p = 0.266; BF = 0.62).
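For readers who want to see the shape of this kind of leave-one-out check, here is a minimal Python sketch using made-up scores rather than the actual patient data (the Bayes factor computation is omitted).

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up discrimination and comprehension scores for a small sample;
# the last case plays the role of a single severely impaired outlier.
discrimination = np.array([0.91, 0.84, 0.88, 0.79, 0.93, 0.86, 0.90, 0.82, 0.87, 0.89, 0.83, 0.35])
comprehension = np.array([0.97, 0.94, 0.92, 0.95, 0.96, 0.98, 0.93, 0.96, 0.95, 0.94, 0.97, 0.40])

r_all, p_all = pearsonr(discrimination, comprehension)
print(f"all cases:       r = {r_all:.2f}, p = {p_all:.3f}")

# Drop each case in turn to see how much the correlation hinges on any one patient.
for i in range(len(discrimination)):
    r_loo, p_loo = pearsonr(np.delete(discrimination, i), np.delete(comprehension, i))
    print(f"without case {i:>2}: r = {r_loo:.2f}, p = {p_loo:.3f}")
```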
DM's data, therefore, provide further evidence for the dissociability of discrimination and comprehension tasks, contrary to their claim. Continuing on...
++++++++++++++++++++++++++++++++++++++++++++++++++
Hickok argues that this is consistent with claims regarding the tasks tapping partially shared and partially different processes. On this point, we do not disagree. Our finding that controls performed better on the PWM task than on the syllable discrimination task suggests that the PWM task is easier than the syllable discrimination task. In other words, the PWM and syllable discrimination tasks were not appropriately matched. We believe that one important difference lies in the fact that the PWM task allows the participant to generate an internal phonological code for the picture, which they can then compare to the auditory input. We therefore created the auditory-written syllable matching (AWSM) task, which allows for this same internal generation of a phonological code, thus matching task demands between the sublexical and lexical processing tasks. In addition, the AWSM and PWM tasks require maintenance of a single auditory percept for comparison to a single picture or written syllable, whereas in the syllable discrimination task two items must be maintained.
++++++++++++++++++++++++++++++++++++++++++++++++++
GH: We have to ask why comprehension tasks are easier. I would argue they are easier because they draw on the processes involved in normal, everyday speech processing in the wild. Discrimination is not a task we ever perform except in laboratories. I argue that it involves cognitive operations that are not used in normal speech processing, hence it is harder. True, they aren't matched, but that's the point! You suggest that pictures allow the generation of an internal phonological code that can be compared to the auditory input. That actually seems harder in some ways than having that code given to you overtly in a discrimination task. How does the subject know which internal phonological code to generate from the array of pictures (e.g., Miceli et al. used 6 pictures)? But more specifically, you suggest that PWM is easier because you don't have to maintain two items in memory for comparison. I agree! That has been my argument all along. You have to bring to bear additional processes beyond those normally used in speech recognition in order to perform the syllable discrimination task *because of the task*.
++++++++++++++++++++++++++++++++++++++++++++++++++
The AWSM was indeed easier than the syllable discrimination task, and almost all of the patients and the control group performed better on the AWSM than on the syllable discrimination task.
++++++++++++++++++++++++++++++++++++++++++++++++++
GH: "Almost all of the patients" is in fact 6 of 8, meaning that 25% of your now rather small sample failed to improve. Two more cases, those with d' ~ 2.5, are darn close to negligible improvement. So at best half of your sample improved noticeably. Given the sample size, I don't put a lot of weight on the result.
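For reference, d' in this kind of task is commonly computed with the standard yes/no signal-detection formula, z(hit rate) minus z(false-alarm rate); same-different designs sometimes use a more elaborate model. A minimal Python sketch with illustrative rates (not the patients' data) is below.

```python
from scipy.stats import norm

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal-detection sensitivity: z(hit rate) minus z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

# Illustrative rates only: a d' in the neighborhood of 2.5 reflects accurate,
# but not ceiling-level, discrimination.
print(round(d_prime(0.93, 0.10), 2))  # ~2.76
print(round(d_prime(0.89, 0.15), 2))  # ~2.26
```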
++++++++++++++++++++++++++++++++++++++++++++++++++
Thus, our findings argue that as long as you match task demands and perceptual discriminability across sublexical and lexical tasks, syllable discrimination is a perfectly reasonable measure of sublexical processing, which can predict performance on a lexical task to a high degree. It should be the burden of the researcher to design carefully matched tasks to isolate the processes of interest. Specifically regarding the findings at hand, our results support the use of a standard syllable discrimination task as a predictor of lexical processing.
++++++++++++++++++++++++++++++++++++++++++++++++++
GH: The way you make syllable discrimination predict auditory comprehension performance is to impose the same kinds of artificial task demands on auditory comprehension. You argued this yourself in pointing out that discrimination imposes an additional demand over comprehension: the requirement of holding two items in memory while making a decision. If you are interested in understanding the cognitive and neural basis of performing a difficult metalinguistic speech task, then by all means use syllable discrimination. If, on the other hand, you want to understand how speech is analyzed in the real world under ecologically valid conditions, then syllable discrimination can lead you astray.
++++++++++++++++++++++++++++++++++++++++++++++++++