Monday, August 14, 2017

Guest blog post from Dial & Martin on Dual Stream models -- More discussion

[Note: for backstory on this discussion, see here]

To indicate which comment each response corresponds to, we have copied the first line of the comment.

In response to “I agree but we were vague on purpose 17 years ago because we simply didn’t know what the relation was between brain areas and acoustic/linguistic levels of representation” and “These are good points and I both appreciate DM’s frustration with our lack of clarity regarding the level of processing we are talking about and laud their interest in being more precise”:

We appreciate the clarification of your stance regarding speech-specificity (or linguistic-specificity) and agree that specificity of processing within a neuroanatomical region is not necessarily a prerequisite for identifying neuroanatomical levels of processing. This being said, we still feel it is important when discussing the neural basis of speech processing to be as specific as possible in the claims regarding the underlying cognitive model. We would, thus, stand by our claim that in a model of speech processing, there is necessarily an abstract, speech-specific sublexical processing level (see figure below). However, we agree that there is no convincing evidence that “linguistic levels of representation will map neatly onto individual brain regions” and that there may not be a region that is “linguistic or level specific,” at least to the extent that current technology allows for the investigation of such questions (i.e., at the level of large populations of neurons). 

In response to “‘most often’ is a fair statement but one that ignores the fact that not all of the studies showing this dissociation were unmatched”:

We discussed directly in our paper the fact that in the Miceli et al. (1980) study, the phonological differences in their picture-word matching task were greater than those in the sublexical task, where the items differed in one distinctive feature of one phoneme. We stated on p. 193, “Although their [Miceli et al.’s] picture-word matching task included phonologically related distractors, the phonological lures differed from the target by one or more phonemes (picture-word matching task described in detail in Gainotti et al. (1975)) and when the difference was only one phoneme, the phoneme might differ by more than one distinctive feature from the target.”
We did not discuss Bishop et al. (1990) in our paper because it is less often cited in papers on aphasia, as their subjects were children with SLI. Nonetheless, their findings certainly bear on the general issue of the relation between sublexical and lexical perception. In the Bishop et al. paper, Study 1 compared phoneme discrimination (with one distinctive feature difference) to picture-word matching on a British version of the PPVT, which does not systematically include phonologically related distractors. Study 2 did provide a close comparison between performance on phoneme discrimination and lexical processing using a word judgment task in which subjects judged whether a spoken stimulus matched the name of a picture. On the non-matching trials, the stimulus was, according to the examples, a nonword differing by one distinctive feature of one phoneme (e.g., “voy” for a picture of a boy). This comparison is similar to that in our Experiment 2a, where we contrasted syllable discrimination and picture-word matching where the stimulus on the non-matching trials was a word differing by one distinctive feature of one phoneme (e.g., “beach” for a picture of a peach). In Experiment 2a, we found that controls as a group performed better on picture-word matching than on syllable discrimination and, although the group difference for the patients was not significant, two patients showed significantly better performance on picture-word matching. As we discuss in the paper, we hypothesized that picture-word matching might have been easier for some individuals because it allows the subjects to internally generate a phonological representation of the name of the picture against which to compare the spoken input. To address this, we carried out Experiment 2b, where the sublexical task involved matching a spoken syllable to a written syllable, so that subjects could generate a target from the written syllable.
With this change, controls were now significantly better at the sublexical than the lexical task, and no patient scored significantly better on the lexical than the sublexical task. Thus, when task demands were better equated, there was no evidence of a dissociation between sublexical and lexical performance.

In response to “Here’s the key point that you are missing”:
We showed that patient performance on a difficult visual working memory task (where performance was equated to that for syllable discrimination) was not correlated with performance on either the syllable or word discrimination tasks (page 201), suggesting that neither cognitive control nor working memory is the driver behind the correlation.

In response to “Still highly correlated which would argue against my point above, but a closer look at the data reveals a different picture”:

The figures that you have created in response to our comment do not contain the data we reference (from Experiment 2a) and instead plot data from Experiment 1a (titled: sublexical and lexical perception with unmatched stimuli, p.195, with data shown in Table 2).  Experiment 1a was created for the express purpose of showing that dissociations between sublexical and lexical performance could be shown when the stimuli were not closely matched (i.e., when some of the lexical contrasts differed by more than 1 feature).  As we noted in the text, in this experiment, there are notable dissociations with some patients showing better lexical performance under these conditions, which is particularly evident in your figure with the outlier removed. We refer you instead to Figure 6 (p. 202), which shows the data for picture-word matching and syllable discrimination for the matched stimuli in Experiment 2a.  These are the data we were referring to in saying the correlation was .88 even though 2 patients did show significantly better performance on picture-word matching than syllable discrimination.  For the data in Fig 6, there are no evident outliers (which is confirmed by statistical tests such as Mahalanobis distance or Cook’s D).
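As a rough illustration of the kind of outlier check mentioned above, here is a minimal Python sketch that computes the correlation between two sets of task scores and each participant's squared Mahalanobis distance from the group centroid. It is not the analysis code from the paper. The scores are made-up placeholders, the function names are hypothetical, and the cutoff (the .975 quantile of a chi-square distribution with 2 degrees of freedom) is one common convention that we assume here for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mahalanobis_sq(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the centroid,
    using the sample (n - 1) covariance matrix of the two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    dists = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # (dx, dy) times the inverse covariance matrix times (dx, dy)
        dists.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return dists

def chi2_cutoff_2df(p):
    """Quantile of the chi-square distribution with 2 df (closed form)."""
    return -2.0 * math.log(1.0 - p)

def flag_outliers(xs, ys, p=0.975):
    """Indices of pairs whose squared distance exceeds the assumed cutoff."""
    cutoff = chi2_cutoff_2df(p)
    return [i for i, d in enumerate(mahalanobis_sq(xs, ys)) if d > cutoff]

# Hypothetical placeholder scores (NOT data from the paper).
syllable_discrimination = [1.2, 1.8, 2.1, 2.4, 2.6, 2.9, 3.0, 3.3, 3.5, 3.8, 4.0, 4.1]
picture_word_matching = [1.5, 1.9, 2.6, 2.5, 3.1, 3.0, 3.4, 3.2, 3.9, 4.0, 4.3, 4.2]

print("r =", round(pearson_r(syllable_discrimination, picture_word_matching), 2))
print("flagged indices:", flag_outliers(syllable_discrimination, picture_word_matching))
```

Cook's D would instead be computed from a regression of one task on the other; it is omitted here to keep the sketch short.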

In response to “We have to ask why comprehension tasks are easier”:

Simply because we do not perform discrimination tasks “in the wild” does not make the task invalid. We could further argue that individuals don’t typically spend their time naming pictures, or selecting a picture from a set after hearing a single spoken word, but we don’t believe this discredits the tasks. Our main argument, on which it seems we actually don’t disagree, is that you have to match the task in order for the linguistic manipulations to be the driving factor behind the results.
In terms of the generation of a phonological code, it would be difficult to do that in advance for all the pictures in a 6 item set, as in Miceli et al. (1980). However, we again refer to the fact that Miceli et al. (1980) did not closely match the phonological distractors in this task to the CCVC task, making it easier to begin with. And, in our task, generation of a phonological code would be possible as only a single picture is presented. In terms of not having to maintain two items in memory for comparison, we are in agreement as to why the picture-word matching task may be easier.    

In response to “‘Almost all of the patients’ is in fact 6 of 8 meaning that 25% of your now rather small sample failed to improve”:

We agree that the small sample size makes our claim regarding AWSM and syllable discrimination tenuous in the patient sample, but we note that the controls also did better on the AWSM (M = 3.77) than the syllable discrimination task (M = 2.68). Even so, our claim that the PWM task was easier than the syllable discrimination task is the more important of the claims, because we were arguing that syllable discrimination was harder than PWM, necessitating a matched sublexical task. In fact, the controls did significantly better on PWM (M = 3.42) than syllable discrimination. The mean performance on the PWM task was much closer to the mean for the AWSM in our normal control population, suggesting these tasks are more closely matched than the PWM and syllable discrimination tasks.

In response to “The way you make syllable discrimination predict auditory comprehension performance is to impose the same kinds of artificial task demands on auditory comprehension”:

As indicated in the paper and all that we have discussed above, syllable discrimination is an excellent predictor of lexical performance when stimuli and task demands are matched. We would also note that syllable discrimination predicted auditory lexical decision in Experiment 1b at a high level (r = .74). Also, even though the comparison of picture-word matching and syllable discrimination in Experiment 2a revealed two patients who did better on picture-word matching than syllable discrimination, the correlation between the two tasks was quite high (r = .78). The results from Experiment 2b suggest that one can create a syllable discrimination task using spoken to written syllable matching that also correlates highly with picture-word matching, but where no subject does better on the picture-word matching task. A limitation of this latter task is that many aphasic patients have difficulty reading nonsense syllables, which will limit those who can be tested. On the other hand, there is a clear limitation to using picture-word matching for assessing speech recognition in that poor performance can result from a disruption of semantic knowledge rather than from difficulty processing the speech input. Thus, all tasks have their advantages and disadvantages in assessing particular cognitive functions. It is only the pattern of performance across a set of converging tasks that can provide strong evidence regarding the source of any deficit. We maintain that sublexical tasks like syllable discrimination may provide a valid indicator of an individual’s sublexical speech processing abilities and that this task may be useful in predicting word recognition abilities.


Pathways for intelligible speech - setting the record straight

A couple days ago I tweeted the following regarding Sophie Scott's influential study on pathways for intelligible speech:

I followed this up saying that the finding--left anterior STS activation for intelligible compared to unintelligible speech--had not been replicated in subsequent studies, one of my own in particular, with a larger sample size.

A number of people, including the lead author, felt it was an unfair attack and that the finding has held up to replication.  Let me clarify a few things.

My motivation: The motivation for this post actually came from something I saw on Twitter. It was a gif of a person suddenly looking completely deflated; the caption read something like 'When you realize the study at the foundation of your entire theory has N=9'. This made me think of the Scott study that has influenced a lot of models (e.g., Rauschecker and Scott 2009 write, "Speech perception and production are left-lateralized in the human brain") yet is quite underpowered by today's standards.

Replication: I claimed that the study hadn't replicated. In particular, a study from my lab with Kai Okada as lead was a direct replication of the Scott et al. study. Like Scott et al. we found left aSTS activation for intelligible vs. unintelligible speech, but also found that the left aSTS was only the tip of the neural activation iceberg. Not only did the contrast activate regions extending along the length of the STS, anterior to posterior, and bilaterally, but we also found, using pattern classification methods, that even Heschl's gyrus activity (which didn't show up in the intelligible vs. unintelligible contrast) could discriminate the two conditions.

So when I said the finding failed to replicate, what I meant was that subsequent findings failed to reproduce the pattern that the left aSTS was the only region selective for intelligible speech.  This is important, theoretically, because it speaks to differences between models, e.g., Hickok & Poeppel who argue that pSTS is part of the ventral stream versus Rauschecker & Scott who argue that the ventral stream flows only in the anterior direction from A1 (see their Figure 5).

My take on the role of Scott et al. 2000 in influencing the debate: I targeted the Scott et al. 2000 study because I believed that its emphasis on the left anterior STS in speech recognition is overly influential on current theory. My thinking was that if that original study had found what Okada et al. reported, we would be in a theoretically more balanced place.

Criticism of my take: Some respondents took me to task on my critique. Here's an exchange with Jonathan Peelle:

So maybe I'm being too harsh.  It's true that my belief regarding the influence of the Scott et al. 2000 study is based on my personal impression (supplemented by its citation frequency), which is no doubt biased because the left aSTS exclusivity is at odds with my own theory.  Maybe all those citations, or at least the recent ones, acknowledge that that study is not the whole picture.  Maybe they cite it just to provide evidence for *a* role of the left aSTS in speech recognition, which I agree with too, rather than *the* role.  Maybe it is cited only in connection with sentence level processing.

Evaluation of my assumptions: I decided to take a look at how the Scott et al. 2000 study is cited. This is not a systematic examination.  Basically I looked at only those studies published in 2016.  Here's what I found.

A number of studies correctly and appropriately cite Scott et al., either methodologically or as one finding highlighting part of a bigger network (Peelle's papers are a good example). Several, however, still cite the paper as evidence for a left anterior core for speech recognition, sometimes as the only paper cited:

"Further, there is compelling evidence that sensory areas feed into a pathway running from posterior in the temporal lobe to anterior aspects (Scott et al., 2000)" - Santi et al.
"The finding of left hemisphere dominance in the tract associations with the more linguistic domains is not unexpected and is highly consistent with previous findings (Rosen et al., 2011; Scott et al., 2000)." - Bajada et al.
"It provides a mechanistic explanation for the preponderant role of the anterior temporal lobe in lexical semantics as delineated by studies examining speech comprehension." - Ries et al.
"Together, these results are consistent with studies indicating that phonetic recognition occurs in the left anterolateral superior temporal cortex (i.e. the ventral auditory stream) (Binder et al., 2000; Scott et al., 2000; Leaver and Rauschecker, 2010; DeWitt and Rauschecker, 2012)." - Alho et al.
Here's one citation from a dissertation.  This is clearly an outlier and perhaps it should be disregarded as it hasn't gone through peer review, but another perspective on it is that it reflects the weight of that original finding on the field.
"This author knows of only one study that has examined nonintelligible speech-like sounds; and interestingly, no significant pSTS activation was found (Scott et al., 2000)."

Conclusion: My take is that many investigators are citing Scott et al. appropriately, as Jonathan Peelle suggested.  I also see some evidence, however, that the study biases some researchers toward the view that the left anterior STS/STG is the critical region for speech recognition/lexical processing.  Put differently, I still believe that if the original study had found bilateral activity in both anterior and posterior regions, we would as a field be in somewhat of a different place, with fewer groups emphasizing the exclusivity of the anterior pathway (e.g., the Alho et al. quote).  In this sense, I see evidence in support of the assumption that motivated the tweets in the first place.

Was I too harsh?  I think yes.  Even though I still believe the original study is over-emphasized theoretically and inaccurately colors interpretation of the neural basis of speech processing in some circles, the content of my initial tweets made it sound like the work is completely useless.  That was not my intention or my belief and for that, Sophie and all, please accept my apology.

Lessons for all of us: In a 140-character statement it is way too easy to come across in a way that wasn't intended. I will certainly re-read my draft tweets and consider how they might be read.  I'm sure this won't prevent alternative readings but it could help reduce them. At the same time, I will give other tweeters a little slack if I read something that rubs me the wrong way, and ask for clarification.