Friday, March 31, 2017

Guest blog post from Dial & Martin on Dual Stream models

The following is a guest post from Heather Dial and Randi Martin. I (Greg) have provided my comments on their post interspersed in italics and set off by "++++" symbols. 

Greg Hickok has noted several positive aspects of our recent paper but also claims that we have misunderstood the dual stream model and that our findings are, in fact, consistent with this model. We wish to respond to his arguments focusing on two main points: claims of the dual stream model and implications of our data.

Regarding the dual stream model
In this blog post, Hickok argues that we have misunderstood the dual stream model claims regarding sublexical processing. However, we argue that there is a lack of clarity in Hickok and Poeppel’s claims regarding the dual stream model, with statements across different articles seeming to imply different functions that are shared prior to divergence into the dorsal and ventral routes. We will highlight this point with a series of examples. In the 2007 paper, on p. 394, Hickok and Poeppel note that “there is the computational operations leading up to and including the generation of sublexical representations.” If this were the entirety of the claims made by Hickok and Poeppel, then we agree that our findings are perfectly compatible with the dual stream model. However, in referencing the 2000 paper, it is not clear that the claim has always been that sublexical processing is shared. In fact, even the quote he takes from the abstract of the 2000 paper does not argue for sublexical processing as a requirement, but rather for shared auditory processing. In the 2000 paper (p. 131), Hickok and Poeppel expand by stating that:

“auditory related cortical fields in the posterior half of the superior temporal lobe, bilaterally, constitute the primary substrate for constructing sound-based representations of speech. From this point, however, we will argue that there are at least two distinct pathways that participate in speech perception in a task-dependent manner”

In our paper, we emphasize that sublexical processing refers to the processing of abstract, speech-specific representations. The “sound-based representations” that Hickok is referring to in the abstract for the 2000 paper are not necessarily speech-specific.

GH: I agree but we were vague on purpose 17 years ago because we simply didn't know what the relation was between brain areas and acoustic/linguistic levels of representation.  This lack of knowledge is clearly indicated (but kind of buried) in the "Outstanding Questions" box in our 2000 article, where we write, "We have proposed that superior temporal lobe structures play an important role in constructing ‘sound-based representations of speech.' This process is complex, probably involving multiple levels of representation. How does this general notion of sound-based representations map onto the different linguistic levels of representation (e.g. phonetic features, syllabic structure, etc.)? Are there neuroanatomical subdivisions within auditory cortex that correspond to these levels of representation?" 

We deal with this issue in more depth in Hickok & Poeppel 2004 as well as the issue of speech-specificity.  Regarding the latter, we have always taken a rather agnostic position, arguing that specificity is for the most part an independent question in mapping the neurocomputational steps from sound to meaning or sound to articulation.  In short, we (or at least I) do not subscribe to the view that specificity is a prerequisite for identifying a linguistic level of processing, which undermines DM's point. Here is the relevant paragraph from our 2004 paper (p. 69):


 Moving past the 2000 paper to more recent instantiations of the dual stream model, it remains unclear what processing is shared. The figure that is placed in the blog post comes from the 2007 paper, and Hickok refers to the phonological processing portion (shaded yellow) as evidence that the dorsal and ventral streams share a sublexical processing level. However, on p. 398, Hickok and Poeppel note in a discussion of the superior temporal sulcus, the proposed site of the phonological network, that “STS activation can be modulated by the manipulation of psycholinguistic variables that tap phonological networks, such as phonological neighborhood density.” We would note that phonological neighborhood density is considered a lexical-level variable (e.g., Luce & Pisoni, 1998; Vitevitch & Luce, 1998; Vitevitch & Luce, 1999; Vitevitch, Luce, Charles-Luce & Kemmerer, 1997), which makes it sound like processing in this area is lexical rather than sublexical. 

GH: These are good points and I both appreciate DM's frustration with our lack of clarity regarding the level of processing we are talking about and laud their interest in being more precise.  However, as before, we are being purposefully vague because (i) we weren't confident given available evidence that we could nail down a particular level of representation to the STS, (ii) we/I aren't convinced that linguistic levels of representation will map neatly onto individual brain regions, and (iii) we/I don't believe that a *region* is going to be linguistic- or level-specific (although embedded networks may be). For me, the observation that the STS is modulated by factors like phonological neighborhood density tells me that something in the representational vicinity of phonology is happening in the STS.  And while I appreciate arguments that density effects are thought to reflect lexical processing, I'm not willing to use that conclusion to anchor my interpretation of what's happening in the STS for the reasons listed above. 

Yet another perspective seems to come from the Vaden, Piquado, & Hickok (2011) paper, that Hickok references in the blog.  Vaden et al. state on p. 2665, “The current study aimed to functionally identify sublexical phonological activity during spoken word recognition.”  This was done by examining the brain regions sensitive to phonotactic frequency in spoken word recognition. They found that this manipulation modulated activity in Broca’s area and not in superior temporal lobe regions. They conclude on p. 2672, “This finding… is more consistent with speech perception models in which segmental information is not explicitly accessed during word recognition.” We had interpreted this perhaps too broadly as implying that sublexical processing in general is not explicitly accessed, though as Hickok notes, it is possible that sublexical units other than phonemes might necessarily be involved (e.g., syllables) in word recognition.  However, such a possibility appears yet to be tested in a fashion to distinguish segmental and syllable-level representations.  It should be emphasized, however, that other researchers have found that activation in the superior temporal lobe does respond to sublexical manipulations (see Dial & Martin, p. 205).

GH: I agree that there is not anything near consensus on whether or not segments are represented in the superior temporal lobe.  I don't think they are, but this is still a very open question. Admittedly, we were not clear in Vaden, et al. what level of representation is being processed in the STS. Although we indicated that our results are in line with proposals by Massaro, who argues for demi-syllables (CV/VC) as the unit of perceptual analysis, we did not come out and say that we believe this is the unit of analysis in the superior temporal lobe.

We wish to reiterate that if the dual route model’s claim is that the phonological network instantiates a sublexical processing level, then our findings are wholly compatible with this claim. If the current debate serves to clarify the issue of what phonological processes are argued to be shared prior to divergence into the two streams, then that will be a step forward.

GH: Indeed, as I've tried to show, the claim of the dual route model is and always has been that the phonological network in the superior temporal lobe does involve sublexical levels of processing. I absolutely agree that this mini debate has served to clarify this point and moved us forward.

Regarding our data and the use of syllable discrimination to tap sublexical processing
In addition to the theoretical issue of shared sublexical processing, our paper raised an important methodological issue regarding whether syllable discrimination predicts lexical processing, as might be expected if lexical processing depends on sublexical processing (particularly so if syllables serve as the unit of sublexical phonological coding). Hickok and Poeppel have largely discredited the use of this task in assessing speech perception. For example, Hickok and Poeppel (2004, p.74) state:
“Sub-lexical tasks (syllable discrimination/identification) presumably represent an attempt to isolate and study the early stages in this normal comprehension process, that is, the acoustic– phonetic analysis and/or the sub-lexical processing stage. The paradox, of course, stems from the fact that patients exist who cannot accurately perform syllable discrimination/identification tasks, yet have normal word comprehension: if sub-lexical tasks isolate and measure early stages of the word comprehension process, deficits on sub-lexical tasks should be highly predictive of auditory comprehension deficits, yet they are not. What we suggest is that performance on sub-lexical tasks involves neural circuits beyond (i.e. a superset of) those involved in the normal comprehension process. This is an important observation because there are many studies of the functional anatomy of ‘speech perception’ that utilize sub-lexical tasks. Because sub-lexical tasks recruit neural circuits beyond those involved in word comprehension, the outcome of such studies may paint a misleading picture of the neural organization of speech perception, as it is used under more normal listening conditions.” [emphasis added]

And, Hickok and Poeppel (2007, p. 394) note that:

“the use of sublexical tasks would seem to be a logical choice for assessing these sublexical processes, except for the empirical observation that speech perception and speech recognition doubly dissociate.”

However, we would argue that tasks like syllable discrimination are valid assessments of sublexical processing that can be highly predictive of lexical processing. The behavioral double dissociations that Hickok and Poeppel refer to have most often been derived from studies in which the perceptual discriminations required in the sublexical tasks were much more difficult than those in the lexical tasks (e.g., single distinctive features in the sublexical discrimination task, and no phonological overlap with distractors in a picture-word matching task, as in the WAB word recognition subtest).

GH: "most often" is a fair statement but one that ignores the fact that not all of the studies showing this dissociation were unmatched.  In our 2004 paper we leaned heavily on one study in particular (Miceli, et al, 1980) that used a picture-word matching task with both semantic and phonemic distractors. It is because this study used a phonemically better matched comprehension test that we reproduced the data in Table 1 in Hickok & Poeppel 2004 to make the point.  Although we didn't cite it in our review papers, I would also point you to Bishop et al. who found task effects in closely matched discrimination versus lexical status tasks.  See this blog post

When sublexical and lexical tasks are matched in the required discriminations, then performance on the two is highly related (e.g., our correlation between syllable discrimination and word discrimination for matched stimuli was .96).

GH: Here's the key point that you are missing.  It's not so much about the stimuli (word vs. syllable), it's about the *task* (discrimination). It does not surprise me at all that these two tasks are highly correlated: they are the same task. In anticipation of a rebuttal I'll note that it is possible to perform this task over different representations, phonological versus semantic, but that doesn't mean (i) that patients actually do it that way (they may be doing both phonologically) or (ii) that there isn't still some shared process like cognitive control or working memory that is driving the correlation.  

 We did find, however, as pointed out in this blog post, that even though performance on picture-word matching and syllable discrimination were highly correlated (r=.86),

GH: Still highly correlated which would argue against my point above, but a closer look at the data reveals a different picture. Here's a plot of correlation between "consonant discrimination" and "single PWM phonological foils":

  An outlier is apparent, which happens to be the only Wernicke's patient in the sample, i.e., the case where we would expect the most severe auditory comprehension deficit. Furthermore, this case had bilateral lesions involving the superior temporal lobe!  According to the dual stream model, we would expect a significant generalized speech perception deficit. It is no surprise at all, then, that this case was poor on both syllable discrimination and single word auditory comprehension.  If we remove this case from the analysis, the correlation between discrimination and comprehension disappears (r = 0.349, p = 0.266; BF = 0.62):
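To make the logic concrete, here is a minimal sketch of how one extreme case can carry a group correlation. The scores below are made up for illustration and are NOT Dial & Martin's data; the function name `pearson_r` and the variable names are likewise hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical proportion-correct scores for 12 patients (illustration only).
# The last patient is the single severe case, poor on both tasks.
discrimination = [0.90, 0.84, 0.95, 0.88, 0.80, 0.93,
                  0.86, 0.91, 0.83, 0.89, 0.87, 0.40]
comprehension = [0.94, 0.97, 0.93, 0.96, 0.95, 0.98,
                 0.92, 0.95, 0.97, 0.93, 0.96, 0.55]

r_all = pearson_r(discrimination, comprehension)
r_without_outlier = pearson_r(discrimination[:-1], comprehension[:-1])

print(f"r with all cases:        {r_all:.2f}")
print(f"r without the outlier:   {r_without_outlier:.2f}")
```

With these invented numbers the full-sample correlation is strong while the remaining eleven cases show essentially no relation, which is the pattern being argued for here.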

DM's data, therefore, provide further evidence for the dissociability of discrimination and comprehension tasks, contrary to DM's claim.  Continuing on...

 two patients performed significantly better on our picture to word matching (PWM) task than on our syllable discrimination task and that our control group performed better on the PWM task than the syllable discrimination task. Hickok argues that this is consistent with claims regarding the tasks tapping partially shared and partially different processes. On this point, we do not disagree. Our findings that controls performed better on the PWM task than the syllable discrimination suggest that the PWM task is easier than the syllable discrimination task. In other words, the PWM and syllable discrimination tasks were not appropriately matched. We believe that one important difference lies in the fact that the PWM task allows the participant to generate an internal phonological code for the picture which they can then compare to the auditory input. We thus created the auditory-written syllable matching (AWSM) task that allows for this same internal generation of a phonological code, thus matching task demands between the sublexical and lexical processing tasks.

GH: We have to ask why comprehension tasks are easier. I would argue they are easier because they involve the natural task processes involved in normal everyday speech processing in the wild.  Discrimination is not a task we ever perform except in laboratories.  I argue that it involves cognitive operations that are not normally used in normal speech processing, hence it is harder.  True, they aren't matched, but that's the point! You suggest that pictures allow the generation of an internal phonological code that can be compared to the auditory input. That actually seems harder in some ways than having that code given to you overtly in a discrimination task.  How does the subject know which internal phonological code to generate from the array of pictures (e.g., Miceli et al. used 6 pictures)? But more specifically, you suggest that PWM is easier because you don't have to maintain two items in memory for comparison. I agree! That has been my argument all along. You have to bring to bear additional processes beyond those normally used in speech recognition in order to perform the syllable discrimination task *because of the task*.

 In addition, AWSM and PWM require maintenance of a single auditory percept to compare to a single picture or written syllable, whereas in the syllable discrimination task two items must be maintained. The AWSM was indeed easier than the syllable discrimination task, and almost all of the patients and the control group  performed better on the AWSM than syllable discrimination task.

GH: "Almost all of the patients" is in fact 6 of 8, meaning that 25% of your now rather small sample failed to improve.  Two more cases, those with d' ~ 2.5, are darn close to negligible improvement. So at best half of your sample improved noticeably.  Given the sample size, I don't put a lot of weight on the result.
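For readers less familiar with d', here is a minimal sketch of the standard signal detection calculation, d' = z(hit rate) - z(false alarm rate). The function name `d_prime`, the trial counts, and the log-linear correction (adding 0.5 to each cell) are assumptions for illustration; nothing here is taken from DM's analysis pipeline.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    A log-linear correction (add 0.5 per cell) is applied so that rates
    of exactly 0 or 1 do not produce infinite z-scores. This correction
    is an illustrative choice, not necessarily the one used by DM.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: 50 "different" trials and 50 "same" trials.
chance = d_prime(hits=25, misses=25, false_alarms=25, correct_rejections=25)
good = d_prime(hits=45, misses=5, false_alarms=5, correct_rejections=45)

print(f"d' at chance:       {chance:.2f}")
print(f"d' at 90% correct:  {good:.2f}")
```

Under these assumed counts, a listener who gets 90% of both trial types right lands in the neighborhood of the d' ~ 2.5 mentioned above.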

Thus, our findings argue that as long as you match task demands and perceptual discriminability across sublexical and lexical tasks, syllable discrimination is a perfectly reasonable measure of sublexical processing, which can predict performance on a lexical task to a high degree.  It should be the burden of the researcher to design carefully matched tasks to isolate processes of interest. Specifically regarding the findings at hand, our results support the use of a standard syllable discrimination task as a predictor of lexical processing.

GH: The way you make syllable discrimination predict auditory comprehension performance is to impose the same kinds of artificial task demands on auditory comprehension. You argued this yourself in pointing out that discrimination imposes an additional demand over comprehension: the requirement of holding two items in memory while making a decision. If you are interested in understanding the cognitive and neural basis of performing a difficult metalinguistic speech task, then by all means use syllable discrimination.  If, on the other hand, you want to understand how speech is analyzed in the real world under ecologically valid conditions, then syllable discrimination can lead you astray.  


Wednesday, March 22, 2017

Misunderstandings of the Hickok & Poeppel Dual Stream framework: Comments on Dial & Martin 2017

A recent paper by Dial & Martin (DM) presents some interesting data on the relation between performance on a range of different speech perception tasks including some that have been the topic of discussion on this blog and in many of my papers.  These include syllable discrimination and auditory comprehension among others.  I have argued in several papers with David Poeppel and others that these two tasks differentially engage the dorsal (syllable discrimination) and ventral streams (comprehension).  DM sought to test this claim by testing how well these tasks hang together or dissociate in a group of 13 aphasic patients.  Their primary claim is that performance on sublexical and comprehension tasks largely hang together in contrast to previous reports of double dissociations. They suggest the discrepancy is due to better controlled stimuli in their experiment compared to past studies. DM's experiments are really nicely done and generated some fantastic data.  I don't think their conclusions about the dual stream model follow, however, because they get the dual stream model wrong.

First a comment on their data, focusing on the syllable discrimination and word picture matching tasks  (their Experiment 2a) as these are the poster-child cases.  DM report a strong correlation between performance on these tasks.  It indeed looks quite strong.  But they also report that two patients (18%) performed significantly better on the auditory comprehension task compared to the discrimination task. The control group did the same: significantly better on comp than disc.  So this is consistent with claims that these tasks are tapping into partially shared, partially different processes, as Hickok &  Poeppel (HP) have claimed.  

Do these findings lead to a rejection of part of the HP dual stream framework claims?  DM say yes. Here's a couple quotes from their concluding remarks:
5.2. Concluding comments on implications for dual route models 
Though dual route models with a specific neuroanatomical basis like that of Hickok and Poeppel have been proposed relatively recently (Hickok and Poeppel, 2000), cognitive models of language processes with a dual route framework (though typically without a specified neural basis) are common in the neuropsychological literature, particularly for reading and repetition (e.g., Coltheart et al., 2001; Dell et al., 2007; Hanley et al., 2004; Hanley and Kay, 1997; Hillis and Caramazza, 1991; McCarthy and Warrington, 1984; Nozari et al., 2010). Critically, many of these models assume that sublexical processing is shared between the two routes and the routes do not become activated until after sublexical processing occurs. A similar approach could be applied in the speech perception domain. That is, one might assume that there are separable routes for translation to speech output and for accessing meaning, but assume that sublexical processing is shared by the two routes and must be accomplished before processing branches into the separate routes. 
... In summary, the current study provides support for models of speech perception where processing of sublexical information is a prerequisite for processing of lexical information, as is the case in TRACE (McClelland and Elman, 1986), NAM (Luce and Pisoni, 1998) and Shortlist/MERGE (Norris, 1994; Norris et al., 2000). On the other hand, we failed to find support for models that do not require passage through sublexical levels to reach lexical levels, such as the episodic theory of speech perception (e.g., Goldinger, 1998) or dual route models of speech perception (Hickok and Poeppel, 2000, 2004, 2007; Hickok, 2014; Poeppel and Hickok, 2004; Majerus, 2013; Scott and Wise, 2004; Wise et al., 2001).  [emphasis added]
The problem with these conclusions is that this characterization of the HP dual route framework is inaccurate.  We do not claim that the system does not require passage through sublexical levels.  Rather, we specifically propose a phonological (not lexical) level of processing/representation that is shared between the two routes, as is clear in our figure from HP 2007 (yellow shading).

This is not a new feature of the HP framework.  Our claim of a shared level of representation between dorsal and ventral streams goes back to our 2000 paper.  From the abstract:
In this review, we argue that cortical fields in the posterior–superior temporal lobe, bilaterally, constitute the primary substrate for constructing sound-based representations of speech, and that these sound-based representations interface with different supramodal systems in a task-dependent manner. [emphasis added]
To restate, we proposed one sound-based (not lexically based) speech network located in the STG region that interfaces with two systems in a task dependent manner.  This clearly predicts associations between tasks if the functional damage is in the shared region and dissociations if the functional damage is in one stream or the other.  Both patterns should be found and DM's study confirms this.
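The logic of that prediction can be sketched as a toy model. Everything in it is an assumption made for illustration: the function name `task_accuracy`, the integrity values, and the multiplicative rule (a task succeeds only to the extent that every stage it depends on is intact). It is not a model proposed in HP or DM.

```python
def task_accuracy(shared, dorsal, ventral):
    """Toy model of one shared stage feeding two streams.

    Each argument is a stage integrity in [0, 1]. Performance on a task
    is modeled (as an assumption) as the product of the integrities of
    the stages that task relies on.
    """
    return {
        "syllable_discrimination": shared * dorsal,   # shared stage + dorsal stream
        "auditory_comprehension": shared * ventral,   # shared stage + ventral stream
    }

# Hypothetical lesion profiles: (shared, dorsal, ventral) integrity.
cases = {
    "intact": (1.0, 1.0, 1.0),
    "shared STG damage": (0.5, 1.0, 1.0),
    "dorsal stream damage": (1.0, 0.5, 1.0),
    "ventral stream damage": (1.0, 1.0, 0.5),
}

results = {name: task_accuracy(*integrity) for name, integrity in cases.items()}
for name, scores in results.items():
    print(f"{name:22s} {scores}")
```

Damage to the shared stage lowers both tasks together (an association), while damage confined to one stream lowers only the task that depends on it (a dissociation).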

So where did the idea come from that HP propose that speech recognition/comprehension can skip the sublexical level?  They quote one of my papers with former student, Kenny Vaden, as support for this assumption:
For example, Vaden et al. (2011) state that: [sublexical] information is only represented on the motor side of speech processing and…[is] not explicitly extracted or represented as a part of spoken word recognition (p. 2672). 
But this is misleading especially when you look at the term that DM replaced with their bracketed [sublexical] term.  Here's the full quote from this paper:
Our findings are more in line with the view that segment level information is only represented explicitly on the motor side of speech processing and that segments are not explicitly extracted or represented as a part of spoken word recognition as some authors have proposed (Massaro, 1972).  -Vaden et al. 2011
Two things to note here.  One is that Vaden et al. are noting that the findings we reported were more in line with theories that did not specifically implicate segmental representations in speech recognition; we were not making a claim about the position of the dual stream model of HP.  Second and more importantly, there is a difference between sublexical and segmental.  Sublexical means things that are below the level of the word, which includes segments but also syllables or pieces of syllables. In recent years I have leaned more and more toward the view that segmental units are not represented on the perceptual/recognition side of speech processing as the Vaden et al. quote suggests.  (David's position is different, by the way, I think.)  But this view of mine does not imply that sublexical information isn't processed in the STG and shared between the two streams.  I believe it is!  And DM's findings are perfectly compatible with this view.

Moreover, the HP claim has little to do with the nature of the representation and more to do with the process.  Notice that we don't say that the dorsal stream is more involved in sublexical representations, we say that it is more involved in sublexical tasks.  It is about the task driven cognitive/metalinguistic/ecologically invalid processes that are invoked most strongly by sublexical tasks, what we called "explicit attention" in HP2000:
Tasks that regularly involve these extra-auditory left hemisphere structures [i.e., the dorsal stream] all seem to require explicit attention to segmental information.  Note that such tasks are fundamentally different from tasks that involve auditory comprehension: when one listens to an utterance in normal conversation, there is no conscious knowledge of the occurrence of specific phonemic segments, the content of the message is consciously retained. 
So, the reason why the dorsal stream gets involved in syllable discrimination is that it is a task that requires attentional mechanisms that aren't normally involved in normal speech recognition and the network over which these attentional mechanisms can best operate is the sensorimotor dorsal stream network.

The problem with tasks like syllable discrimination is NOT that they can't assess the integrity of the perceptual analysis/representational system in the STG/STS, it is that you can't tell whether deficits on that task are coming from perceptual problems or metalinguistic attentional (or working memory) problems.  It's interesting to see how various speechy tasks hang together or not--and stay tuned for my own foray into this area with evidence for both associations and dissociations consistent with HP--but honestly, if you want to unambiguously map the circuits and computations involved in speech recognition as it is used in the wild, dump syllable discrimination and stick to auditory comprehension.

Tuesday, March 14, 2017

RESEARCH FACULTY POSITIONS at the BCBL- Basque Center on Cognition Brain and Language (San Sebastián, Basque Country, Spain)

The Basque Center on Cognition Brain and Language (San Sebastián, Basque Country, Spain) together with IKERBASQUE (Basque Foundation for Science) offer 3 permanent IKERBASQUE Research Professor positions in the following areas:

- Language acquisition
- Any area of Language processing and/or disorders with advanced experience in MRI
- Any area of Language processing and/or disorders with advanced experience in MEG

The BCBL Center (recently awarded the label of excellence Severo Ochoa) promotes a vibrant research environment without substantial teaching obligations. It provides access to the most advanced behavioral and neuroimaging techniques, including 3 Tesla MRI, a whole-head MEG system, four ERP labs, a NIRS lab, a baby lab including eyetracker, EEG and NIRS, two eyetracking labs, and several well-equipped behavioral labs.  There are excellent technical support staff and research personnel (PhD and postdoctoral students). The senior positions are permanent appointments. 

We are looking for cognitive neuroscientists or experimental psychologists with a background in psycholinguistics and/or neighboring cognitive neuroscience areas, and physicists and/or engineers with fMRI or MEG expertise. Individuals interested in undertaking research in the fields described above should apply through the BCBL web page. The successful candidate will be working within the research lines of the BCBL, whose main aim is to develop high-risk/high-gain projects at the frontiers of Cognitive Neuroscience. We expect high readiness to work with strong engagement and creativity in an interdisciplinary and international environment.

Deadline June 30th

We encourage immediate applications as the selection process will be ongoing and the appointment may be made before the deadline.

Only senior researchers with a strong record of research experience will be considered. Women candidates are especially welcome.
To submit your application please follow this link: applying for Ikerbasque Research Professor 2017 and upload:
  1. Your curriculum vitae.
  2. A cover letter/statement describing your research interests (4000 characters maximum)
  3. The names of two referees who would be willing to write letters of recommendation

Applicants should be fluent in English. Knowledge of Spanish and/or Basque will be considered useful but is not compulsory.

For more information, please contact the Director of BCBL, Manuel Carreiras.