The dual stream model of speech processing--originally proposed by Carl Wernicke in 1874 and modernized by me and David Poeppel in a series of three publications in 2000, 2004, and 2007--has become, it seems, the standard model of the field. It holds that speech processing is achieved along two task-dependent streams: a dorsal "how" stream important for speech production and a ventral "what" stream important for speech recognition.
Of course there is disagreement on the details, which is a healthy thing. The most prominent disagreement comes in the form of an alternative formulation by Rauschecker and Scott in a 2009 publication. I commented on the problems with their proposal in a previous post. Here I want to highlight and discuss the main functional anatomic disagreement, which concerns the branching point of the two streams. Rauschecker's monkey work suggests an early (A1) divergence, as this figure from R&S indicates:
R&S, sticking close to the monkey anatomy, translate that into a human architecture with the same branch point, meaning that everything ~caudal to A1 is dorsal and everything ~rostral to A1 is ventral.
In contrast, we've argued that the branch point is fairly deep into the system, in the vicinity of the pSTS (yellow phonological network in figure) or caudal parabelt in monkey auditory cortex anatomy terms:
The evidence for this comes from a variety of sources (neuropsych, imaging) that we have recited again and again in various publications. I won't rehash it here. For some recent evidence for an "auditory phonological area" that is consistent with the H&P proposal, see this Twitter thread.
An advantage of the R&S view is that it can be viewed as more parsimonious or conservative in the evolutionary sense, sticking closer to the anatomy as it is understood in the monkey. I think this is a reasonable tack. The question, then, is whether there is empirical justification for a different architecture in the human. My long-time position on this is that the empirical evidence is overwhelming and therefore justifies a different architecture in humans compared to macaques.
But here I want to step back and consider some functional-behavioral arguments that I think reconcile to some extent the R&S and H&P viewpoints and show that they are not all that different after all.
The basic insight is that the auditory-motor repertoire of monkeys and humans is dramatically different. There is general agreement that the vocal-learning capacities of monkeys are limited, whereas humans are arguably the most prodigious vocal learners in the animal kingdom. What this means is that the auditory-motor repertoire of monkeys is going to be limited to relatively simple behaviors like orienting to sound and not to behaviors where the perceived sound must be reproduced by the monkey. Perhaps it is no surprise that the monkey dorsal stream, in Rauschecker's hands (cf. Middlebrooks), is oriented toward spatial perception.
Speech is different. The phonological form of a word must be used both as a means to access the meaning of the word, a ventral stream "what" function, and as a target for a motor speech action, a dorsal stream "how" function. Now, look back at the first figure in this post and play natural selection for a moment. If you were going to evolve a network that could represent higher-level information about word forms AND use them in both ventral and dorsal streams, where would you put such a network? A good candidate is the lighter shaded area just ventrolateral to A1 and extending posteriorly from there. This is the yellow shaded box/area in the H&P figure, the "Mid-post STS", which in our model is part of both the ventral and dorsal streams. R&S assign this zone to the dorsal stream (red in their human architecture figure), but they acknowledge some interaction between that posterior zone and the more anterior portion of their ventral stream, as the arrow in their figure indicates.
What I'd like to communicate with these points is that (1) the H&P model architecture is perfectly consistent with the monkey architecture, assuming some reasonable evolutionary expansions, and (2) R&S assume a similar expansion into that same zone in humans, meaning that (3) the two models are not really that different at this level: H&P refer to the mid-posterior zone as part of both streams, whereas R&S call it part of the dorsal stream that nonetheless interacts with both streams.
Conclusion: perhaps there is more agreement about the details of the dual stream architecture than the pictures and claims suggest.