A now-common finding in the functional imaging literature on speech perception is that Broca's area is active during the perception of speech. The activation magnitude is sometimes not as strong or consistent as one finds in auditory cortex, but it is there and so requires some explanation. There are a few possibilities. (I'm talking about Broca's area as if it were one functional region, which it isn't, but we'll gloss over that for now.)
1. Broca's area drives the analysis of speech sounds (i.e., the motor theory of speech perception is correct).
2. Broca's area supports/modulates the analysis of speech sounds via some predictive coding process (articulation driven forward model or analysis by synthesis).
3. Broca's area activation is epiphenomenal -- it simply reflects spreading activation in association networks and serves no function in speech perception.
4. Broca's area activity reflects a higher-order process (e.g., cognitive control) that is involved in, say, response selection.
We can quickly rule out #1 for reasons that have been articulated previously, e.g., here.
Regarding the other possibilities, I think the question is still open for debate. As additional fuel for this debate, consider the following recently published (epub-online) finding reported by Vaden et al.
In an fMRI study, listeners heard sets of words that varied in terms of phonotactic probability, which is a measure argued to reflect sublexical properties of words (density was also manipulated but didn't show a robust effect). The task was to monitor for occasional non-words embedded in the word sets -- these trials were excluded from analysis. The goal was to try to identify neural regions that are sensitive to sublexical properties of words.
One might have expected to find such effects in relatively early stages of processing in auditory cortex, based on the standard hierarchical assumption that word recognition first analyses segmental-level information, which it then uses to access the appropriate lexical-phonological codes, generally acknowledged to be coded/processed in the STS. However, Vaden et al. found no effects in auditory cortex, or indeed anywhere in the temporal lobe. Instead, activation in Broca's area (~pars opercularis) was modulated as a function of phonotactic frequency: word sets composed of higher phonotactic frequency words yielded greater activity in Broca's area. What's interesting about this is that subjects have no conscious idea that the words vary according to the frequency of the sound sequences that comprise them, yet Broca's area sure does.
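To make "frequency of the sound sequences" a bit more concrete, here is a minimal sketch of one common way to score phonotactic probability: average the relative frequency of each adjacent phoneme pair (biphone) in a word. This is only an illustration, not necessarily the metric Vaden et al. used. The toy lexicon, the phoneme symbols, and the function names are all made up for the example.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only):
# each word maps to a tuple of phoneme symbols.
TOY_LEXICON = {
    "cat": ("k", "ae", "t"),
    "cap": ("k", "ae", "p"),
    "can": ("k", "ae", "n"),
    "bat": ("b", "ae", "t"),
    "ship": ("sh", "ih", "p"),
}


def biphones(phonemes):
    """Return the adjacent phoneme pairs in a word, in order."""
    return list(zip(phonemes, phonemes[1:]))


def biphone_counts(lexicon):
    """Count how often each biphone occurs across the whole lexicon."""
    counts = Counter()
    for phonemes in lexicon.values():
        counts.update(biphones(phonemes))
    return counts


def mean_biphone_probability(phonemes, counts):
    """Average relative frequency of a word's biphones.

    Higher values mean the word is built from more common sound sequences.
    """
    total = sum(counts.values())
    pairs = biphones(phonemes)
    if not pairs or total == 0:
        return 0.0
    return sum(counts[pair] / total for pair in pairs) / len(pairs)


counts = biphone_counts(TOY_LEXICON)
for word, phonemes in TOY_LEXICON.items():
    print(word, round(mean_biphone_probability(phonemes, counts), 3))
```

In this toy lexicon "cat" scores higher than "ship" because its biphones recur across several words. A real analysis would use a large pronunciation dictionary and would typically weight counts by word frequency and segment position.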
So what's up? Does this mean that the motor theory is right? Is Broca's area critically involved in the early analysis of speech information? Nope. (Refer again to the reasons why #1 above can't be right.) It must be something else.
Cognitive control? Subjects were trying to find occasional non-words. Maybe phonotactic modulations vary cognitive decision load... This is possible but not likely: no effect in Broca's area was found for neighborhood density, which is argued specifically to induce competition and therefore should affect decision processes. Further, cognitive control effects tend to involve more anterior regions.
Forward prediction? Yes, possibly. High phonotactic frequency items are associated with more predictability. Maybe Broca's area gets excited when it encounters a predictable pattern.
Epiphenomenal? Yes, possibly. High phonotactic frequency items likely have stronger associations between auditory and motor representations of speech; the stronger the association, the more spreading activation one sees.
Here's what I really think is going on. Basically, the strength of the association between auditory and motor representations of a phonemic sequence is what's driving the correlation, as in the epiphenomenal account. Why do these associations exist? Because the goal of speech production is to reproduce a particular *sound*. To achieve this production task we need to relate sound and movement. Those auditory-motor codes that are more frequent are more strongly associated, leading to more activation. HOWEVER, even though the underlying explanation for this effect has more to do with speech production than speech perception, it may be possible for the speech system to take advantage of the situation and use this information to augment perception, in a forward predictive manner.
Returning to the question of hierarchical models of word recognition, it is interesting that no such effects of phonotactic probability showed up in auditory regions. This is consistent with the view that speech recognition does not necessarily involve access to segmental-level units. But it could also be that phonotactic probability isn't a good metric of segmental-level processing.
Vaden, K., Piquado, T., & Hickok, G. (2011). Sublexical Properties of Spoken Words Modulate Activity in Broca's Area but Not Superior Temporal Cortex: Implications for Models of Speech Recognition. Journal of Cognitive Neuroscience, 1-10. DOI: 10.1162/jocn.2011.21620