It is a commonly held belief that speech perception involves the recovery of segmental information -- that is, the speech stream is analyzed in such a way that individual phonemes are recovered. The typical story is that we analyze spectro-temporal features to recover phonemes, which are put together to form syllables and then phonological words, enabling lexical-semantic access. We've suggested, as have others, that the syllable may be a basic unit of analysis, while leaving open the possibility that we also access segmental information. See, for example, this figure from Hickok & Poeppel 2007:
Or this overly simplified cartoon from Hickok 2009:
So here's the question: what exactly is the evidence that we access segmental information in perception? Do we even need phonemes for speech perception? If so, why?
Let me play devil's advocate and claim that we don't extract or represent phonemes at all in speech perception (production is a different story). We do it all with syllables.
Convince me that I'm wrong.
Hickok, G. & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8, 393-402.
Hickok, G. (2009). The functional neuroanatomy of language. Physics of Life Reviews, 6, 121-143.