Monday, August 27, 2012

Liberman and the Perception of the Speech Code

Alvin Liberman is at the center of a scientific divide.  On the one hand, his work on the motor theory of speech perception has been elevated virtually to the status of gospel among those researchers who promote the role of the motor system in perception.  On the other hand is a new generation of speech scientists who believe that speech perception is the purview of the auditory system.  For these researchers Liberman has a near-villainous status, or at least personifies a roadblock to progress in speech research.  Full disclosure: I lean towards the latter.

So I decided to go back and read some of Liberman's original work.  I highly recommend reading the older literature in your research area -- there's a lot of useful information! -- and it is particularly important whenever decades-old results or theories are gratuitously cited in modern work.  You'll often be surprised and almost always you will learn something important.  With respect to Liberman et al.'s work, I have to admit, it is fairly impressive.  Along with a group at MIT that included Ken Stevens, Liberman and colleagues virtually defined the field of speech perception with their pioneering work.  It was technically sophisticated, theoretically rich, and generated a massive amount of data that defined the problems we are still struggling with today.

One interesting tidbit of information was Liberman's idea that somatosensory information was what ultimately drove the perception of phonemes; the motor system was used simply as a means to access this information.  Here is a quote:

“...the articulatory movements and their sensory effects mediate between the acoustic stimulus and the event we call perception. In its extreme and old-fashioned form, this view says that we overtly mimic the incoming speech sounds and then respond to the proprioceptive and tactile stimuli that are produced by our own articulatory movements.  For a variety of reasons such an extreme position is wholly untenable, … we must assume that the process is somehow short-circuited – that is, that the reference to the articulatory movements and their sensory consequences must somehow occur in the brain without getting out into the periphery.” (Liberman, 1957, p. 122)

In reference to probably the most important of Liberman's early papers, Perception of the Speech Code, I noticed an interesting juxtaposition.  One is the starting assumption that the analysis should be restricted to the level of the phoneme.  The article starts,
Our aim is to identify some of the conditions that underlie the perception of speech.  We will not consider the whole process, but only the part that lies between the acoustic stream and a level of perception corresponding roughly to the phoneme. (p. 431)
The other is the observation that the group is famous for, the parallel transmission of information about phonemes in a syllable, noted here in the context of a discussion of perceptual experiments involving synthesized versions of the phoneme /d/ in the context of a following vowel:
If we cut progressively into the syllable from the right-hand end, we hear /d/ plus a vowel, or a nonspeech sound; at no point will we hear only /d/. This is so because the formant transition is, at every instant, providing information about two phonemes, the consonant and the vowel – that is, the phonemes are being transmitted in parallel. (p. 436)
And a few pages further on, they conclude:
This parallel delivery of information produces at the acoustic level the merging of influences we have already referred to and yields irreducible acoustic segments of approximately syllabic dimensions. (p. 441)
And one more towards the end of the paper:
To find acoustic segments that are in any reasonably simple sense invariant with linguistic (and perceptual) segments ... one must go to the syllable level or higher. (p. 451)
This strikes me as the one point where Liberman's work went awry.  By committing theoretically to the notion that individual segments must be extracted and represented in the speech perception process, they were in no position to recognize what their data were clearly telling them: that the acoustic signal is reflecting larger chunks of information -- that is, something closer to the syllable.  This observation is nothing new, of course.  Others have pointed out the problems with the phoneme as the unit of analysis.  But it is interesting to reconsider the data in their own light, rather than in the shadow of the phoneme-as-a-perceptual-unit dogma.  It was a perfectly reasonable assumption that phonemes should be relevant not only for phonological theory (i.e., production) but also for perception.  Unfortunately, it unnecessarily complicated the perceptual picture.  I wonder how Liberman's work might have progressed if he had made different theoretical assumptions.  Maybe we'd understand how speech is perceived by now.

Liberman, A. M. (1957). Some results of research on speech perception. Journal of the Acoustical Society of America, 29, 117-123.

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.