Thursday, January 5, 2012

Computational neuroanatomy of speech production

For your Happy New Year reading enjoyment, let me point you to my just-published (online) synthesis of computational, psycholinguistic, and neuroanatomic research on speech production: Hickok, G. (2012). Computational Neuroanatomy of Speech Production, Nature Reviews Neuroscience.  The aim was to shatter barriers between the motor control folks, the psycholinguists, and the neuroscience-oriented researchers studying speech production.  This integration has some interesting consequences (in my view).  Here are a few:

1. Speech motor control is hierarchically organized (no big surprise) with an auditory-(pre)motor circuit representing a relatively higher level and a somatosensory-motor circuit a relatively lower level.

2. The auditory-grounded circuit primarily deals in units the size of syllables, whereas what we normally think of as segmental units (~phonemes) are processed primarily in the lower-level somatosensory-based circuit.  Yes, I'm arguing that "phonological representation" is distributed over auditory and somatosensory cortex.

3. Phonological encoding, in the sense of typical two-stage models of speech production, is achieved in the context of a state feedback control circuit (from the motor control tradition).

4. Efference copy signals as they are currently conceptualized in the motor control literature do not exist (let's see what kind of pushback I get on this one!).  That is, the motor controller does not issue a copy of the command that it has executed.  Rather, motor-to-sensory feedback is part of the motor planning process from the start.  In other words, in my view, the "efference copy" is an iterative feedback loop that enables sensory systems to be a part of the programming of the movement rather than just evaluating the outcome of movement commands. This conceptualization integrates the notions of motor planning, efference copies, forward prediction, and error correction into one mechanism.  In addition, this computational architecture solves the problem of how both internal and external feedback monitoring can be achieved by the same network even though the timing of the two feedback sources differs.  I present a simple simulation to demonstrate the feasibility of these assumptions.

5. Forward predictions are instantiated computationally as inhibitory inputs to sensory systems.

6. Conduction aphasia and apraxia of speech involve disruption to two different components of the same hierarchical level of state feedback control, the relatively higher level auditory-pre-motor circuit.

7. Sensory representations are central to the motor planning process and explain the tight interaction between sensory and motor speech systems.  In a sense, it is a sensory theory of speech production, as opposed to a motor-oriented theory of speech perception.
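
As a rough illustration of points 4 and 5, here is a toy Python sketch. It is not the simulation from the paper: the function name `run_loop`, the scalar "sensory target", the identity mapping from motor plan to sensory outcome, and the `gain` and `perturbation` parameters are all assumptions made up for this example. It only shows the general idea that a forward prediction can act as inhibition on a sensory target, and that one and the same loop can use either internal (predicted) or external (actual) feedback.

```python
def run_loop(target, gain=0.5, steps=40, perturbation=0.0, use_external=False):
    """Toy state feedback loop; returns the final (actual) sensory outcome.

    All names and the identity motor-to-sensory mapping are illustrative
    assumptions, not taken from the paper.
    """
    motor_plan = 0.0
    for _ in range(steps):
        # Forward prediction: expected sensory consequence of the current plan.
        predicted = motor_plan
        # Actual outcome of executing the plan, possibly perturbed.
        actual = motor_plan + perturbation
        # The same comparison serves both feedback sources: the internal
        # loop uses the prediction, the external loop uses the actual outcome.
        feedback = actual if use_external else predicted
        # The feedback enters as inhibition of the sensory target; whatever
        # target activity survives the subtraction is the error signal.
        residual = target - feedback
        # The residual is fed back to revise the motor plan.
        motor_plan += gain * residual
    return motor_plan + perturbation


if __name__ == "__main__":
    print(run_loop(1.0))
    print(run_loop(1.0, perturbation=0.2))
    print(run_loop(1.0, perturbation=0.2, use_external=True))
```

With internal feedback only, the plan settles on the target but a perturbation of the outcome goes uncorrected; with external feedback, the same loop compensates for the perturbation. Real speech motor control is of course not a single scalar with a fixed gain, so treat this purely as a picture of the loop structure.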

I would love to get your thoughts on this paper.  There's lots in here to discuss/argue about and it will be fun to debate some of the data and/or theoretical claims.

1 comment:

Willy Serniclaes said...

I’ve greatly appreciated your synthesis (thanks to Anne-Lise Giraud for linking me to your blog). One point I find especially appealing is that your computational model solves the difference in timing between internal and external feedback monitoring. I’ve proposed a solution to a somewhat similar problem, the one of differences in spatial representation of speech sounds between the auditory and motor systems. This problem originates from the fact that frontward movements in the vocal tract have opposite acoustic consequences for vowels and consonants. Compare a vowel sequence such as /iu/ (‘you’) with a consonant sequence such as /bʌg/ (‘bug’). Both are produced with a backward motor change, but they have opposite acoustic effects: a downward frequency change for /iu/ vs. an upward change for /bʌg/. As a consequence, the perception of motor changes is distorted in an acoustic-auditory representation. Interpreting a downward frequency change as a backward movement would be right for vowels but wrong for consonants. However, this problem can be solved if there is an inversion of the perceptual representation at some further processing stage. Evidence in support of this conjecture has been obtained from behavioral data in a series of experiments with French-like /i,y,u/ and /b,d,g/ synthetic sounds [1,2]. We’ve found that perceptual boundaries between the vowels did not match those between the stops in an acoustic representation. However, there was a close match between the vowel and stop boundaries after a rotation of the acoustic representation. This shows that there is a mathematical solution (the rotation of the acoustic space) to the lack of a common representation of the spatial relationships between vowels and of those between consonants. But it remains to be proven that this solution is indeed used by the brain. We are currently seeking ways to test this hypothesis.

[1] first publication of data with adults:
Serniclaes, W. & Salinas, J. (2011). Perception of vowels and consonants: From acoustic diversity to cognitive isotropy. Faits de Langue, 37, 207-224.

[2] under way: a second publication with data from both adults and children, as well as psychoacoustic data (sine-wave analogues of speech sounds, heard first as non-speech whistles and then as speech sounds, using the same paradigm as e.g. Dufor et al., 2007; 2009).

Dufor, O., Serniclaes, W., Sprenger-Charolles, L., & Démonet, J.-F. (2009). Left pre-motor cortex and allophonic speech perception in dyslexia: A PET study. NeuroImage, 46, 241–248.