As detailed in a 2012 Talking Brains post, Greg and colleagues have proposed a model for speech production that aims to synthesize research from motor control, psycholinguistics, and neuroscience. This year, the inaugural issue of Language, Cognition and Neuroscience (a re-christening of Language and Cognitive Processes) was guest edited by Albert Costa and F. Xavier Alario. It featured an article by Greg outlining a descendant of this model, the Hierarchical State Feedback Control (HSFC) model. This target article was accompanied by a number of commentaries, including one co-authored by the two of us and Brenda Rapp, as well as a response by Greg.
We (Matt and Adam) wanted to take advantage of the extra space afforded by Talking Brains to continue this conversation. The H in HSFC emphasizes the key role of hierarchical representations in Greg's proposal. In this post, we'd like to articulate why psycholinguists and neuroscientists have argued that in addition to such hierarchical representations, distributed/parallel encoding plays a critical role in language production.
To orient the discussion, consider two classical types of neurocognitive representational structures from vision:
1) Hierarchical representations. In representations that have this type of structure, there is a mapping (a necessary relationship) between two sets of representations. Consider classic simple vs. complex cells (Hubel & Wiesel, 1962). Under this proposal, simple cells preferentially respond to oriented bars in particular locations in the visual field. By integrating responses over many simple cells, complex cells respond to oriented bars across multiple locations. Critically, there is a precise mapping between these two levels of representation; the response properties of complex cells are defined by a function stated over the response properties of simple cells.
2) Parallel, independent representations. In representations that have this type of structure, the relationship between the two sets of representations is not defined by a direct mapping which spells out one level in terms of the other; rather, they are independent dimensions of structure. These dimensions can be linked or bound together, but they need not necessarily co-occur. Consider Treisman and colleagues' classic Feature-Integration Theory, which claims that some dimensions of visual stimuli are initially processed independently and only later bound together. This proposal provides a ready account of illusory conjunctions (Treisman & Schmidt, 1982). For example, if letter identity and color are coded independently, this can explain how a display with green Xs and brown Ts can give rise to the erroneous perception of a green T; this percept would be unlikely if letter identity and color were encoded in a single representation. Critically, the two types of information must be encoded independently (but in parallel) for these illusory conjunctions to occur during the later process of binding.
The HSFC model emphasizes the role of hierarchical representations. There is abundant evidence that these play a role in speech production. With respect to speech motor control, many accounts adopt a syllable-sized, relatively coarse-grained specification of motor movements, which directly maps onto detailed information regarding the precise temporal and kinematic coordination involved in production. There is also evidence that there are multiple levels of segment-sized representations that specify different types of information. A classic distinction is between context-independent vs. position-specific aspects of sound structure. The context-independent representations encode information about the sounds (e.g., /t/ in table and stable), and these map to position-specific representations that spell out the details (e.g., table contains aspirated [tʰ] and stable contains unaspirated [t]). Evidence that these constitute distinct levels of representation includes data from individuals with acquired speech impairment (Buchwald & Miozzo, 2011). While this is not directly specified in the current HSFC model, it is clearly consistent with the overarching account as noted in Greg's response.
But what we'd like to emphasize is that parallel, independent representations also play a key role in language production. In particular, there are abundant reasons to believe that at certain levels of representation syllabic and segmental structure are not organized in a strict hierarchical fashion, but rather form parallel aspects of form representation. A number of results suggest that rather than syllables being defined as chunks of segments, syllable structure defines a frame; segments are then bound or linked to positions within this frame (see Goldrick, in press, for review and discussion of other dimensions of phonological structure).
To make this contrast explicit, consider the syllable "cat." Under a strictly hierarchical theory, this syllable could be defined by a mapping from [kaet] to the component segments [k-Onset] [ae-Nucleus] [t-Coda]. Under a theory utilizing independent representations, there is an [Onset]-[Nucleus]-[Coda] syllable frame and, independently, three segments /k/, /ae/, /t/. The syllable is represented by the bindings /k/-[Onset]; /ae/-[Nucleus]; /t/-[Coda].
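The two options can be sketched as toy data structures in Python. This is only an illustration under our own assumptions: the names `HIERARCHICAL_LEXICON`, `Frame`, and `bind`, and the ASCII segment labels, are hypothetical and not part of the HSFC model or any published implementation.

```python
from dataclasses import dataclass

# Strictly hierarchical option: the syllable chunk is looked up and directly
# spelled out as position-tagged segments (hypothetical toy lexicon).
HIERARCHICAL_LEXICON = {
    "kaet": [("k", "Onset"), ("ae", "Nucleus"), ("t", "Coda")],
}


@dataclass(frozen=True)
class Frame:
    """A syllable frame: an ordered tuple of positions, with no segments."""
    positions: tuple


def bind(segments, frame):
    """Link independently stored segments to frame positions, in order."""
    if len(segments) != len(frame.positions):
        raise ValueError("segment count must match the number of positions")
    return list(zip(segments, frame.positions))


# Independent option: the frame and the segments are stored separately
# and only related through binding.
cvc_frame = Frame(("Onset", "Nucleus", "Coda"))
cat = bind(["k", "ae", "t"], cvc_frame)

# The same frame can be bound to the same segments in a different order,
# which the chunk-to-segments lookup above has no way to express.
tack = bind(["t", "ae", "k"], cvc_frame)

print(cat)
print(tack)
```

In the sketch, both options yield the same bindings for "cat," but only the second treats the frame and the segments as separable objects that can be recombined.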
The first form of evidence in favor of the independent representations perspective comes from illusory conjunctions in production. Speech errors can result in the mis-ordering of segments. In the majority of these errors, the segments occur in the wrong syllable but the correct syllable position (e.g., bad cat misproduced as "bad bat"). However, a substantial minority (more than 20% of errors in corpora of spontaneous speech; Vousden, Brown, & Harley, 2000) result in segments being produced in incorrect syllable positions (e.g., film misproduced as "flim"). Just as letter identity and color form independent, dissociable dimensions of visual representation, segment identity and syllable positions form dissociable dimensions of phonological representations in production.
Evidence from priming points to a similar conclusion. Colored object naming is facilitated by segmental overlap between the color and object name, even when the segments occur in different syllable positions (e.g., green flag; Damian & Dumay, 2009). In addition, production of phrases made up of two nonsense words is facilitated when the two nonsense words have syllables with the same structure compared to nonwords that do not have matching structures -- even when there are no segments shared across the two syllables (Sevald, Dell, & Cole, 1995). For example, repeating two nonwords that both start with CVC syllables (e.g., KEM TIL.FER) or CVCC syllables (KEMP TILF.NER) is faster than repeating nonwords that start with syllables with contrasting consonant-vowel patterns (e.g., KEM TILF.NER or KEMP TIL.FLER). This occurs in spite of the syllables sharing no segments (e.g., KEM and TIL).
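The structural match in these examples can be checked with a small Python sketch. It rests on simplifying assumptions that are ours, not Sevald et al.'s: each letter is treated as one segment, the vowels are the letters A, E, I, O, U, and a period marks a syllable boundary. The function names are hypothetical.

```python
VOWELS = set("AEIOU")  # assumption: orthographic vowels stand in for vowel segments


def first_syllable(nonword):
    """Return the first syllable of a nonword written with '.' boundaries."""
    return nonword.split(".")[0]


def cv_skeleton(syllable):
    """Reduce a syllable to its consonant-vowel pattern, e.g. KEM -> CVC."""
    return "".join("V" if ch in VOWELS else "C" for ch in syllable.upper())


def shares_structure(a, b):
    """True if the first syllables of two nonwords have the same CV pattern."""
    return cv_skeleton(first_syllable(a)) == cv_skeleton(first_syllable(b))


def shared_segments(a, b):
    """Segments (letters, under our assumption) common to both first syllables."""
    return set(first_syllable(a)) & set(first_syllable(b))


pairs = [("KEM", "TIL.FER"), ("KEMP", "TILF.NER"),
         ("KEM", "TILF.NER"), ("KEMP", "TIL.FLER")]
for a, b in pairs:
    print(a, b, shares_structure(a, b), shared_segments(a, b))
```

Under these assumptions, the first two pairs match in structure and the last two do not, while none of the pairs share any segments in their first syllables.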
Based on data such as these, psycholinguistic theories (e.g., Shattuck-Hufnagel, 1992) have proposed that syllables and segments are not related in a strictly hierarchical fashion, but rather form independent-yet-linked dimensions of sound structure. That's not to say that the links are purely arbitrary; only certain segments can be associated to particular syllable positions (e.g., in English, /ng/ can be associated to coda but not onset). But segments are not merely the "elaborated" form of syllabic chunks; they form independent entities.
While hierarchical representations are a critical part of speech production, it's important to acknowledge the key role of non-hierarchical representations as well. Mirroring other domains of processing, both representational schemes serve essential functions in the neurocognitive mechanisms supporting speech.
References
Buchwald, A., & Miozzo, M. (2011). Finding levels of abstraction in speech production: Evidence from sound-production impairment. Psychological Science, 22, 1113-1119.
Damian, M. F., & Dumay, N. (2009). Exploring phonological encoding through repeated segments. Language and Cognitive Processes, 24, 685-712.
Goldrick, M. (in press). Phonological processing: The retrieval and encoding of word form information in speech production. In M. Goldrick, V. Ferreira, & M. Miozzo (Eds.), The Oxford handbook of language production. Oxford: Oxford University Press.
Hickok, G. (2014a). The architecture of speech production and the role of the phoneme in speech processing. Language, Cognition and Neuroscience, 29, 2-20.
Hickok, G. (2014b). Towards an integrated psycholinguistic, neurolinguistic, sensorimotor framework for speech production. Language, Cognition and Neuroscience, 29, 52-59.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160, 106-154.
Rapp, B., Buchwald, A., & Goldrick, M. (2014). Integrating accounts of speech production: The devil is in the representational details. Language, Cognition and Neuroscience, 29, 24-27.
Sevald, C. A., Dell, G. S., & Cole, J. S. (1995). Syllable structure in speech production: Are syllables chunks or schemas? Journal of Memory and Language, 34, 807-820.
Shattuck-Hufnagel, S. (1992). The role of word structure in segmental serial ordering. Cognition, 42, 213-259.
Treisman, A., & Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14, 107-141.
Vousden, J. I., Brown, G. D., & Harley, T. A. (2000). Serial control of phonology in speech production: A hierarchical model. Cognitive Psychology, 41, 101-175.