Something has been bothering me lately with respect to phonological theory.  It is assumed that phonological representations are abstract entities not tied to any sensory or motor systems.  Why then are phonological representations defined in terms of motor articulatory features?  Does this bother anyone?
16 comments:
Hi Greg:
It looks like a question for your co-blogger, David Poeppel, doesn’t it?
I agree it’s bothersome if we think of speech as an acoustic phenomenon only. However, if we consider speech perception as bimodal (i.e., in a sense amodal) because, in the typical case, it integrates speech sounds and lipreading, then the definition in terms of motor articulatory features seems more reasonable.
Vilem
I'm thinking about it more from a purely formal (generative phonology) perspective where the representations are considered abstract and amodal, but the representational space is tied to the motor system.
Regarding speech perception, I don't think lipreading phenomena make motor articulatory features any more reasonable. I believe, for empirical reasons, that visual speech has the effect it does because of cross-sensory integration, not because of sensory-motor integration.
Cross-sensory integration, of course.
Perhaps I’m ignorant of the subject, but I don’t think formal linguistics considers anything like motor representation. It is interested in competence rather than performance and, as far as I understand it, it does not support any sort of motor theory. For example, distinctive features (which I took to be your "motor articulatory features"), though apparently abstracted from speech production rather than perception, are pure abstractions rather than motor representations, and so are phonemes (segments).
That is precisely the thing that bugs me. Distinctive features in phonology, which are assumed to be abstract, are defined in terms of how speech is produced. Why is that? Why aren't formal phonological representations defined in terms of acoustic features?
Because linguistics has always been more focused on production than perception.
One more stupid note: Imagine a formal linguistic theory of the chinchilla's perception. Distinctive features would hardly play a role in it.
More precisely: SOME distinctive features could be a part of the chinchilla's theory of speech sound perception. But not THE distinctive features we are talking about.
First, I'd like to point out that modern phonological theory is quite different from what it was in 1968. There is a range of opinions on how "grounded", or modal, phonological features are, e.g.:
- Hale & Reiss (2008), which argues for complete abstraction
- Blevins (2004), which is similar, but diminishes much of the role of synchronic phonology, so it doesn't matter too much
- Hayes, Kirchner & Steriade (2004), which argue for completely grounded, modal phonology
- Motor theory, wherein the features are grounded but not grounded to a domain that seems to make sense to some of us.
- Dresher (2009), which I think gets at the part-B of your question as the features are abstract, but they are only tied to modal feature names for convenience, as I understand it.
- Johnson (1997), Hume (2004) and Mielke (2004) where it's suggested that traditional features don't really do the trick but that there are natural classes of phones based on phonetic similarity; this can be construed as undermining your assumption.
Second, if you're asking the question about motor vs. acoustic, there's a long, long literature on that. It should depend on which best captures the way sounds pattern together. The acoustic vs. motor feature question never really interested me much, so I'm not sure of the answer off the top of my head.
Third, I agree that you should definitely ask Dr. Poeppel for a spirited defense of substance-free phonology. It's part of the Poeppel, Idsardi & van Wassenhove (2008) framework and I personally think that's one of the flaws.
Having said that, I'm not sure I understand the question, on a certain level. How else should we refer to the natural class of vowels with an F1 between 100 and 300, or the natural class of consonants made with the tongue somewhere around the alveolar ridge, for example?
The fact is that phones pattern together in their behavior based on myriad factors, foremost of which are their acoustic and articulatory properties. These classes need to be labeled somehow even though you might postulate that the phonological system doesn't care at the level of formalizing generalization. So, I can't tell if your question is, how modal are features, or, what's the socio-historical reason for using a particular set of labels for the features that have been postulated?
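To make the labeling point concrete, here is a minimal sketch in Python. It is only an illustration: the five-segment inventory, the feature values, and the helper names `natural_class` and `relabel` are assumptions made up for this example, not anyone's actual proposal. It shows that a natural class is just the set of segments sharing some feature values, and that swapping an articulatory-sounding label for an arbitrary one leaves the class unchanged.

```python
# Toy inventory (an assumption for illustration): each segment is a
# bundle of binary feature values, loosely in the SPE style.
INVENTORY = {
    "i": {"syllabic": "+", "high": "+", "back": "-"},
    "u": {"syllabic": "+", "high": "+", "back": "+"},
    "a": {"syllabic": "+", "high": "-", "back": "+"},
    "t": {"syllabic": "-", "high": "-", "back": "-"},
    "k": {"syllabic": "-", "high": "+", "back": "+"},
}


def natural_class(inventory, description):
    """Return the set of segments whose features match every value in `description`."""
    return {
        symbol
        for symbol, features in inventory.items()
        if all(features.get(name) == value for name, value in description.items())
    }


def relabel(inventory, mapping):
    """Return a copy of the inventory with feature names swapped per `mapping`."""
    return {
        symbol: {mapping.get(name, name): value for name, value in features.items()}
        for symbol, features in inventory.items()
    }


# The class picked out by the articulatory-sounding label "high" ...
high_vowels = natural_class(INVENTORY, {"syllabic": "+", "high": "+"})

# ... is the same class once the label is replaced by an arbitrary name.
renamed = relabel(INVENTORY, {"high": "F7"})
same_class = natural_class(renamed, {"syllabic": "+", "F7": "+"})

print(sorted(high_vowels))        # ['i', 'u']
print(high_vowels == same_class)  # True
```

On this toy view the grouping does all the work and the name does none, which is why the question of how modal the features are is separate from the question of why a particular set of labels was adopted.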
Thanks Marc for the update on the state of various ideas in phonological theory. It is much appreciated.
Here's really what I'm getting at: Phonology, as it is studied by linguists, serves as the foundation for many psycho- and neurolinguists in their models of speech production and speech perception. I think I am accurate in my assumption that the traditional feature-based view is the most commonly "used" variant of phonological theory as it is applied to processing/neural issues which is why I targeted this in my post. (Maybe I'm wrong here?) In any case, it is commonly assumed that the same abstract phonological representations are used both for production and for comprehension. I have questioned this assumption in the past, as have others. Now I'm wondering if the fact that phonological theory is more often grounded in motor space means that it is more a theory of speech production (or sensory-motor integration, the dorsal stream) than a theory of speech perception (or sensory-conceptual integration, the ventral stream).
So let me ask you the question directly Marc: do you believe in an abstract phonology that is used both for speech production and speech comprehension?
I thought that phonological representations are abstract, yes, but they still are linked to or represent fairly specific sensory or motor criteria. They are abstract only in the sense that numerous variations can be ignored in qualifying some speech sound as a particular phonological representation.
Defining phonemes in terms of articulatory features probably had more to do historically with guiding missionaries and other travelers that needed to learn or speak foreign languages.
Hi Greg,
Yes, it's definitely the case that neuroscientists use an abstractionist model; one that isn't really used by most linguists anymore, I might add. The dominant paradigm within phonology right now is OT, as you probably know, but some of the details might seem quite foreign/odd to neuroscientists. This includes the idea of richness of the base, which stipulates, among other things, that underlying representations are fully specified and generally the same as surface representations. This contradicts some of Lahiri's early work on underspecification, for example. There's a lot more to think through on this topic but basically, the OT approach relies on essentially a neural network framework, which I think isn't particularly compatible with the (double) dissociation paradigms used in neuro experiments.
To move on to your direct question:
It seems undeniable that representations become more "abstract" as an acoustic signal moves through the processing stream. The representation in the cochlea is pretty much the acoustic signal; the representation in the brain stem reflects some amount of important "abstraction" of certain information but is still fairly veridical. The representation in the primary auditory cortex certainly seems to reflect the extraction of acoustic features that are abstractions of the signal. It only seems reasonable that this trend continues afferently.
It also seems undeniable that the perception and production systems end up very tightly linked early on in development.
What I don't know very much about is the nature of the motor system with respect to the abstraction of gestural features over the motor routines for related phones. But let's assume that it mirrors the perceptual system.
Given that, it would seem pretty uncontroversial to suggest that abstract perceptual representations are connected to abstract production representations.
What is up in the air, and I think the real essence of your question, however, is:
What is the nature of these abstractions? The evidence for a purely feature-based abstraction of the perceptual representation is far weaker than one would expect to have found at this juncture. That weakness, paired with the evidence that the representation is not, in fact, exclusively featural, suggests that the "abstraction" referred to above is far more complex than SPE features, for example.
And even if we assume features are appropriate, can we point to a group of neurons and say, this is the [+high]-feature which joins the perceptual representation of certain normalized F1 formants with the production representation for the shared motor routine for [i], /I/ and [u]? This question strikes me as analogous to the "grandmother neuron" debate. Is there a group of neurons that are completely dissociable from all the modal representations of grandma/[+high]?
I would love to somehow gain some insight into these questions. I think that would really shed a lot of light on the neural basis of speech production and perception.
Thanks Marc for the thoughtful response. I agree with everything you said. But there's a component missing in your discussion.
MARC: "It seems undeniable that representations become more "abstract" as an acoustic signal moves through the processing stream."
Yes. But, and here is the key, *which* processing stream? There is good evidence for two different streams, one linking auditory with motor and the other linking auditory with conceptual.
MARC: "It also seems undeniable that the perception and production systems end up very tightly linked early on in development. Given that, it would seem pretty uncontroversial to suggest that abstract perceptual representations are connected to abstract production representations."
Yes, for sure. But this is only half of what we have to do with "phonological" information.
So let's take your argument that processing information involves abstraction. And let's put it with my point that there are two processing streams involved. Then here's the question: Is the abstraction(s) involved in one stream (auditory-motor) the same as the abstraction(s) involved in the other (auditory-conceptual)? And here's my answer: No. Further, the abstraction(s) captured by (most) phonological theories are relevant primarily to only one stream, the auditory-motor stream.
Thanks for clearing up that dimension of the question. I agonized briefly over what words to use when I chose "afferently" and "through the processing stream" for that reason, but it didn't occur to me that that was the crux of the matter.
I'm a bit surprised by your final point, though. For some (many?) phonologists, features are all about the auditory-conceptual system, if related to processing at all. The motor integration dimension is often acknowledged just as H&P suggest: facilitating the decomposition of the acoustic signal into features. I don't remember the details of Stevens (2002) on that specific issue, but that seems to be the basis of many models.
Stevens KN. Toward a model for lexical access based on acoustic landmarks and distinctive features. J Acoust Soc Am. 2002 Apr;111(4):1872-91.
Yeah, Stevens uses acoustic landmarks to identify articulator-free features like [vowel] and [consonant] and then uses these features plus the context to derive articulator-bound features like [lips], [nasal].... Both of these representations are what feed into lexical access. I'm suggesting that we do away with articulator-bound features in the auditory-conceptual mapping. Put differently, phonologists have been studying how people produce speech (i.e., make their mouths move to reproduce sounds of their language) and have developed some nice theories, many of which, not surprisingly, make use of features associated with speech articulators. These theories are then assumed to apply to all things "phonological", including how we perceive and understand speech. This is an assumption that I think we need to reconsider; i.e., I think the assumption is wrong.
I’ve been enjoying the excellent discussion between Greg and Marc and didn’t want to interfere. But I can’t help adding a few notes: Greg’s last comment is the core of the problem. For example, Halle (1983) says:
“My discussion below focuses exclusively on speech production … This … is not due to feeling that perception is any less important than production but rather because … we have somewhat better grasp of the issues in the articulatory domain …” (Halle 2002: 110).
As a matter of fact, Halle’s distinctive features (as well as those of the 1968 SPE and many others) should be considered neither articulatory nor perceptual but simply abstract. Moreover, it’s true (and easily understandable) that phoneticians/phonologists are concerned mostly (though not exclusively) with what corresponds to the dorsal stream. When I said above that linguists have been focusing on production rather than perception, I meant precisely these two things.
There is a linguistic analogy of Marr’s levels in vision: (1) computational / formal, (2) algorithmic / functional (psychological processing), (3) implementational / neural instantiation (see Jackendoff 2002). And there is an interesting application of it by Poeppel, Idsardi & van Wassenhove (2008): (1) distinctive features, (2) analysis by synthesis, (3) multi-time resolution. But, referring to Stevens (2002) among others, they say:
“In our view, words are represented in the mind/brain as a series of segments each of which is a bundle of distinctive features that indicate the articulatory configuration underlying the phonological segment. (…) The fact that the elements of phonological organization can be interpreted as articulatory gestures with distinct acoustic consequences suggests a tight and efficient architectural organization of the speech system in which speech production and perception are intimately connected through the unifying concept of distinctive features.” (p. 1072)
Where are we now? At the formal level? Or the psychological or neural level? Is “features that indicate the articulatory configuration” = articulatory features? Does it mean that words are REPRESENTED in the mind/brain as bundles of articulatory features or just can be INTERPRETED so? And at which level?
Finally, I agree with Poeppel & Embick 2005 that the neuroscience of language (as well as psycholinguistics) should develop around the existing abstractions of formal linguistics (and provide feedback to them) rather than develop a formal theory of its own. The critical point is the choice of the formal theory and its interpretation. The 1968 SPE (and its mutations) is definitely better for the purpose than the various current “Against sth” theories (sth = e.g. markedness, universal grammar, phoneme). OT may be even better, but I’d like to know why. By the way, the best of OT is the Nose OT by Ollaha and Ettlinger (no, I don’t mean it, I’m not “Against OT”!).
David asked me to post something on this thread last week, but I've been too busy. Very briefly (and I don't mean to speak with any authority for David and Virginie) we see the abstract features as both a perspicuous coding and as the glue between auditory and articulatory representations (what I take to be the Jakobson-Fant-Halle position). As an efficient code it is also used to store representations in long-term memory (another analogy would be that they are like macros in programming languages). Now is this "amodal" or "multi-modal" (or something else)? I don't think this question matters much (think James's cash-value) because I can't see what's at stake.
Now, there is a way that Generative Phonology qua computational system makes an amodal commitment: in the changes that can take place (e.g. rules). So the idea there is that feature synchronization, concatenation, insertion, deletion, etc. do not depend on the content of the features, just their (abstract) status as features. Hence the general format of rules (A -> B / C _ D ) in SPE. I remain somewhat skeptical of this view, especially in light of probabilistic interpretations of grammars. In a Bayesian formulation we can have UG as the prior probabilities on many aspects of the grammar, so I would be happy to say that word-final devoicing is a priori more probable than word-final voicing (to pick one of the Blevins debates). (I know this is a weak position on the issue.)
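For readers less familiar with the rule format, here is a rough sketch of what "A -> B / C _ D" amounts to computationally. It is an illustration only, not SPE's own formalism in full: the function names `apply_rule`, `matches`, and `context_ok`, the dictionary encoding of segments, and the toy word are all assumptions. The thing to notice is that the rule machinery only compares and overwrites feature values; nothing in it knows whether "voice" names something articulatory or acoustic.

```python
from typing import Dict, List, Optional, Union

Segment = Dict[str, str]              # feature name -> "+" or "-"
Context = Optional[Union[Segment, str]]
BOUNDARY = "#"                        # word-boundary symbol


def matches(segment: Segment, pattern: Segment) -> bool:
    """True if the segment carries every feature value listed in the pattern."""
    return all(segment.get(name) == value for name, value in pattern.items())


def context_ok(neighbor: Union[Segment, str], required: Context) -> bool:
    """Check one side of the environment: None = anything, '#' = word edge."""
    if required is None:
        return True
    if required == BOUNDARY:
        return neighbor == BOUNDARY
    return neighbor != BOUNDARY and matches(neighbor, required)


def apply_rule(word: List[Segment], target: Segment, change: Segment,
               left: Context = None, right: Context = None) -> List[Segment]:
    """Apply A -> B / C _ D, with A=target, B=change, C=left, D=right.

    Environments are checked against the input word, so all matching
    positions are rewritten simultaneously.
    """
    output = []
    for i, segment in enumerate(word):
        left_neighbor = word[i - 1] if i > 0 else BOUNDARY
        right_neighbor = word[i + 1] if i < len(word) - 1 else BOUNDARY
        if (matches(segment, target)
                and context_ok(left_neighbor, left)
                and context_ok(right_neighbor, right)):
            output.append({**segment, **change})   # overwrite the changed features
        else:
            output.append(dict(segment))
    return output


# Toy word, roughly /bad/ (feature values are illustrative assumptions).
word = [
    {"sonorant": "-", "voice": "+"},   # b
    {"sonorant": "+", "voice": "+"},   # a
    {"sonorant": "-", "voice": "+"},   # d
]

# Word-final devoicing: [-sonorant] -> [-voice] / _ #
devoiced = apply_rule(word, target={"sonorant": "-"},
                      change={"voice": "-"}, right=BOUNDARY)
print([seg["voice"] for seg in devoiced])   # ['+', '+', '-']
```

In this format, word-final voicing would be exactly as easy to state (swap "-" for "+" in `change`), which is the sense in which the formalism itself is indifferent to feature content; any preference between the two would have to come from elsewhere, such as the priors mentioned above.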
Finally, Morris Halle's views on this issue have changed over the years, and are recounted by him in the introduction to _From Memory to Speech and Back_ (Mouton 2002). Briefly, his argument is that whenever he revisited the feature definitions he found that when he went with the articulatory definitions the analyses got better.