Tuesday, March 24, 2009

What is speech perception?

I have to admit I'm starting to get a bit depressed about the field of speech perception.

Most experiments on "speech perception" ask participants to discriminate pairs of syllables or identify which sound they heard. The recent paper by D'Ausilio et al. used such a measure and in their response to my commentary some questions were raised about my "task specific effect" comment (I decided not to address them for lack of energy, but I'll tell you why I don't buy their arguments if anyone wants to know). A recent thoughtful comment here on Talking Brains by Marc Sato included a quote that sparked sufficient energy to motivate a few words on my part though. The quote was from a recent paper (this is not a dig against anything Marc said, it's just that this quote got me thinking):
"speech perception is best conceptualized as an interactive neural process involving reciprocal connections between sensory and motor areas whose connection strengths vary as a function of the perceptual task and the external environment."

I don't know what other folks are studying when they study speech perception, but to me speech perception is best conceptualized as that process that allows a listener to access a lexical concept (~word meaning) from a speech signal. This is what "speech perception" does in the real world. It is one step in the conversion from variations in air pressure into meaning. I'm pretty sure the capacity for "speech perception" didn't evolve or develop to allow us to tell an experimenter whether we heard a /ba/ or a /pa/. In fact, the next time you have the pleasure of talking to a speech scientist who regularly employs such methods, pause after a sentence you speak and ask if in the last sentence you uttered the syllable /ba/ or not. S/he will have no idea; we don't perceive phonemes, we perceive word meanings. For the most part, the ability to make conscious decisions about phonemes is a useless ability in the context of auditory speech processing, one that is probably only available to literate individuals by the way (I can dig up some refs if anyone is interested). If you are interested in studying the ability to make judgments about speech sounds, that is perfectly fine; after all it appears to be highly relevant to reading -- an important issue. But don't assume that you are studying anything that is necessarily relevant to what happens in the real world of auditory speech processing.

Let me really stick my neck out and say this: if you are going to use a task that requires listeners to make judgments about speech sounds (syllable discrimination or identification), then in order to make the claim that you are studying anything relevant to how speech is actually processed in the real world, you better have some empirical data to back it up; i.e., it better hold for comprehension and not just metalinguistic judgments.


yaxu said...

Fascinating stuff, this has made this area much clearer for me, thank you.

I understand that hearing babies exposed to sign language and not speech babble with their hands but not with their mouths [1]. I'm interested in how you see this relates to the "sensory theory of speech production". From my naive point of view it would seem to support it, because it suggests that speech perception comes first and influences what kind of motor production develops.

[1] http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&uid=15110725&cmd=showdetailview

Jonas said...

More elaborate reply: Thanks, Greg, for this. I think you and David did a good job, also, in introducing the “speech recognition” vs “speech perception” distinction to that end. Frank and I tried, probably less vocally, to make the same point in our TICS paper (i.e., that the tasks often employed in speech studies are somewhat artificial and detract from what should be the goal of speech perception / recognition / comprehension studies).

Matt Davis said...

But there is good research which manipulates the phonetic quality of speech in experimental contexts that are more like natural comprehension.

For instance, Andruski, Blumstein and Burton (1994) show that VOT manipulations for word-initial segments (e.g. making tokens of king that are more like ging) reduce the magnitude of cross-modal semantic priming of associated words (such as queen). Furthermore, Aydelott Utman, Blumstein and Sullivan (2001) show that Broca's aphasics are more severely affected by just this sort of subphonetic variation.

This seems like just the evidence that you ask for concerning the impact of frontal lesions on speech recognition, and very consistent with results showing impaired speech perception in less naturalistic tasks.

Greg Hickok said...

Hi Matt. First let me be clear that I'm NOT arguing we shouldn't or can't use less naturalistic tasks. I'm just saying that you can't assume that the effects you see on your (one's) task result from operations at the same level of "phonological processing" as those involved in the behavior we are ultimately interested in understanding, which for me is auditory comprehension.

Take syllable discrimination. Deficits on this task could result from peripheral hearing loss, accessing phonological forms, maintaining such information in memory (long enough to make a comparison between the two), directing conscious attention to phoneme segments, doing the comparison between the syllables, response selection, remembering the task, motor control of button pressing, etc., etc. In other words, just because you are using a speech perception task doesn't mean that your measure is reflecting "speech perception" -- it could be reflecting some other aspect of the task. This is true b.t.w. of any task including "comprehension" tasks. If a patient makes an error in pointing to the correct picture on a word-pic matching task, it could be because they are agnosic, or have a response selection problem. So we use a non-speech control task where they match pics-to-pics to help home in on whether it is a deficit at the level we are interested in.

Now, re: the Blumstein studies. I like this line of work. But the question we need to ask is whether these priming effects are measures that tap a relevant level of processing. I haven't thought carefully about this work, but you'd want to ask questions like: does cross-modal priming measure some aspect of the normal speech recognition process? Does the Broca's aphasics' sensitivity to these acoustic manipulations indicate that their "phonological processing" system has been damaged, or is the damage affecting their ability to generate forward models that can facilitate "phonological processing" in the auditory system?

Again, just because one performs a "phonological" manipulation doesn't mean that we automatically gain direct access to the workings of the "phonological" system.

What we need to do is decide exactly what we want to understand. If it is "speech recognition" -- ~processing phonological information for lexical access -- then pick a task that represents that behavior, e.g., auditory comprehension. If this is not feasible, then fine, use another task, but then do the extra experiments to make sure your measurements relate to the process you are trying to understand. Simple task changes can profoundly affect the neural circuits that get involved. Consider smiling. If I recall correctly, the ability to voluntarily generate a smile can be impaired while the ability to spontaneously smile can be preserved. This parallels the dissociations we see in the voluntary attention to speech sounds (or making lexical decisions, or grammaticality judgments, or whatever) compared to the spontaneous use of such information to comprehend spoken language.

Anonymous said...

Dear Greg,

I see that now we agree on almost everything!

1) We think that the addition of noise to speech is important to make the task difficult enough (now we know from a new experiment that this is definitely true and we are trying to understand whether it is a problem of ceiling effect or if it reflects other mechanisms).

2) We both think that the motor involvement may reflect an attentional-like mechanism (have you never heard of the so-called "premotor theory of attention"?).

3) We now agree on the fact that 80% of misunderstanding is a relevant deficit and not proof that the lesion of Broca’s area has nothing to do with speech perception.

If you remember, however, the fact that Broca’s area plays a marginal role in phoneme discrimination is exactly the title of our contribution to the special issue on mirror neurons that you are editing now. I would say ‘surprisingly’ because I was completely unaware of your deeply anti-mirror position when I agreed to contribute to this issue.

However, here the weather is sunny, we are in the most beautiful country in the world, we are very happy with how things are going, and now we are even happier because you agree with us!
Unfortunately, we don't have enough time now to go into this debate in depth (we are supposed to stay in the lab running experiments to answer your doubts, and ours) and, as you probably know, I am quite reluctant to formulate “highly theoretical theories”. I have the impression that you sometimes confuse what I write with what some much more intelligent and ‘multidisciplinary’ colleagues of mine write.

I confirm, however, that my intention over the last five years has been to investigate the CAUSAL role of the motor/premotor system in perception. Now we have interesting results, such as this one in CB and (please, prepare a lot of space on your blog!) another paper coming soon, showing that frontal aphasics do have problems in pragmatically representing others’ actions. However, even if I had found the opposite of what I am finding, it would have been precious information for me. My goal is to understand how the brain works by keeping a constant distinction between data and speculations.

What I definitely disagree with is the fact that your posts on the blog are so prominent and visible, while the comments from others can be seen only by clicking on a small link. But this is a secondary issue.
Have a nice day (I hope sunny as well, in California, surrounded by palm trees)!
Friendly yours,


Anonymous said...

Hi Greg,


I understand your criticisms regarding the fact that speech perception and speech recognition doubly dissociate and the need to test the ability to process speech sounds under more ecologically valid conditions, notably when speech sounds lead to contact with the mental lexicon.

Regarding a possible mediating role of the motor system in speech recognition (not only speech perception), note that this is the case in many studies showing activations of cortical motor areas involved in planning and executing speech production (i.e., using passive listening to words or sentences; see Fadiga et al., 2002; Watkins and Paus, 2003; Watkins, Strafella and Paus, 2004; Skipper, Nusbaum and Small, 2005; Roy et al., 2008...).

Also, I think that, apart from auditory lexical/semantic comprehension, another 'ultimate goal' of speech perception/recognition studies is to understand the cognitive and neural processes that support the ability to communicate (a point which is not so emphasized in this blog ;-) ). As previously said, I think, along with other colleagues, that speech motor resonance might represent a dynamic sensorimotor adaptation under the influence of the other talker's speech patterns, and in return may facilitate conversational interactions through convergent behaviors.

To further investigate speech processing under more ecologically valid conditions would require investigations of language processing in full multi-modal and environmental context (see the paper by Steve Small in Brain and Language, 2004), and not only using 'artificial' phonological tasks such as syllable discrimination (as Jonas said) or 'more ecological' lexical tasks such as picture naming (from your point of view).


Greg Hickok said...

Hi Marc,
I'm well aware of the many studies that show that the motor system is activated during passive speech listening or reading. This tells you that there is an association between speech perception and the motor system, but it doesn't mean that the motor system is doing anything in the perceptual process; you need other experiments, like the D'Ausilio study to show that. Once you do those experiments, you find that the influence is modulatory.

I take comprehension to be a part of the communication process, so I think we do address at least that aspect of communication. But your point about real life communication is interesting. I haven't thought or read about it much. To be honest I don't know what it means to say that "speech motor resonance might represent a dynamic sensorimotor adaptation under the influence of the other talker's speech patterns." But that is probably just my ignorance of the work on this. I have to say, it sure sounds fancy, so there must be something to it! ;-)

Regarding tasks and ecological validity generally, how about this as a guide to our research programs:

We all agree that task matters, and that changing the task can change the circuits involved.

So, let's all stop pretending we are studying "speech perception" or "language processing" or whatever, and just call it what it is. I'm studying, for example, the neural circuits involved in auditory comprehension as measured by word-picture matching, or auditory-motor integration as studied by asking people to listen to nonwords and silently repeat them (not exactly a "natural" task!). D'Ausilio et al. studied the neural circuits involved in identifying syllables in noise. Whether these neural circuits are related to one another is an empirical question that requires experimental investigation.

If we approach our work in this way -- i.e., we are studying a particular task -- then a lot of the confusion goes away.

Anonymous said...

Hi Greg,

I agree that we should call what we are studying by its specific name (everybody would agree, I think). As a matter of fact, the title of my paper you quoted today is 'a mediating role of the premotor cortex in phoneme segmentation'. Since phoneme segmentation has nothing to do with auditory comprehension, from your point of view at least, that's fine?


Greg Hickok said...

Yes, I like that title because it is transparent. The problem of course is that it is hard to get studies with a title like that published in high-profile journals, which have to appeal to a broad audience.

Phoneme segmentation could have a lot to do with auditory comprehension; it's an empirical question.

CEJ said...

It is not necessarily a lack of experimental data; it's a lack of skeptical epistemology and of better conceptualizations with which to interpret the data. These are what is holding back the field.

CEJ said...

>>I don't know what other folks are studying when they study speech perception, but to me speech perception is best conceptualized as that process that allows a listener to access a lexical concept (~word meaning) from a speech signal.<<

I think that is lexical access. But you are right, we lack specificity when we say things like 'speech perception'.