Tuesday, August 25, 2009

"Categorical perception" in neuroscience studies of speech

Old speech phenomena don't die they just become morphed into neuroscience studies.
-Andrew Lotto

The phenomenon of categorical perception appears to be riding the coattails of the resurgence of interest in motor theories of speech perception. Back in the motor theory heyday, categorical perception was all the rage. Listeners appeared to perceive speech sounds differently from non-speech sounds, i.e., categorically, and this was taken as evidence for the motoric nature of the speech perception process. The argument was something like this... Acoustic signals vary continuously. Articulatory patterns are categorical (/b/ is always produced bilabially). Perception mirrors the categorical nature of articulation. Therefore we perceive speech via our motor system.

Problems with this view quickly arose. Non-human, and therefore non-speaking, animals such as chinchillas and quail were found to exhibit categorical perception for speech sounds. Babies, who hadn't yet acquired the ability to articulate speech, also exhibited categorical perception. Categorical perception of non-speech sounds was also demonstrated. Further, perception of speech sounds was found to be continuous if listeners were asked to rate how well a stimulus represented a given category rather than to make a binary decision.

Interest in categorical perception (CP) faded -- except in neuroscience where the pace of CP studies seems to be accelerating. Here are just a few from this year:

Möttönen R, Watkins KE. Motor representations of articulators contribute to categorical perception of speech sounds. J Neurosci. 2009 Aug 5;29(31):9819-25.

Salminen NH, Tiitinen H, May PJ. Modeling the categorical perception of speech sounds: A step toward biological plausibility. Cogn Affect Behav Neurosci. 2009

Clifford A, Franklin A, Davies IR, Holmes A. Electrophysiological markers of categorical perception of color in 7-month old infants. Brain Cogn. 2009

Prather JF, Nowicki S, Anderson RC, Peters S, Mooney R. Neural correlates of categorical perception in learned vocal communication. Nat Neurosci. 2009 Feb;12(2):221-8.

I hinted previously that the failure to use signal detection analysis methods in the context of categorical perception studies may have contaminated the whole field of CP research. Lori Holt recently pointed me to a paper by Schouten et al. 2003, provocatively titled "The End of Categorical Perception as We Know It". The point of the paper is exactly what I was hinting at: perception only looks categorical because of inherent bias in the tasks used to measure it.

The traditional categorical-perception experiment measures the bias inherent in the discrimination task
(Schouten et al. 2003, p. 71)

Here's another interesting quote from this paper:

Despite an auspicious beginning with a clear experimental definition ... categorical perception has in practice remained an ill-defined or even undefined concept, which could be used to underpin a variety of sometimes mutually exclusive claims, for example for or against the motor theory (p. 72)

This is an interesting paper that is worth a close look. But back to bias...

Let me illustrate very simply using some categorical perception data that I pulled from the literature. The graph below shows real data from a CP experiment using a GA-DA continuum. The task is explicitly categorical: subjects are asked to decide whether a stimulus is an example of GA or DA. This is not a good task to determine whether subjects perceive speech sounds categorically because it forces them to categorize. As Schouten et al. put it, "... if the nature of the task compels subjects to use a labelling strategy, categorical perception will be pretty much a foregone conclusion" (p. 77). Nonetheless, d-prime measures show a rather different picture from standard measures. In the graph, the vertical axis is the proportion of GA responses, and the horizontal axis is the stimuli along the continuum. Perception looks nicely categorical.

Now plot the same data in d-prime units. To do this you can calculate d' for each pair of adjacent stimuli (how well are Ss discriminating Stim1 from Stim2, Stim2 from Stim3, etc.). Plotted here is cumulative d'. If perception were truly categorical, we should see discontinuities in the cumulative d': flat stretches within each category and a jump at the boundary. Instead we see a more continuous function.
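For readers who want to try the calculation described above on their own identification data, here is a minimal Python sketch. It is not the exact procedure used for the plot in this post. It assumes the standard signal-detection convention of taking d' for an adjacent pair as the difference between the z-transformed proportions of GA responses, and it clamps proportions of 0 or 1 with a 1/(2N) correction, which is one common convention among several. The function name, the trial count, and the proportions are all made up for illustration and say nothing about what real data look like.

```python
from statistics import NormalDist


def cumulative_dprime(p_ga, n_trials):
    """Return (pairwise d', cumulative d') from identification proportions.

    p_ga     -- proportion of "GA" responses for each stimulus, in continuum order
    n_trials -- trials per stimulus, used only to clamp proportions of 0 or 1
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Assumption: keep proportions away from 0 and 1 so z stays finite.
    lo, hi = 1 / (2 * n_trials), 1 - 1 / (2 * n_trials)
    z_scores = [z(min(max(p, lo), hi)) for p in p_ga]
    # d' for each adjacent pair: difference in z-transformed response rates.
    pair_d = [z_scores[i] - z_scores[i + 1] for i in range(len(z_scores) - 1)]
    # Cumulative d' starts at 0 for the first stimulus and adds each pair in turn.
    cumulative = [0.0]
    for d in pair_d:
        cumulative.append(cumulative[-1] + d)
    return pair_d, cumulative


# Hypothetical proportions of "GA" responses along a 7-step continuum.
p_ga = [0.98, 0.95, 0.85, 0.50, 0.15, 0.05, 0.02]
pair_d, cum_d = cumulative_dprime(p_ga, n_trials=50)
for step, value in enumerate(cum_d, start=1):
    print(f"Stim{step}: cumulative d' = {value:.2f}")
```

Plotting the cumulative values against stimulus number gives the d-prime version of the identification curve. Equal increments between adjacent stimuli indicate equal discriminability along the continuum.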

Have a look at the papers by Lori Holt and Andrew Lotto that I highlighted in a previous post as well as the Schouten et al. paper for more critical views on the nature of categorical perception. Then there's always long-time CP skeptic Dominic Massaro. His work on the topic is also worth a look.

What are the implications for neuroscience studies of speech perception? Well, if CP is nothing more than task effects and/or subject bias, then by using CP paradigms to map speech perception systems, all that is being mapped is task strategies and/or subject bias. No wonder all these studies find effects in the frontal lobe!

Schouten, B., Gerrits, E., & van Hessen, A. (2003). The end of categorical perception as we know it. Speech Communication, 41 (1), 71-80. DOI: 10.1016/S0167-6393(02)00094-8


ewan said...

... perhaps I'm misunderstanding you here, but we don't "determine whether subjects perceive speech sounds categorically" from the shape of the identification curve, but rather from the degree to which the discrimination data match it (ie peak in discr. at cat. boundary). I guess you are right that you can infer something like discriminability from the id curve, but is there really an argument to be made from this? Schouten et al don't make that argument at any rate. Their issue is not with the id task, it's with the fact that different discrimination tasks give more or less categorical discr. profiles. But this has been known since at least 1973 (e.g. Pisoni).

Greg Hickok said...

Yes, that is exactly right Ewan. I'm making a different point than Schouten et al. and one that is not typically brought up in the CP world. As you point out, CP is not determined by the shape of the ID curve, as the ID curve is assumed to be a direct measure of the listener's categories. Discrimination measures acoustic perception of stimuli within and across those categories. If ID and Disc match, then CP holds. Schouten et al. showed that they don't match with certain tasks.

My point backs up one step. If the ID curve, which is explicitly categorical, is continuous then CP is completely undermined. Am I making sense?

Rajeev Raizada said...

As part of an fMRI study on categorical perception of the /ba/-/da/ continuum, I plotted same-different curves using d-prime (Fig.2) and also regular percentage-correct (Fig.1). They look almost exactly the same. There's a genuine increase in sensitivity to differences that straddle the category boundary, it's not just a criterion shift. I believe that others have found similar d-prime results. The paper shows that this sensitivity to category boundary is reflected by the fMRI signal, especially in left supramarginal gyrus. The PDF of the paper is here.

Greg Hickok said...

Hi Raizada,
Thank you for pointing me to your paper. I'm just starting to look into this CP stuff and haven't yet looked at all the recent work in any detail. I'm thrilled to see that you reported d'. I'll have a closer look at your data soon!

I find it very interesting though that you don't see any CP effects in auditory cortex! Why is that? I speculate that it is because CP is not a sensory phenomenon but something operating at a higher level.

Greg Hickok said...

I meant Hi Rajeev! sorry. :-)

Rajeev Raizada said...

No problem about the first/last name mix-up. My name is kind of confusing. :-)
People usually call me Raj.

I agree with you about categorical perception of speech probably being a higher-level process than the ones primary auditory cortex mostly deals with. In the paper, we plotted neurometric curves from subjects' anatomically traced Heschl's gyrus and planum temporale ROIs, and those areas don't seem to show much sensitivity to the category boundary. They do show a little bit, but much less than supramarginal gyrus.

Kevin H said...

I think you're taking the data a bit too far. You say it is a 'task effect', but what if those two different tasks are driven by two separate neural systems? Or possibly two separate behaviors of the same underlying tissue. Then it's more of a question of which system is really doing speech perception, which seems more analogous to the CP task than the discrimination task.

Greg Hickok said...

Hey Raj,
I haven't read the paper yet (just skimmed so far) but I did notice the non-effect in auditory regions -- an interesting result. The SMG activity is probably Spt in the posterior planum temporale region. This activation often mis-localizes to SMG. Again, thanks for the heads up on your very nice paper...

Greg Hickok said...

Hi Kevin,
You hit the heart of the issue. What IF these two different tasks are driven by two separate neural systems? And how do we know WHICH system is the "speech perception" system we all want to understand? What David P. and I have been arguing for all these years is that the "real" speech perception system is the one that we use in ecologically valid contexts -- when we are processing speech for comprehension. If we find that a different system is operating when we comprehend speech versus when we ask subjects to decide whether they heard a ba or a pa, then we should disregard the latter as a task-induced process that is not relevant to the process we really want to understand. Of course, if you are interested in the neural basis of "pa vs. ba identification/discrimination" then by all means study that system. It's probably relevant to reading...

Kevin H said...

I guess I just don't see how the categorical decision between ba and pa is not critical to speech comprehension.

What if there were two sentences that had vastly different meanings but were separated by a single syllable? "I like the train" vs "I like the rain" is the best I can come up with off the top of my head, but I'm sure there's a better example. If you were to parametrically vary the difference between the two syllables, I bet the comprehension would be a lot closer to the CP model than the discrimination model.

Are there any studies that try to put the CP framework into such ecologically valid contexts?

Greg Hickok said...

Here's why it makes a difference: the next time you are talking to someone, stop at some point and ask them whether you uttered the syllable ba in the last sentence. They will have no idea. They will know what you said, but will not know what the sounds were. Now ask them to listen for ba in your speech. They will be able to do it, but it takes a conscious effort, i.e., the recruitment of mechanisms that are not normally used in speech comprehension. We perceive words not sounds.

So the question is, when we test for CP using standard methods, are we measuring Ss' acoustic perception of speech sounds? Or are we measuring the additional mechanisms involved in consciously attending to speech sound information? I'm suggesting it may be the latter.

I like your experiment. Why don't you run it?! You will have to make sure that you use lots of sentences and response options so subjects don't quickly learn that it is just a speech perception task and they only need to attend to one speech sound.

Lori Holt and Andrew Lotto have done some work on speech perception in context. Not exactly comprehension (that I know of) but worth looking at.

Patti said...

Nice post but:

"Listeners appeared to perceive speech sounds differently from non-speech sounds, i.e., categorically, and this was taken as evidence for the motoric nature of the speech perception process."

Not to be nit picky but CP was actually interpreted as evidence that speech is 'special', one of the other claims of the original MT... It relates more to the modular nature of speech processing (cf. face processing) than to the specifics of how this processing occurs. Maybe the discussion about the modular nature of speech processing will return at some point as well ;-)

Rajeev Raizada said...

Re Greg's question about whether the supramarginal gyrus activation might actually be in the planum temporale: I did look into that a bit, by hand-tracing the subjects' planum temporale ROIs, and then comparing the supramarginal ROI to them. From that comparison, it really does look like the activation was above the Sylvian fissure, although I agree that fMRI will always leave a fair bit of uncertainty about such things. That figure didn't go into the paper, but a slide with it is here.

Greg Hickok said...

Hi Patti,
Nit picky is good! You are correct that CP was used to argue that speech is special, but it was also part of the argument for the idea that we perceive speech in terms of articulation. Here's a quote from Liberman et al. 1967 Psych Rev 74:431-461

"As described earlier in this paper, perception of these sounds is categorical, or discontinuous, even when the acoustic signal is varied continuously. Quite obviously, the required articulations would also be discontinuous. With /b,d,g/, we can vary the acoustic cue along a continuum, which corresponds, in effect, to closing the vocal tract at various points along its length. But in actual speech the closing is accomplished by discontinuous or categorically different gestures: by the lips for /b/, the tip of the tongue for /d/, and the back of the tongue for/g/. Here, too, perception appears to be tied to articulation." (p. 453).

Greg Hickok said...

Thanks for the figure Raj. I'm still not sure though. I assume you did this in some standardized space? The problem we have found is that simply by normalizing a single subject's data to Talairach or MNI an activation focus can jump from within the Sylvian fissure to above it. Normalization is the problem, not group averaging. I'll post an example of this soon.

Rajeev Raizada said...

The data was indeed spatially normalised, which is always an uncertain business. I think we're in agreement: I have pretty low confidence about which side of the Sylvian the activation is on. That's why I didn't put that figure in the paper! :-)

The main focus of the paper is on what type of information processing is taking place, rather than where it is located. What's interesting to me about the SMG ROI is the way it seems to be specifically amplifying stimulus-differences that cross the category boundary. It's not really crucial which side of the fissure it's on, although it is fun to wonder about. :-)

Kevin Hill said...

After giving it some more thought, I think I need a bit of clarification on your argument.

It seems clear that language (as opposed to speech) is categorical. If you think of a 'book' or a 'cook' you are thinking of two clearly defined objects with well defined boundaries. No one expects a book to run a restaurant.

So clearly, at some point between the acoustic signal coming in our ears and full language comprehension some form of categorization must be occurring, but the question is where/at what level.

There seem to be three levels at which this categorization could take place, the phoneme, the single word, or the entire sentence.

So, it seems to me that your argument in the most conservative sense is that when we are understanding full sentences, there is no categorization of individual phonemes. Is this right?

Greg Hickok said...

Hi Kevin,
You are correct. There are categories in language. There are even phonemic categories -- just like there are categories of plants or animals. But just because we categorize an oak and a maple as trees, doesn't mean the visual system can't tell the difference. My suggestion is that phonemic categories, and the so-called phenomenon of categorical perception of speech, are not properties of the auditory system. We can perceive the difference, even within categories. You just need the right task and right measure to see it.

Kevin Hill said...

Perhaps your claim isn't as extreme as I thought. I thought you were trying to say something about how we extract meaning from speech. The idea that we can perceive that difference isn't really related to the transfer from speech to semantics.

Would you agree that any transmission of meaning involves grouping of sounds into phonemes?

More concretely, would you agree that if you were to adjust the spectral differences between ga and ba gradually, the perceived meaning of the speech would be much more like the categorization task than the discrimination task?

Greg Hickok said...

What I think is that if presented with a continuum from one word to another (say, bin-pin), subjects will hear an unambiguous word at the ends of the continuum and continuously more ambiguous ones as you get toward the middle. So if you allowed them to rate the stimulus on a 5 point scale (1=bin, 2=ambiguous, 3=pin) you would get a nice continuous function.

Now regarding the phoneme categories specifically, I actually don't think we extract phonemic categories in perception most of the time. Rather I think a more fundamental unit is the syllable.

karthik durvasula said...

Hi Greg,

Was just re-reading this old post. I noticed that I don't quite understand the following:

1) How exactly were the d' values for the discriminability of stim 1 vs stim 2 (stim2 vs. stim 3...) calculated from the Ga vs. Ba identification experiment?

2) The way the d' values are plotted, are we to understand that stim 1 and stim 2 are "less discriminable" than stim 9 and stim 10? If so, that's quite an odd result. It doesn't follow from either categorisation or simple acoustic differences between any two adjacent stimuli.

Clearly, I'm missing some crucial info here :).

Greg Hickok said...

That is cumulative d', so you need to look at the difference from 1 vs. 2 compared to the difference from 9 vs. 10. The difference is the same (= ~linear increase) across the whole range. This suggests a non-categorical function.

ArildHestvik said...

Is cumulative d' just adding up d' from each pairwise comparison? i.e. d'(stim1 vs stim2), then d'(stim1 vs. stim2) + d'(stim2 vs stim3), etc? I just did that with a single subject data set and I still get a step-like function that looks similar to the proportion against VOT function. Sort of like Raj in his paper.

Greg Hickok said...

Hi Arild,
Yes that is correct. I think you will get a step-like function if you don't sample the boundary region very densely. If you have lots of steps in that area it will likely be continuous. That is my guess anyway.

bkroeger said...

Dear Greg, dear David,
I like the new interest in the topic of categorical perception in the field of neuroscience.
But I am coming from the field of traditional experimental phonetics, and for me, deciding whether categorical perception occurs or not needs identification AND discrimination experiments AND the calculation of a "discrimination rate based on individual identification scores".
The stimuli are perceived more categorically, the stronger the difference between measured and calculated discrimination is.
See my paper:
Kroeger et al. 2009:
Towards a neurocomputational model of speech production and perception. Speech Communication 51: 793-809