Comments on Talking Brains: "Categorical perception" in neuroscience studies of speech
(25 comments; feed updated 2023-10-12)

Anonymous (2010-05-04):
Dear Greg, dear David,
I like the new interest in the topic of categorical perception in the field of neuroscience. But I come from the field of traditional experimental phonetics, and for me, deciding whether categorical perception occurs requires identification AND discrimination experiments AND the calculation of a "discrimination rate based on individual identification scores". The stimuli are perceived more categorically, the stronger the difference between measured and calculated discrimination is.
See my paper:
Kroeger et al. (2009). Towards a neurocomputational model of speech production and perception. Speech Communication 51: 793-809.
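The "calculated discrimination" Kroeger refers to is conventionally derived from the identification scores themselves. As a rough illustration (the identification probabilities below are invented, and the formula is the common Haskins-style ABX prediction, which assumes listeners covertly label each token and guess whenever the labels agree):

```python
# Sketch: predicting ABX discrimination from identification scores, in the
# spirit of the "discrimination rate based on individual identification
# scores" described above. Assumption: listeners covertly label each token
# and guess when labels agree, giving P(correct) = 0.5 + (p1 - p2)**2 / 2.

def predicted_abx(p1, p2):
    """Predicted ABX proportion correct for two stimuli whose probabilities
    of receiving one category label are p1 and p2."""
    return 0.5 + (p1 - p2) ** 2 / 2

# Hypothetical identification curve along a 10-step continuum (invented).
ident = [0.98, 0.97, 0.95, 0.90, 0.70, 0.30, 0.10, 0.05, 0.03, 0.02]

# Predicted discrimination for each adjacent pair: near chance (0.5) within
# categories, peaking at the category boundary (here, the pair 5-6).
predicted = [predicted_abx(a, b) for a, b in zip(ident, ident[1:])]
for i, p in enumerate(predicted, start=1):
    print(f"stim {i} vs {i + 1}: predicted proportion correct = {p:.3f}")
```

Comparing measured discrimination against calculated values like these is then the basis of the criterion Kroeger describes.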
Greg Hickok (2009-11-10 18:59):
Hi Arild,
Yes, that is correct. I think you will get a step-like function if you don't sample the boundary region very densely. If you have lots of steps in that area, it will likely be continuous. That is my guess, anyway.

Anonymous (2009-11-10 18:51):
Is cumulative d' just adding up d' from each pairwise comparison? I.e., d'(stim1 vs. stim2), then d'(stim1 vs. stim2) + d'(stim2 vs. stim3), etc.? I just did that with a single-subject data set and I still get a step-like function that looks similar to the proportion-against-VOT function. Sort of like Raj in his paper.

Greg Hickok (2009-10-23 21:02):
That is cumulative d', so you need to compare the difference from 1 vs. 2 with the difference from 9 vs. 10. The difference is the same (an approximately linear increase) across the whole range. This suggests a non-categorical function.
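Under this reading, "cumulative d'" is just the running sum of the adjacent-pair d' values. The thread doesn't spell out how the per-pair d' values were obtained from the identification data, so the sketch below makes one plausible assumption: d' = z(H) - z(F), treating the two stimuli's identification rates as hit and false-alarm rates. The rates themselves are invented.

```python
# Sketch of "cumulative d'" as a running sum of adjacent-pair d' values.
# Assumption (not stated in the thread): per-pair d' = z(H) - z(F), with
# the two stimuli's "ga" identification rates serving as hit and
# false-alarm rates. The identification rates below are invented.
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse standard-normal CDF (the "z" transform)

def pairwise_dprime(ident):
    """d' for each adjacent stimulus pair along the continuum."""
    return [z(a) - z(b) for a, b in zip(ident, ident[1:])]

def cumulative_dprime(ident):
    """Running sum of the pairwise d' values."""
    totals, running = [], 0.0
    for d in pairwise_dprime(ident):
        running += d
        totals.append(running)
    return totals

ident = [0.98, 0.95, 0.90, 0.80, 0.65, 0.35, 0.20, 0.10, 0.05, 0.02]
cum = cumulative_dprime(ident)
print([round(c, 2) for c in cum])
# Roughly equal step sizes (a straight cumulative line) suggest similar
# discriminability everywhere; a large jump at the boundary would suggest CP.
```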
karthik durvasula (2009-10-23 20:40):
Hi Greg,

Was just re-reading this old post. I noticed that I don't quite understand the following:

1) How exactly were the d' values for the discriminability of stim 1 vs. stim 2 (stim 2 vs. stim 3, ...) calculated from the Ga vs. Ba identification experiment?

2) The way the d' values are plotted, are we to understand that stim 1 and stim 2 are "less discriminable" than stim 9 and stim 10? If so, that's quite an odd result. It doesn't follow from either categorisation or simple acoustic differences between any two adjacent stimuli.

Clearly, I'm missing some crucial info here :).

Greg Hickok (2009-09-04 12:34):
What I think is that if presented with a continuum from one word to another (say, bin-pin), subjects will hear an unambiguous word at the ends of the continuum and increasingly ambiguous stimuli as you get toward the middle. So if you allowed them to rate the stimulus on a 5-point scale (1 = bin, 3 = ambiguous, 5 = pin), you would get a nice continuous function.

Now regarding the phoneme categories specifically, I actually don't think we extract phonemic categories in perception most of the time. Rather, I think a more fundamental unit is the syllable.

Kevin Hill (2009-09-04 11:42):
Perhaps your claim isn't as extreme as I thought. I thought you were trying to say something about how we extract meaning from speech.
The idea that we can perceive that difference isn't really related to the transfer from speech to semantics.

Would you agree that any transmission of meaning involves grouping of sounds into phonemes?

More concretely, that if you were to adjust the spectral differences between ga and ba gradually, the perceived meaning of the speech would look much more like the categorization task than the discrimination task?
Greg Hickok (2009-09-03 16:09):
Hi Kevin,
You are correct. There are categories in language. There are even phonemic categories -- just as there are categories of plants or animals. But just because we categorize an oak and a maple as trees doesn't mean the visual system can't tell the difference. My suggestion is that phonemic categories, and the so-called phenomenon of categorical perception of speech, are not properties of the auditory system. We can perceive the difference, even within categories. You just need the right task and the right measure to see it.

Kevin Hill (2009-09-03 14:23):
After giving it some more thought, I think I need a bit of clarification on your argument.

It seems clear that language (as opposed to speech) is categorical. If you think of a "book" or a "cook" you are thinking of two clearly defined objects with well-defined boundaries. No one expects a book to run a restaurant.

So clearly, at some point between the acoustic signal coming into our ears and full language comprehension, some form of categorization must be occurring; the question is where, and at what level.

There seem to be three levels at which this categorization could take place: the phoneme, the single word, or the entire sentence.

So it seems to me that your argument, in the most conservative sense, is that when we are understanding full sentences, there is no categorization of individual phonemes.
Is this right?

Rajeev Raizada (2009-08-27 12:26):
The data were indeed spatially normalised, which is always an uncertain business. I think we're in agreement: I have pretty low confidence about which side of the Sylvian fissure the activation is on. That's why I didn't put that figure in the paper! :-)

The main focus of the paper is on what type of information processing is taking place, rather than where it is located. What's interesting to me about the SMG ROI is the way it seems to be specifically amplifying stimulus differences that cross the category boundary. It's not really crucial which side of the fissure it's on, although it is fun to wonder about. :-)

Greg Hickok (2009-08-27 12:02):
Thanks for the figure, Raj. I'm still not sure, though. I assume you did this in some standardized space? The problem we have found is that simply by normalizing a single subject's data to Talairach or MNI space, an activation focus can jump from within the Sylvian fissure to above it. Normalization is the problem, not group averaging. I'll post an example of this soon.
Greg Hickok (2009-08-27 11:56):
Hi Patti,
Nitpicky is good! You are correct that CP was used to argue that speech is special, but it was also part of the argument for the idea that we perceive speech in terms of articulation. Here's a quote from Liberman et al. (1967), Psychological Review 74: 431-461:

"As described earlier in this paper, perception of these sounds is categorical, or discontinuous, even when the acoustic signal is varied continuously. Quite obviously, the required articulations would also be discontinuous. With /b,d,g/, we can vary the acoustic cue along a continuum, which corresponds, in effect, to closing the vocal tract at various points along its length. But in actual speech the closing is accomplished by discontinuous or categorically different gestures: by the lips for /b/, the tip of the tongue for /d/, and the back of the tongue for /g/. Here, too, perception appears to be tied to articulation." (p. 453)

Rajeev Raizada (2009-08-27 11:56):
Re Greg's question about whether the supramarginal gyrus activation might actually be in the planum temporale: I did look into that a bit, by hand-tracing the subjects' planum temporale ROIs and then comparing the supramarginal ROI to them. From that comparison, it really does look like the activation was above the Sylvian fissure, although I agree that fMRI will always leave a fair bit of uncertainty about such things.
That figure didn't go into the paper, but a slide with it is here: http://www.nmr.mgh.harvard.edu/~raj/PDFs/is_it_really_above_sylvian.pdf
Unknown (2009-08-27 10:40):
Nice post, but:

"Listeners appeared to perceive speech sounds differently from non-speech sounds, i.e., categorically, and this was taken as evidence for the motoric nature of the speech perception process."

Not to be nitpicky, but CP was actually interpreted as evidence that speech is "special", one of the other claims of the original MT... It relates more to the modular nature of speech processing (cf. face processing) than to the specifics of how this processing occurs. Maybe the discussion about the modular nature of speech processing will return at some point as well ;-)

Greg Hickok (2009-08-27 09:09):
Here's why it makes a difference: the next time you are talking to someone, stop at some point and ask them whether you uttered the syllable ba in the last sentence. They will have no idea. They will know what you said, but will not know what the sounds were. Now ask them to listen for ba in your speech. They will be able to do it, but it takes conscious effort, i.e., the recruitment of mechanisms that are not normally used in speech comprehension. We perceive words, not sounds.

So the question is: when we test for CP using standard methods, are we measuring subjects' acoustic perception of speech sounds, or are we measuring the additional mechanisms involved in consciously attending to speech-sound information? I'm suggesting it may be the latter.

I like your experiment. Why don't you run it?! You will have to make sure that you use lots of sentences and response options, so subjects don't quickly learn that it is just a speech perception task in which they only need to attend to one speech sound.
Lori Holt and Andrew Lotto have done some work on speech perception in context. Not exactly comprehension (that I know of), but worth looking at.

Kevin H (2009-08-27 08:06):
I guess I just don't see how the categorical decision between ba and pa is not critical to speech comprehension.

What if there were two sentences that had vastly different meanings but were separated by a single syllable? "I like the train" vs. "I like the rain" is the best I can come up with off the top of my head, but I'm sure there's a better example. If you were to parametrically vary the difference between the two syllables, I bet comprehension would be a lot closer to the CP model than to the discrimination model.

Are there any studies that try to put the CP framework into such ecologically valid contexts?
Greg Hickok (2009-08-26 10:23):
Hi Kevin,
You hit the heart of the issue. What IF these two different tasks are driven by two separate neural systems? And how do we know WHICH system is the "speech perception" system we all want to understand? What David P. and I have been arguing for all these years is that the "real" speech perception system is the one we use in ecologically valid contexts -- when we are processing speech for comprehension. If we find that a different system is operating when we comprehend speech versus when we ask subjects to decide whether they heard a ba or a pa, then we should disregard the latter as a task-induced process that is not relevant to the process we really want to understand. Of course, if you are interested in the neural basis of "pa vs. ba identification/discrimination", then by all means study that system. It's probably relevant to reading...
Greg Hickok (2009-08-26 10:17):
Hey Raj,
I haven't read the paper yet (just skimmed so far), but I did notice the non-effect in auditory regions -- an interesting result. The SMG activity is probably Spt, in the posterior planum temporale region. This activation often mis-localizes to SMG. Again, thanks for the heads-up on your very nice paper...

Kevin H (2009-08-26 10:06):
I think you're taking the data a bit too far. You say it is a "task effect", but what if those two different tasks are driven by two separate neural systems? Or possibly two separate behaviors of the same underlying tissue? Then it's more a question of which system is really doing speech perception, which seems more analogous to the CP task than to the discrimination task.
Rajeev Raizada (2009-08-26 10:02):
:-)
No problem about the first/last name mix-up. My name is kind of confusing. :-) People usually call me Raj.

I agree with you about categorical perception of speech probably being a higher-level process than the ones primary auditory cortex mostly deals with. In the paper, we plotted neurometric curves from subjects' anatomically traced Heschl's gyrus and planum temporale ROIs, and those areas don't seem to show much sensitivity to the category boundary. They do show a little bit, but much less than supramarginal gyrus.

Greg Hickok (2009-08-26 08:56):
I meant Hi Rajeev! Sorry. :-)
Greg Hickok (2009-08-26 08:54):
Hi Raizada,
Thank you for pointing me to your paper. I'm just starting to look into this CP stuff and haven't yet looked at all the recent work in any detail. I'm thrilled to see that you reported d'. I'll have a closer look at your data soon!

I find it very interesting, though, that you don't see any CP effects in auditory cortex! Why is that? I speculate that it is because CP is not a sensory phenomenon but something operating at a higher level.

Rajeev Raizada (2009-08-26 08:31):
As part of an fMRI study on categorical perception of the /ba/-/da/ continuum, I plotted same-different curves using d-prime (Fig. 2) and also regular percentage correct (Fig. 1). They look almost exactly the same. There's a genuine increase in sensitivity to differences that straddle the category boundary; it's not just a criterion shift. I believe that others have found similar d-prime results. The paper shows that this sensitivity to the category boundary is reflected in the fMRI signal, especially in left supramarginal gyrus. The PDF of the paper is here: http://www.nmr.mgh.harvard.edu/~raj/papers/Raizada_Poldrack_Neuron2007.pdf

Greg Hickok (2009-08-26 08:26):
Yes, that is exactly right, Ewan. I'm making a different point than Schouten et al., one that is not typically brought up in the CP world. As you point out, CP is not determined by the shape of the ID curve, since the ID curve is assumed to be a direct measure of the listener's categories.
Discrimination measures acoustic perception of stimuli within and across those categories. If identification and discrimination match, then CP holds. Schouten et al. showed that they don't match with certain tasks.

My point backs up one step. If the ID curve, which is explicitly categorical, is continuous, then CP is completely undermined. Am I making sense?

ewan (2009-08-26 04:09):
...perhaps I'm misunderstanding you here, but we don't "determine whether subjects perceive speech sounds categorically" from the shape of the identification curve, but rather from the degree to which the discrimination data match it (i.e., a peak in discrimination at the category boundary). I guess you are right that you can infer something like discriminability from the ID curve, but is there really an argument to be made from this? Schouten et al. don't make that argument, at any rate. Their issue is not with the ID task; it's with the fact that different discrimination tasks give more or less categorical discrimination profiles. But this has been known since at least 1973 (e.g., Pisoni).
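As a footnote to the same-different d-prime curves Raizada mentions above (his Figs. 1-2): converting "different"-response counts into a per-pair sensitivity estimate can be sketched as follows. This is only an illustration, not the paper's actual analysis; it uses the simple yes-no formula d' = z(H) - z(F) (a full same-different decision model would rescale the values), and the trial counts are invented.

```python
# Sketch: same-different counts -> a per-pair sensitivity estimate.
# H = P("different" | physically different pair),
# F = P("different" | physically same pair).
# Uses the simple yes-no d' = z(H) - z(F); a full same-different
# (differencing) decision model would rescale these values, but the shape
# of the curve across the continuum is what the CP argument turns on.
from statistics import NormalDist

z = NormalDist().inv_cdf

def dprime_same_different(hits, misses, false_alarms, correct_rejections):
    """d' with a +0.5 (log-linear) correction so rates of 0 or 1 stay finite."""
    h = (hits + 0.5) / (hits + misses + 1)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(h) - z(f)

# Invented counts: an across-boundary pair vs. a within-category pair.
across = dprime_same_different(18, 2, 4, 16)
within = dprime_same_different(8, 12, 4, 16)
print(f"across-boundary d' = {across:.2f}, within-category d' = {within:.2f}")
```

A genuine boundary effect shows up as higher d' for across-boundary pairs even after response bias is factored out, which is the point of plotting d' rather than raw percent correct.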