Friday, June 25, 2010

How does learning to read affect speech perception?

Sigh... It depends on what you mean by "speech perception" (still).

I just read, with much anticipation, a paper in the current issue of J. Neuroscience by Pattamadilok et al. titled "How Does Learning to Read Affect Speech Perception?" I was really excited because, as I've pointed out before, there is evidence indicating that the ability to perform certain "speech perception" tasks (e.g., syllable discrimination/identification) seems to depend on the ability to read. Assuming that illiterates can nonetheless understand spoken language (after all, we've been doing it for hundreds of thousands of years), such a finding suggests that these "speech perception" tasks are not really measuring speech perception as it is used in normal language processing.

I was hoping this new paper was going to drive home this point, but instead, although the authors seem sensitive to these task issues, the report does more to perpetuate the confusion about speech perception and "phonological processing" than it does to clean things up. Consistent with the field in general, the Pattamadilok et al. report uses a range of terms (not always defined) that a reader may take to mean the same thing (speech perception, speech processing, phonological representations, phonological processing, phono-articulatory patterns) and employs or refers to an array of tasks (lexical decision, rhyme judgment, phonological awareness), none of which was probably ever performed while the human speech perception system was evolving.

Why is this a problem? (Don't worry, I'll get to the actual study in a minute.) Because the title uses the term "speech perception," which implies to most readers that the reported research is fundamentally about our ability to perceive the speech sounds that allow us to understand spoken language (its evolutionarily relevant function), and how learning to read affects this basic perceptual function. But the paper doesn't assess speech perception in this more fundamental sense; instead it assesses the ability to decide whether an acoustic sequence is a word or not, and the ability to decide whether two words rhyme. Further, because "speech perception" was effectively operationalized in this way, they end up assessing a brain region that has not been implicated in the more basic speech perception functions. So the title is misleading. It should be, How Does Learning to Read Affect the Ability to Decide Whether a Sequence of Sounds is a Word or Not?

So what did they do? In short, they used TMS to localize brain regions underlying the orthographic consistency effect: listeners are faster to judge an auditorily presented word as a word (auditory lexical decision) when the word's rime has only one possible spelling (must) than when the word's rime has many possible spellings (break); i.e., a word's spelling affects the "processing" (operationalized as lexical decision) of spoken words. They found that stimulation of the supramarginal gyrus, "an area involved in phonological processing" (p. 8435 -- notice the term speech perception was not used), abolished the orthographic consistency effect, whereas stimulation of an orthographic area in the ventral occipital cortex did not abolish the effect.

They conclude, "...these findings provide strong evidence that 'orthographic' influences in speech perception arise at a phonological, rather than orthographic, level." (p. 8441).

The main problem I have with the study, besides the terminological issues, is that the SMG target areas were defined functionally in a pre-test TMS study as regions that, when stimulated, caused deficits in making rhyme judgments to visually presented word pairs. Rhyme judgments (and similar phonological awareness tasks) are exactly the kinds of abilities that have been related to reading development. So essentially what they've done is selected areas that are involved in reading skills and shown that they are involved in another reading-related effect. This strikes me as circular.

Pattamadilok et al. note that the SMG isn't part of the standardly identified speech perception network (which is the STG) and end up explaining its role in "phonological processing" via its link to the articulatory system and phonological STM, which is probably correct: "we hypothesize that the PMv-SMG circuit plays an integral role in representing and processing representations for phono-articulatory patterns that contribute to 'phonological processing.'" (p. 8441) [their quotes, again notice they didn't use the term 'speech perception']. If by "phonological processing" they mean the ability to make lexical decisions, I wouldn't disagree; I just wish they would have put that in the title.

Pattamadilok C, Knierim IN, Kawabata Duncan KJ, & Devlin JT (2010). How does learning to read affect speech perception? The Journal of Neuroscience, 30(25), 8435-8444. PMID: 20573891

Thursday, June 24, 2010

Preliminary Program for the second annual Neurobiology of Language Conference (NLC 2010)

Dear colleague,

We are delighted to announce that the Preliminary Program for the second annual Neurobiology of Language Conference (NLC 2010) is now available online! Visit our website to download a copy.

This year the Conference will feature poster and slide presentations as well as keynote presentations by Daniel Margoliash (The University of Chicago, US) and Karl Deisseroth (Stanford University, US). The conference will also include two panel discussions on controversial topics in the field of language neurobiology. The first panel will focus on the issue of the organization of semantic memory and feature talks by Alex Martin (National Institute of Mental Health, US) and Karalyn Patterson (University of Cambridge, UK). The other panel discussion will focus on the role of the visual word form area and feature talks by Cathy Price (Wellcome Trust Department of NeuroImaging, University College London, UK) and Stanislas Dehaene (Collège de France, INSERM-CEA Cognitive Neuroimaging Unit, France).

A reminder that early registration will be closing on July 30! For more information, visit our website at www.neurolang.org


Sincerely,

Pascale Tremblay, Ph.D., Postdoctoral Scholar, The University of Chicago
Steven L. Small, Ph.D., M.D., Professor, The University of Chicago


The Neurobiology of Language Planning Group:
Michael Arbib, Ph.D., University of Southern California, USA
Jeffrey Binder, M.D., Medical College of Wisconsin, USA
Vincent L. Gracco, Ph.D., McGill University, Canada
Yosef Grodzinsky, Ph.D., McGill University, Canada
Murray Grossman, M.D., Ed.D., University of Pennsylvania, USA
Peter Hagoort, Ph.D., Max Planck Institute, Netherlands
Gregory Hickok, Ph.D., University of California, Irvine, USA
Marta Kutas, Ph.D., The University of California, San Diego, USA
Alec Marantz, Ph.D., New York University, USA
Howard Nusbaum, Ph.D., The University of Chicago, USA
Cathy Price, Ph.D., University College London, UK
David Poeppel, Ph.D., New York University, USA
Riitta Salmelin, Ph.D., Aalto University, Finland
Kuniyoshi Sakai, Ph.D., The University of Tokyo, Japan
Steven L. Small, Ph.D., M.D., The University of Chicago, USA
Sharon Thompson-Schill, Ph.D., University of Pennsylvania, USA
Pascale Tremblay, Ph.D., The University of Chicago, USA
Richard Wise, M.D., Ph.D., Imperial College London, UK
Kate Watkins, Ph.D., University of Oxford, UK

Tuesday, June 15, 2010

Post doc/Ph.D. positions: Obleser lab, Leipzig

Several positions are available at the Max Planck Institute for Human Cognitive and Brain Sciences (MPI CBS) in Leipzig, starting in January 2011. The ad is here.

Monday, June 14, 2010

Weak quantitative standards in linguistics research? The Debate between Gibson/Fedorenko & Sprouse/Almeida

The following is an exchange regarding the nature of linguistic data between Ted Gibson and Evelina Fedorenko in one corner and Jon Sprouse and Diogo Almeida in the other. The exchange was sparked by (i) the Gibson & Fedorenko TICS paper and (ii) the unpublished response by Jon Sprouse and Diogo Almeida to that paper. A preview commentary on the issue is provided by David here. The exchange below took place over several days via email. Those involved have allowed me to post it here. I've deleted the previous separate posts that contained bits of this debate. In a few days I'll post a poll to see what people think. Enjoy...

TED:
You note that one particular comparison from the linguistics/syntax literature gives a stronger effect than one particular comparison from the psycholinguistics literature (Gibson & Thomas, 1999). From this observation you conclude that the effects that linguists are interested in are larger than the effects that psycholinguists are interested in.

This is fallacious reasoning. You sampled one example from each of two literatures, and concluded that the literatures are interested in different effect sizes. You need to do a large random sample from each to make the conclusion you make.

Note that it is a tautology to show that you can find two comparisons with different effect sizes: this on its own doesn't demonstrate anything. I can show you the opposite effect-size comparison by selecting different comparisons. For example:

"Syntax" comparison: 2wh vs. 3wh, where 2wh is standardly assumed to be worse than 3wh.

1. a. 2wh: Peter was trying to remember what who carried.
b. 3wh: Peter was trying to remember what who carried when.

"Psycholinguistics", where center-embedded is standardly assumed to be worse than right-branching:
2. a. Center-embedded: The ancient manuscript that the graduate student who the new card catalog had confused a great deal was studying in the library was missing a page.
b. Right-branching: The new card catalog had confused the graduate student a great deal who was studying the ancient manuscript in the library which was missing a page.

Clearly the effect size in the comparison in (2) is going to be much higher than in (1). I don't think we want to draw the conclusion opposite to the one you made in your paper.

Indeed the 3wh vs. 2wh comparison (a "syntax" question) is such a small effect that it is not even measurable (which is the point of Clifton et al. (2006) and Fedorenko & Gibson (2010)). This is contrary to what has been assumed in the syntax literature (which was the actual point of our TiCS letter).



JON:
Hi Ted,

Thanks for the comments. It is interesting to note that your comments apply equally well to your own TiCS letter and the longer manuscript that it advertises. I am sure there is a more diplomatic way to do this, but in the interest of brevity, I am going to use your own words to make the point:

You note that one particular comparison from the linguistics/syntax literature is difficult to replicate with naive subjects in a survey experiment. From this observation you conclude that the effects that linguists report are suspicious and that the resulting theory is unsound.

This is fallacious reasoning. You sampled one example from a paper that has 70-odd data points in it (Kayne 1983), and a literature that has thousands, and concluded that this one replication failure means the literature is suspect. You need to do a large random sample from the literature to make the conclusion that you make.

Note that it is a tautology to show that you can find replication failures: this on its own doesn't demonstrate anything. I can show you many such replication failures in all domains of cognitive science. These are never interpreted as a death-knell for a theory or a methodology, so why is this one replication failure such a big problem for linguistic theory and linguistic methodology?

For the record, the point of our letter was to be constructive -- we were trying to figure out how it is that you could claim so much from a single replication failure, especially given that several researchers have reported running hundreds of constructions in quantitative experiments (e.g., Sam Featherston, Colin Phillips) that corroborate linguists' informal judgments. I don't really care if the effect sizes of the two literatures are always of a different magnitude or not (indeed, it is theories, not effect sizes, that determine the importance of an effect). What I do care about is your claim that a single replication failure is more important than the hundreds of (unpublishable!) replications that we've found. Linguists are serious people, and we take these empirical questions seriously... but we haven't found any evidence of a serious problem with linguistic data.

My guess is that you, like many of us on this email, think that there are some challenges facing linguistic theory, especially with how to integrate it with real-time processing theories. Unfortunately, these problems are not the result of bad data (which would be an easy fix). The problem is that the science is hard: complex representational theories are difficult to integrate with real-time processing theories -- and that can't be resolved by attaching numbers to judgments.

-jon


TED & EV:
[Sprouse quote:]"This is fallacious reasoning. You sampled one example from a paper that has 70-odd data points in it (Kayne 1983), and a literature that has thousands, and concluded that this one replication failure means the literature is suspect. You need to do a large random sample from the literature to make the conclusion that you make.

Note that it is a tautology to show that you can find replication failures: this on its own doesn't demonstrate anything. I can show you many such replication failures in all domains of cognitive science. These are never interpreted as a death-knell for a theory or a methodology, so why is this one replication failure such a big problem for linguistic theory and linguistic methodology?"[End Sprouse quote]


We think it is misleading to refer to quantitative evaluations of claims from the syntax literature as "replications" or "replication failures". A replication presupposes the existence of a prior quantitative evaluation (an experiment, a corpus result, etc., i.e. data evaluating the theoretical claim). The claims from the syntax/semantics literature have mostly not been evaluated in a rigorous way. So it's misleading to talk about a "replication failure" when there was only an observation to start with.

In the cases that we allude to in the TiCS letter and in another, longer, paper ("The need for quantitative methods in syntax", currently in submission at another journal; an in-revision version is available here), the quantitative experiments that have been performed don't support the claimed pattern in the original papers. The concern is that there are probably many more such cases in the literature, which makes interpreting the theoretical claims difficult.

Second, we didn't find just one example. We have documented several, most of which others have observed as well. Please see our longer paper. We are sure that there are many more.

In any case, we are not arguing that all or most judgments from the literature are incorrect. Suppose that 90% of the judgments are correct or are on the right track. The problem is in knowing which 90% to build theories on. If there are 70 relevant examples in a paper (as in the example paper that was suggested by Sprouse) that means that approximately 63 are correct. But which 63? 70 choose 63 is 1.2 billion. That's a lot of potentially different theories. To be rigorous, why not do the experiments? As we observe in the longer article, it's not hard to do the experiments anymore, especially in English, with the arrival of Amazon's Mechanical Turk. (In addition, many new large corpora are now available – from different languages – that can be used to evaluate hypotheses about syntax and semantics.)
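(A quick sanity check on that arithmetic -- this snippet is an editorial illustration, not part of the original exchange: the number of ways of choosing which 63 of the 70 reported judgments are the trustworthy ones is the binomial coefficient C(70, 63).)

```python
from math import comb

# Ways to choose which 63 of 70 reported judgments are the correct ones.
print(comb(70, 63))  # 1198774720, i.e. roughly 1.2 billion
```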

[Sprouse quote:]"What I do care about is your claim that a single replication failure is more important than the hundreds of (unpublishable!) replications that we've found. Linguists are serious people, and we take these empirical questions seriously... but we haven't found any evidence of a serious problem with linguistic data."


Aside from the issue with the use of the term “replication” in this context (as pointed out above), our experience in evaluating claims from the syntax / semantics literature is different from Sprouse's. When we run quantitative experiments evaluating claims from the syntax / semantics literature, we don't typically find exactly the patterns of judgments of the researchers who first made the claim. The results of experiments are almost always more subtle, such that we gain much information (such as effect sizes across different comparisons, relative patterning of different constructions, variability in the population, etc.) from doing the quantitative experiment.

[Sprouse quote:]"My guess is that you, like many of us on this email, think that there are some challenges facing linguistic theory, especially with how to integrate it with real-time processing theories. Unfortunately, these problems are not the result of bad data (which would be an easy fix). The problem is that the science is hard: complex representational theories are difficult to integrate with real-time processing theories -- and that can't be resolved by attaching numbers to judgments."


We never claimed that doing quantitative experiments would solve every interesting linguistic question. But we do think that it is a prerequisite, and that doing quantitative experiments will solve some problems. So we don't see the downside of more rigor in these fields.

Egregiously yours,

Ted Gibson & Ev Fedorenko


DIOGO:
Hi Ted, Hi Ev (and hi everyone else)

Thanks for the comments on our unpublished letter, and for pointing us to your longer article under review.

Jon has already touched upon most of the issues I was going to bring up. However, there is still at least one important point that I would like to raise here in response to some of your comments in your last e-mail, which are also made in your TICS letter and the longer manuscript you provided us with. Namely, I think you profoundly mischaracterize the way linguists work when you say things like:

"the prevalent method in these fields involves evaluating a single sentence/meaning pair, typically an acceptability judgment performed by just the author of the paper, possibly supplemented by an informal poll of colleagues." (from TICS letter)

"...syntax research, where there is typically a single experimental trial." (from Manuscript, p. 7)

"The claims from the syntax/semantics literature have mostly not been evaluated in a rigorous way. So it's misleading to talk about a "replication failure" when there was only an observation to start with." (from last e-mail)


Nothing could be further from the truth. It is simply inaccurate to claim that linguists have lower methodological standards than other cognitive scientists simply because linguists do not routinely run formal acceptability judgments. Linguists test their theories in the exact same way other scientists do: By running experiments for which they (a) carefully construct relevant materials, (b) collect and examine the resulting data, (c) seek systematic replication and (d) present the results to the scrutiny of their peers. There is no "extra" rigour that comes from being able to run inferential statistics beyond what you get from thoughtfully evaluating theories, and systematically investigating the data that bear upon them (in the case of linguistics, through repeated single subject informal experiments that any native speaker can run).

When linguists evaluate contrasts between two (or more) sentence types, they normally run several different examples in their heads, they look for potential confounds, and consult other colleagues (and sometimes naive participants), who evaluate the sentence types in the same fashion. The fact that this whole set of procedures (aka, experiments) is conducted informally does not mean it is not conducted carefully and systematically. I cannot stress this enough: The notion that (a) linguists routinely test their theories with only one specific pair of tokens at a time, (b) proceed to publish papers based on the evaluation of this single data point, and (c) that the results of this single subject/token experiment receive no serious or systematic scrutiny by other linguists, is entirely without basis in reality (e.g., see Marantz 2005 and Culicover and Jackendoff's response to your TICS letter).

The only difference between linguists and other scientists is that in order to evaluate the internal validity of their experiments (and again, they are experiments) linguists tend not to rely on inferential statistical methods. One of the possible reasons for that is that linguists normally look at contrasts that are fairly large, and it does not take many trials to be confident about one's own intuition in these cases. Incidentally, in these cases it does not take many trials for the stats to concur either: if the linguist was running a sign test, 5 trials all going in the same direction would already guarantee statistically significant results at the .05 level (linguists routinely evaluate more tokens than that, btw).
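(To make that arithmetic concrete -- this is an editorial illustration, not part of the original email: under the null hypothesis that neither sentence type is better, the probability that all five trials go in the predicted direction is 0.5^5.)

```python
# One-tailed sign test: probability of 5 out of 5 judgments going in the
# predicted direction if the two sentence types were in fact equally acceptable.
p_value = 0.5 ** 5
print(p_value)  # 0.03125, below the conventional .05 threshold
```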

But what happens in the case where the hypothesized contrast is not that obvious? In these cases, linguists would do what any scientist does when confronted with unclear results: they would try to replicate the informal experiment (e.g., by asking colleagues/naive subjects to evaluate instances of contrasts of the relevant type), or would seek alternative ways of testing the same question (e.g., by running a formal acceptability judgment survey). It is understandable why linguists have historically preferred to take the first course of action: Replicating informal experiments is faster and cheaper, and systematic replication (the gold standard of scientific experimentation) provides the basis for the external validity of the results.

Sincerely,
Diogo

P.S.: I also think you are overstating the case that formal acceptability judgment experiments routinely reach different conclusions from established contrasts in the linguistic literature, and you are overinterpreting the implications of the handful of replication failures that you cited. I won't go into detail here in the interest of brevity, but I would be happy to share my thoughts in a future e-mail if you are interested.


TED:
Dear Diogo:

Thanks for your thoughtful response to my earlier emails. Let me jump right to the point:

You said:
Linguists test their theories in the exact same way other scientists do: By running experiments for which they (a) carefully construct relevant materials, (b) collect and examine the resulting data, (c) seek systematic replication and (d) present the results to the scrutiny of their peers. There is no "extra" rigour that comes from being able to run inferential statistics beyond what you get from thoughtfully evaluating theories, and systematically investigating the data that bear upon them (in the case of linguistics, through repeated single subject informal experiments that any native speaker can run).

... The fact that this whole set of procedures (aka, experiments) is conducted informally does not mean it is not conducted carefully and systematically.


I am sorry to be so blunt, but this is just incorrect. There *is* extra rigor from (a) constructing multiple examples of the targeted phenomena, which are normed for nuisance factors; and (b) evaluating the materials on a naive experimental population.

Both points are very important, but the second point is one that I have found many language researchers underestimate. The problem with not evaluating your hypotheses on a naive population (with distractor materials etc.) is that there are unconscious cognitive biases on the part of the researchers and their friends which make their judgments on the materials close to worthless. (That sounds harsh, but unfortunately, it's true.) I know this first-hand. As we document in the longer paper (see pp. 16-20), if you read my PhD thesis, there are many cases of judgments that turned out not to be correct, probably because of cognitive biases on my part and on the part of the people that I asked. We provided one example of such an incorrect judgment from my thesis: it was argued that doubly nested relative clause structures are more complicated to understand when they modify a subject (1) than when they modify an object (2) (Gibson, 1991, examples (342b) and (351b) from pp. 145-147):

(1) The man that the woman that the dog bit likes eats fish.
(2) I saw the man that the woman that the dog bit likes.

That is, (1) was argued to be harder to process than (2). In doing this research, I asked lots of people and they all agreed. And I constructed various similar versions. The people that I asked pretty much uniformly agreed that (1) was worse than (2).

But if you do the experiment, with naive subjects, and lots of fillers etc, it turns out that there is no such effect. I ran that comparison about 5 times, and never found any difference. Both are rated as not very acceptable (relative to lots of other things) but there was never a difference in the predicted direction between these two structures.

The problem here was very likely a cognitive bias. I had a theory which predicted the difference, and all my informants had a similar theory (it's basically that more nesting leads to harder processing, as suggested by Miller & Chomsky (1963) and Chomsky & Miller (1963)). So we used that theory to get the judgment predicted by that theory.

If you read the literature on cognitive biases, this is a standard effect. To quote from our longer paper:

"In Evans Barston & Pollard's experiments (1983; cf. other kinds of confirmation bias, such as first demonstrated by Wason, 1960; see Nickerson, 1998, for an overview of many similar kinds of cognitive biases) experiments, people were asked to judge the acceptability of a logical argument. Although the experimental participants were sometimes able to use logical operations in judging the acceptability of the arguments that were presented, they were most affected by their knowledge of the plausibility of the consequents of the arguments in the real world, independent of the soundness of the argumentation. They thus often made judgment errors, because they were unconsciously biased by the real-world likelihood of events.

More generally, when people are asked to judge the acceptability of a linguistic example or argument, they seem unable to ignore potentially relevant information sources, such as world knowledge or theoretical hypotheses. For example, understanding a theoretical hypothesis whereby structures with more open linguistic dependencies are more complex than those with fewer may lead an experimental participant to judge examples with more open dependencies as more complex ...", as in the examples discussed above.

One of the main points of our papers is that it's really not enough to just be careful and think hard. That is just not rigorous enough to avoid the effects of unconscious cognitive biases. In order to be rigorous, you really need some quantitative evaluation that comes from the analysis of naive subjects: either a corpus analysis or a controlled experiment.

You said:
The only difference between linguists and other scientists is that in order to evaluate the internal validity of their experiments (and again, they are experiments) linguists tend not to rely on inferential statistical methods. One of the possible reasons for that is that linguists normally look at contrasts that are fairly large, and it does not take many trials to be confident about one's own intuition in these cases. Incidentally, in these cases it does not take many trials for the stats to concur either: if the linguist was running a sign test, 5 trials all going in the same direction would already guarantee statistically significant results at the .05 level (linguists routinely evaluate more tokens than that, btw).


The point that I made in response to Jon's earlier comments along these lines still holds. If you want to claim that linguists tend to examine effect sizes that are larger than the effect sizes that psycholinguists examine, then you need to show this. You can't just state it and expect others to accept your hypothesis. Personally, I highly doubt that it's true. I have read hundreds of syntax / semantics papers, and in most of them, there are lots of questionable judgments, which are probably comparisons with small effect sizes, or non-effects.

Ted Gibson


DIOGO & JON:
Dear Ted,

Thank you for your response. This is a joint reply by Jon and me.

[Gibson quote:] Let me jump right to the point: "I am sorry to be so blunt, but this is just incorrect. There *is* extra rigor from (a) constructing multiple examples of the targeted phenomena, which are normed for nuisance factors;"


We totally agree with the need for multiple items. In fact, we just told you that linguists routinely evaluate several instances of any proposed sentence type contrast. On this point, there is simply no difference between what linguists and psycholinguists do (see Marantz 2005).

What we completely disagree with is the priority you assign to the results from naive participants. There is no particular reason to assign your average pool of 30-odd college-aged students the role of arbiter of truth. Just finding a difference between what a researcher thinks is going to happen and the experimental results from a pool of naive subjects is not particularly informative, especially if the differences are of the "failure to replicate" type. There are several reasons why one might get an unexpected null result that have nothing to do with "cognitive bias":

(1) The experiment is underpowered

For instance, in Gibson and Thomas (1999), you claim that, contrary to the initial motivating intuition, you did not find that (b) was rated better than (a):

a. *The ancient manuscript that the graduate student who the new card catalog had confused a great deal was studying in the library was missing a page.

b. ?The ancient manuscript that the graduate student who the new card catalog had confused a great deal was missing a page.

In our unpublished letter (see figure), we show that this is most likely an issue of power, because the effect is definitely there (it's just small and requires a large sample to have a moderate chance of being detected).

(2) There are problems with the experimental design, such as:

(i) The experiment uses a task that is not necessarily sensitive to the manipulation

For instance, why would you necessarily think that acceptability tasks should be equally sensitive to all processing difficulties? It could be the case that acceptability judgments might not be the right dependent measure to use.

(ii) The experiment uses a design or task that is not optimal to reveal the effect of interest

Sprouse & Cunningham (under review, p. 23, figure 8 sent attached) have data showing that the contrast between (a) and (b) above can be detected with a sample half the size Gibson & Thomas (1999) used when one uses a magnitude estimation task with lower acceptability reference sentences (but not at all when higher acceptability reference sentences are used).

None of these explanations invoke cognitive biases. We don't necessarily disagree that cognitive biases are a potential problem. We just think that before you invoke them as an explanation, (1) you need positive evidence, and (2) a failure to replicate the results from an informal experiment in a formal experiment is not positive evidence. In fact, had you assigned the kind of priority to formal experimental results with naive participants that you seem to advocate in your previous e-mail, you would have been misled by the Gibson & Thomas (1999) data, and would have concluded the contrast is not real. In fact, you yourself explained the result by appealing to the offline nature of the test, and not to cognitive biases, so why should cognitive bias be the null hypothesis for linguistics?

Furthermore, we can also find exactly the opposite pattern: a significant experimental effect that confirms the initial expectations but is nonetheless treated by the experimenter as evidence against them. Take the Wasow & Arnold (2005) paper, for example. In your longer manuscript you say this:

"Wasow & Arnold (2005) discuss three additional cases: the first having to do with extraction from datives; the second having to do with the syntactic flexibility of idioms; and the third having to do with acceptability of certain heavy-NP shift items that were discussed in Chomsky (1955 / 1975). The first of these seems particularly relevant to Phillips’ claim. In this example of faulty judgment data, Filmore (1965) states that sentences like (1), in which the first object in a double object construction is questioned, are ungrammatical:

(1) Who did you give this book?

Langendoen, Kalish-Landon & Dore (1973) tested this hypothesis in two experiments, and found many participants (“at least one-fifth”) who accepted these kinds of sentences as completely grammatical. Wasow & Arnold note that this result has had little impact on the syntax literature." (pp. 13-14)

And it shouldn't. If only one-fifth of the sample in Langendoen et al. (1973) failed to show the expected contrast, the results are not problematic at all. In fact, they are actually highly significant, and overwhelmingly support the original proposal: a simple one-tailed sign test here would give you a p-value of 1.752e-09 and a 95% CI for the probability of finding the result in the predicted direction of (0.7-1). Let me stress this again: what the experiment is actually telling you is that the results support the linguist's informal experiments, not the contrary, as Wasow & Arnold seem to think.

The same is true of Wasow & Arnold's (2005) own acceptability experiment. They decided to test an intuition from Chomsky (1955) about the position of verb particles interacting with the complexity of the object NP. They tested the following paradigm, where the object in (a-b) is thought to be more complex than the object in (c-d).

a. The children took everything we said in. (1.8)
b. The children took in everything we said. (3.3)
c. The children took all our instructions in. (2.8)
d. The children took in all our instructions. (3.4)

According to Chomsky, (c) sounded more natural than (a), and (b) and (d) should be equally acceptable. And that is precisely what Wasow & Arnold (2005) found (see the mean acceptability, on a 4-point scale, next to each condition above), with highly significant results. These results were also replicated in another of their conditions, omitted here for brevity's sake. And yet, Wasow & Arnold (2005) claim the following:

"there was considerable variation in the responses. In particular, although the mean score for split examples with complex NPs was far lower than any of the others [ie, The results support Chomsky's intuitions], 17% of the responses to such sentences were scores of 3 or 4. That is, about one-sixth of the time, participants judged such examples to be no worse than awkward."

Again, the results were highly significant and support rather than undermine the original intuition from the linguist, and yet Wasow & Arnold (and, given the quote from your article, you too) seem to conclude the opposite from the experimental data presented.

So where is this extra rigor that one gets by simply running formal acceptability judgments? It just seems to us that simply running a formal acceptability experiment with naive participants does not preclude one from being misled by one's results any more than in the case of informal experiments.

Sincerely,
Diogo & Jon


TED & EV:
Dear Diogo & Jon:

The point is *not* that quantitative evidence will solve all questions in language research. The point is just that having quantitative data is a necessary but not sufficient condition. That's all.

Without the quantitative evidence you just have a researcher's potentially biased judgment. I don't think that that's good enough. It's not very hard to do an experiment to evaluate one's research question, so one should do the experiment. One is *never* worse off after doing an experiment. You might find that the issue is complex and harder to address than you thought before doing the experiment. But even that is useful information.

I don't have anything more to say on this for now. Some day, I would be happy to debate you in a public forum if you like.

Best wishes,

Ted (& Ev)


DIOGO:
Dear Ted,

Let me just add a few remarks to your last e-mail, and then I don't think I have anything more to say on the matter either. Thanks for engaging with us in this discussion.

[Gibson quote:] "The point is *not* that quantitative evidence will solve all questions in language research. The point is just that having quantitative data is a necessary but not sufficient condition. That's all."


And the point Jon and I are trying to make is that having quantitative data for linguistic research, while potentially useful, is not always necessary. The implication of your claim is also far from uncontroversial: it implies that linguistics, where quantitative methods are not widely used, fails to live up to a "necessary" scientific standard. We think this is both false and misguided.

[Gibson quote:] "Without the quantitative evidence you just have a researcher's potentially biased judgment. I don't think that that's good enough."


Here's the thing: a published judgment contrast in the linguistic literature, especially if it is a theoretically important one, has been replicated hundreds of times in informal experiments. When the contrast is uncontroversial, it will keep being replicated nicely and will attract no further attention. However, when the contrast is a little shaky, linguists are keenly aware of it, and weigh the theory it supports (or rejects) accordingly. Finally, when the contrast is not really replicable, it is actually challenged, because that is the one thing linguists do: they try to test their theories, and if some part of the theory is empirically weak, it will be challenged. I highly doubt that cognitive biases could play any significant role in this systematic replication process.

Now, here's where this methodology is potentially problematic: when there is a judgment contrast from a language for which there are very few professional linguists who are also native speakers and for which access to naive native speakers is limited. In this case, it is possible that a published judgment contrast will go unreplicated, and if faulty, could lead to unsound conclusions. In these cases, I totally agree that having quantitative data is probably necessary. But note that the problem here is not the lack of quantitative data to begin with; the problem is the lack of systematic replication. Quantitative data only serve as a way around this problem.

[Gibson quote:] "It's not very hard to do an experiment to evaluate one's research question, so one should do the experiment."


The point is that linguists DO the experiment. They just do it informally.

[Gibson quote:] "One is *never* worse off after doing experiment. You might find that the issue is complex and harder to address than you thought before doing the experiment. But even that is useful information."


The question is not whether or not one is worse off after doing the formal experiment. The question is whether or not one is necessarily better off.

There is a very clear cost in running a formal experiment versus an informal experiment. Formal experiments with naive participants take time (IRB approval, advertising on campus, having subjects come to the lab to take the survey, or setting up a web interface so they can do it from home, etc.), and potentially money (if you don't have a volunteer subject pool, or if you use things like Amazon's Mechanical Turk). If you want linguists to adopt this mode of inquiry as "necessary", you have to show them that they would be better off doing so. That is the part where it is really not clear that they would be.

You can try to show this in two ways: you can show linguists (1) that they get interesting, previously unavailable data, or (2) that they are being misled by their informal data-gathering methods and running the formal experiment really does fix that. Because otherwise, what is the point? If linguists just confirm over and over again that they keep getting the same results running naive participants as they get with their informal methods (and this is what linguists like Jon, Sam Featherston, Colin Phillips and others keep telling you happens), then why should they bother going through a much slower and much more costly method that does not give them any more information than their quick, informal, but highly replicable method does?

Best wishes,
Diogo

Friday, June 11, 2010

POSTDOCTORAL POSITION – MEG/EEG - PARIS

Applications are invited for a postdoc position supervised by Ghislaine Dehaene-Lambertz to work on consciousness in infants using EEG. The team is part of INSERM’s ‘Cognitive Neuroimaging Unit' (http://www.unicog.org, director: Stanislas Dehaene) at NeuroSpin (director: Denis LeBihan) in the greater Paris region. NeuroSpin is a newly opened, outstanding interdisciplinary research environment that houses several research laboratories and combines expertise in cognitive neuroscience and neuropsychology, magneto-electrophysiology, high-field MR imaging, and imaging data analysis.

The project is part of a European Community project to study consciousness in adults, monkeys, infants, and comatose patients. The postdoc will program and analyse EEG experiments (subliminal presentation, stimulus collision, etc.) in infants and discuss the results with the other teams involved. Applicants should have a PhD degree in Neuroscience, Medicine, Psychology, or related areas.

Prior experience with EEG/MEG analysis is necessary. Salary will be commensurate with experience within the salary scale of the French public research organisations (~2100 euros per month). The position is funded for one to five years and should start during autumn 2010. Applications will be considered until the position is filled.

For further information or to submit an application (including the names of two referees) please contact Ghislaine Dehaene-Lambertz, email: ghislaine.dehaene@cea.fr

Wednesday, June 9, 2010

Journal Scan -- June 2010

A few interesting articles --

Inferior Frontal Gyrus Activation Predicts Individual Differences in Perceptual Learning of Cochlear-Implant Simulations
Frank Eisner, Carolyn McGettigan, Andrew Faulkner, Stuart Rosen, and Sophie K. Scott
J. Neurosci. 2010;30 7179-7186

Drivers and modulators in the central auditory pathways
Charles C. Lee and S. M. Sherman
Frontiers in Human Neuroscience

Mechanisms of song perception in oscine birds
Daniel P. Knudsen, Timothy Q. Gentner
Brain and Language

Direct Recordings of Pitch Responses from Human Auditory Cortex
Timothy D. Griffiths, Sukhbinder Kumar, William Sedley, Kirill V. Nourski, Hiroto Kawasaki, Hiroyuki Oya, Roy D. Patterson, John F. Brugge, Matthew A. Howard
Current Biology

Cortical Spatio-temporal Dynamics Underlying Phonological Target Detection in Humans
Edward F. Chang, Erik Edwards, Srikantan S. Nagarajan, Noa Fogelson, Sarang S. Dalal, Ryan T. Canolty, Heidi E. Kirsch, Nicholas M. Barbaro, Robert T. Knight
Journal of Cognitive Neuroscience

Tuesday, June 8, 2010

Donostia Workshop on Neurobilingualism

We are pleased to announce that the ON-LINE REGISTRATION for the Donostia Workshop on Neurobilingualism, to be held in Donostia – San Sebastián (Spain), is NOW OPEN.

Attendees are invited to register at the following website: www.bcbl.eu/events/neurobilingualism

Important dates to remember:

- ABSTRACT SUBMISSION DEADLINE: June 15th, 2010
- NOTIFICATION OF ABSTRACT ACCEPTANCE: June 30th, 2010
- EARLY REGISTRATION DEADLINE: July 15th, 2010
- CONFERENCE DATES: Sept. 30th - Oct. 2nd 2010

INVITED SPEAKERS

* Laura-Ann Petitto. University of Toronto, Canada
* Agnes Kovacs. Hungarian Academy of Sciences, Hungary.
* Michael Dorman. Arizona State University, USA.
* Jonathan Grainger. CNRS and University of Provence, France.
* Douglas Davidson. BCBL, Spain.
* Nuria Sebastian. Universitat Pompeu Fabra, Spain.

DISCUSSANTS

* Guillaume Thierry. Bangor University. UK.
* Nuria Sebastian. Universitat Pompeu Fabra, Spain


It would be very much appreciated if you could circulate this information to anyone you think might be interested.

Thursday, June 3, 2010

Back to the future on syntax and Broca's area

Thirty-five years ago Caramazza and Zurif (1976) made a startling claim that literally changed the way the field thought about Broca's aphasia, Broca's area, and the neurology of syntax. Up until that time Broca's area was thought to be basically a motor speech area. Even the agrammatic speech output of Broca's aphasics was thought, by prominent researchers, to reflect not a syntactic deficit but an economy of effort induced by the difficulty of articulating speech. In this context, Caramazza and Zurif showed that Broca's aphasics failed to comprehend sentences that required syntactic analysis. (Footnote: the same was true of conduction aphasics, but no one remembers that fact.)

Based on their findings C&Z made the following claim:

...for the Broca’s aphasics, brain damage affects a general language processing mechanism that subserves the syntactic component of both comprehension and production. The implication that follows is that the anterior language area of the brain is necessary for syntactic-like cognitive operations. (p. 581)


State of the art, 1976: Broca's area is the seat of syntax.

This, of course, was a beautiful study that has since been replicated repeatedly. Their conclusions were perfectly reasonable, except they turned out to be wrong. Subsequent work (e.g., Linebarger, Schwartz, & Saffran, 1983) showed that Broca's aphasics had not entirely lost their syntax -- they could still make grammaticality judgments pretty darn well.

State of the art, 1983: Broca's area is not the seat of syntax.

The field reacted with a variety of new ideas about Broca's area: It supports only a restricted component of syntax (Grodzinsky), it supports the mapping between syntax and meaning (Schwartz and Saffran), it supports fast access to lexical information (Zurif and others).

State of the art, after 1983: Who knows what Broca's area is doing, but we all agree: it is not the seat of (all of) syntax.

Now fast forward to today and scan the literature on the role of Broca's area in syntax. You might come across a paper by Fadiga et al. (2009) which states:

we propose that Broca's area might be a center of a brain network encoding hierarchical structure regardless of their use in action, language, or music. (p. 455)


Similar claims have been made by Friederici and colleagues who have suggested a role for Broca's area in hierarchical structure processing and phrase structure building.

State of the art [?!], 2009: Broca's area is the seat of syntax via its more general role [?!] in hierarchical processing of any kind.

What happened between 1983 and 2009 to cause the regression back to the interesting, but ultimately incorrect claim of C&Z? Functional imaging happened. It seems that when functional imaging became a widespread tool, with the development of fMRI in particular in the 1990s, the field either forgot about the decades of good research that came before, or just decided to start over. This is a mistake.

New rule: If you want to claim Broca's area is the seat of syntax, please (i) cite Caramazza & Zurif, and (ii) provide an explanation for the Linebarger et al. results.

References

Caramazza A, Zurif EB. 1976. Dissociation of algorithmic and heuristic processes in sentence comprehension: Evidence from aphasia. Brain and Language. 3:572-582.

Fadiga, L., Craighero, L., & D’Ausilio, A. (2009). Broca's Area in Language, Action, and Music. Annals of the New York Academy of Sciences, 1169(1), 448-458. DOI: 10.1111/j.1749-6632.2009.04582.x

Friederici AD. 2009. Pathways to language: fiber tracts in the human brain. Trends Cogn Sci. 13:175-181.

Friederici AD, Bahlmann J, Heim S, Schubotz RI, Anwander A. 2006. The brain differentiates human and non-human grammars: functional localization and structural connectivity. Proc Natl Acad Sci U S A. 103:2458-2463.

Linebarger MC, Schwartz M, Saffran E. 1983. Sensitivity to grammatical structure in so-called agrammatic aphasics. Cognition. 13:361-393.

Tuesday, June 1, 2010

An egregious act of methodological imperialism

In an 'Update' in a recent issue of TICS (Weak quantitative standards in linguistics research. 10.1016/j.tics.2010.03.005), Gibson & Fedorenko (GF) commit an egregious act of methodological imperialism, and an unwarranted one, at that.

GF complain that one key source of data for theoretical linguistics (and particularly for syntax and semantics research), acceptability or grammaticality judgments, is not "quantitative." They advocate for some intuitive standard of what it means to do the 'right kind' of quantitative work, arguing that "multiple items and multiple naive experimental participants should be evaluated in testing research questions in syntax/semantics, which therefore require the use of quantitative analysis methods." They contend that the "lack of validity of the standard linguistic methodology has led to many cases in the literature where questionable judgments have led to incorrect generalizations and unsound theorizing." In a peculiar rhetorical twist, GF go on to highlight their worry: "the fact that this methodology is not valid has the unwelcome consequence that researchers with higher methodological standards will often ignore the current theories from the field of linguistics. This has the undesired effect that researchers in closely related fields are unaware of interesting hypotheses in syntax and semantics research."

Now, it's hardly new to express worries about grammaticality judgments. Why this is considered an 'Update' in a journal specializing in Trends is a bit mystifying - the topic has been revisited for decades (e.g. Spencer 1972, Clark 1973, and many thereafter), and is at best an 'Outdate.' And other than some animosity towards theoretical linguistics from Ted and Evelina, two established and productive MIT psycholinguists, it's not clear what trend is being thematized by the journal, other than the pretty banal point that in absolutely every domain of research there are, unfortunately, examples of bad research.

But do linguists really need to be told that there is observer bias? That experiments can be useful? That corpus analyses can yield additional data? I must say I found the school-marmish normativism very off-putting. Like all disciplines, linguistics relies on replication, convergent evidence (e.g., cross-linguistic validation), and indeed any source of information that elucidates the theoretical proposal being investigated. Some theories survive and are sharpened, others are invalidated. Is this different from any other field? GF seem to believe in a hierarchy of evidence and standards, in which some unspecified sense of quantitative analysis is considered 'higher' and 'better.' Would they be willing to extend that perspective to those of us who do neurobiological research? Are my data even better, because both quantitative and 'hard'? Not a conclusion we want to arrive at for cognitive neuroscience of language, I think.

Culicover & Jackendoff have published a response (Quantitative methods alone are not enough: Response to Gibson and Fedorenko. 10.1016/j.tics.2010.03.012) that tackles some of this. Their tone is pretty conciliatory, although they rightly point out that "theoreticians' subjective judgments are essential in formulating linguistic theories. It would cripple linguistic investigation if it were required that all judgments of ambiguity and grammaticality be subject to statistically rigorous experiments on naive subjects, especially when investigating languages whose speakers are hard to access. And corpus and experimental data are not inherently superior to subjective judgments." Their points are cogently made -- but it's hardly a spirited response. Their meta-commentary is too vanilla and of the "why can't we all be friends?" flavor.

On the other hand... I just read a very clever and appropriately aggressive and quantitative response to GF that I wish TICS had published. It's my understanding that TICS had a chance to look at this response and I am baffled that they didn't publish this actually innovative and insightful commentary. It is by Jon Sprouse and Diogo Almeida (SA) at UC Irvine (The data say otherwise. A response to Gibson and Fedorenko.) SA analyzed the data from more than 170 naïve participants rendering judgments on two types of phenomena that make frequent appearances in linguistics and psycholinguistics (wh-islands and center-embedding). Using a quantitative (resampling) analysis they illustrate how many judgments and how many contrasts one needs to obtain a significant result given the effect sizes of these sorts of studies. Compellingly, they show that for the kind of phenomena that are being investigated, vastly different numbers of subjects and contrasts are necessary to achieve a convincing result. The kinds of contrasts and phenomena that linguists tend to be worried about are clearly evident with very few data points; in contrast, surprisingly large data sets are necessary to achieve a satisfactory result for psycholinguistic phenomena. They conclude, in my view quite correctly, that the only thing that can be concluded is that the objects of study are simply quite different for linguistics and psycholinguistics. There may be controversy, but there is no issue ...
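(For readers who want a feel for the logic of such an analysis, here is a minimal sketch; it is an editorial illustration, not SA's code, and it uses made-up normal rating distributions and placeholder effect sizes in place of their resampled judgment data. The idea is simply to estimate, by simulation, how often a contrast of a given size reaches significance at various sample sizes.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def detection_rate(effect_size, n_subjects, n_sims=2000, alpha=0.05):
    """Estimate how often a one-sample t-test on per-subject rating
    differences detects a contrast of the given size (in sd units)."""
    hits = 0
    for _ in range(n_sims):
        # Hypothetical per-subject rating differences: mean = effect_size, sd = 1.
        diffs = rng.normal(loc=effect_size, scale=1.0, size=n_subjects)
        if stats.ttest_1samp(diffs, 0.0).pvalue < alpha:
            hits += 1
    return hits / n_sims

# A large "linguistics-sized" contrast vs. a small "psycholinguistics-sized" one.
for label, d in [("large contrast (d = 1.5)", 1.5), ("small contrast (d = 0.25)", 0.25)]:
    for n in (5, 10, 40, 160):
        print(label, "n =", n, "detection rate ~", detection_rate(d, n))
```

With these placeholder numbers, the large contrast is detected nearly every time with ten or so subjects, while the small contrast needs well over a hundred -- which is the qualitative pattern SA report.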

Readers should form their own opinions on this issue, but I urge them to look at this trio of brief commentaries.


Gibson, E., & Fedorenko, E. (2010). Weak quantitative standards in linguistics research Trends in Cognitive Sciences DOI: 10.1016/j.tics.2010.03.005

NLC2010: Conference site announced, extended abstract submission deadline

Please note that abstract submissions for the Second Annual Neurobiology of Language Conference will be accepted until Friday, June 4th, at midnight CST (extended deadline!). For guidelines about abstract submission, or to submit an abstract, visit our website!

We are glad to announce that the Second Annual Neurobiology of Language Conference (NLC 2010) will be held at the beautiful, Southern California-style Rancho Bernardo Inn Golf Resort & Spa. As a guest at the Rancho Bernardo Inn you will enjoy a complete resort-style experience: luxury accommodations, award-winning dining, championship golf, a resort spa, etc. In addition, the Rancho Bernardo Inn is located just minutes away from many of San Diego's top attractions: the Wild Animal Park, Legoland, SeaWorld, the San Diego Zoo, and Southern California beaches.

In order to make your NLC 2010 experience a memorable one, we are delighted to offer an unbeatable room rate of US $170 for single or double occupancy, a complimentary kids' club for all attendees staying at the Rancho, free transportation back to downtown San Diego on Saturday, November 13, a 30% discount on published golf prices (good for individual or tournament play and including rental cart), a 10% discount on spa services, and complimentary self-parking.

The group rate is available throughout the conference (i.e., November 10-12) and can be extended to 3 days before and after the conference. To reserve a room online, visit the Rancho's website and use the following code: 1011ANNUA. Please note that the online reservation system will only accept reservations that are within the conference dates. To reserve additional dates at the group rate, call the Rancho's reservations team Monday-Friday from 7am-9pm, and Saturday and Sunday from 7am-7pm, at 800-542-6096.

We strongly encourage you to reserve your room at the Rancho as soon as possible because space is limited!