In an 'Update' in a recent issue of TICS (Weak quantitative standards in linguistics research. 10.1016/j.tics.2010.03.005), Gibson & Fedorenko (GF) commit an egregious act of methodological imperialism, and an unwarranted one, at that.
GF complain that a key source of data for theoretical linguistics (particularly syntax and semantics research), acceptability or grammaticality judgments, is not "quantitative." They advocate for some intuitive standard of what it means to do the 'right kind' of quantitative work, arguing that "multiple items and multiple naive experimental participants should be evaluated in testing research questions in syntax/semantics, which therefore require the use of quantitative analysis methods." They contend that the "lack of validity of the standard linguistic methodology has led to many cases in the literature where questionable judgments have led to incorrect generalizations and unsound theorizing." In a peculiar rhetorical twist, GF go on to highlight their worry: "the fact that this methodology is not valid has the unwelcome consequence that researchers with higher methodological standards will often ignore the current theories from the field of linguistics. This has the undesired effect that researchers in closely related fields are unaware of interesting hypotheses in syntax and semantics research."
Now, it's hardly new to express worries about grammaticality judgments. Why this is considered an 'Update' in a journal specializing in Trends is a bit mystifying -- the topic has been revisited for decades (e.g. Spencer 1972, Clark 1973, and many thereafter), and is at best an 'Outdate.' And other than some animosity towards theoretical linguistics from Ted and Evelina, two established and productive MIT psycholinguists, it's not clear what trend the journal is thematizing, beyond the banal point that every domain of research has, unfortunately, its examples of bad research.
But do linguists really need to be told that there is observer bias? That experiments can be useful? That corpus analyses can yield additional data? I must say I found the schoolmarmish normativism very off-putting. Like all disciplines, linguistics relies on replication, convergent evidence (e.g. cross-linguistic validation), and indeed any source of information that elucidates the theoretical proposal under investigation. Some theories survive and are sharpened, others are invalidated. Is this different from any other field? GF seem to believe in a hierarchy of evidence and standards, in which some unspecified sense of quantitative analysis counts as 'higher' and 'better.' Would they be willing to extend that perspective to those of us who do neurobiological research? Are my data even better, because both quantitative and 'hard'? Not a conclusion we want to arrive at for the cognitive neuroscience of language, I think.
Culicover & Jackendoff have published a response (Quantitative methods alone are not enough: Response to Gibson and Fedorenko. 10.1016/j.tics.2010.03.012) that tackles some of this. Their tone is pretty conciliatory, although they rightly point out that "theoreticians' subjective judgments are essential in formulating linguistic theories. It would cripple linguistic investigation if it were required that all judgments of ambiguity and grammaticality be subject to statistically rigorous experiments on naive subjects, especially when investigating languages whose speakers are hard to access. And corpus and experimental data are not inherently superior to subjective judgments." Their points are cogently made -- but it's hardly a spirited response. Their meta-commentary is too vanilla and of the "why can't we all be friends?" flavor.
On the other hand... I just read a clever, appropriately aggressive, and quantitative response to GF that I wish TICS had published. It's my understanding that TICS had a chance to look at this commentary, and I am baffled that they passed on something so genuinely innovative and insightful. It is by Jon Sprouse and Diogo Almeida (SA) at UC Irvine (The data say otherwise. A response to Gibson and Fedorenko.). SA analyzed data from more than 170 naïve participants rendering judgments on two types of phenomena that appear frequently in linguistics and psycholinguistics (wh-islands and center-embedding). Using a quantitative (resampling) analysis, they illustrate how many judgments and how many contrasts one needs to obtain a significant result, given the effect sizes typical of these sorts of studies. Compellingly, they show that vastly different numbers of subjects and contrasts are necessary depending on the phenomenon under investigation. The kinds of contrasts linguists tend to worry about are clearly evident with very few data points; by contrast, surprisingly large data sets are necessary to achieve a satisfactory result for psycholinguistic phenomena. They conclude, in my view quite correctly, that the objects of study are simply quite different for linguistics and psycholinguistics. There may be controversy, but there is no issue ...
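For readers who want to see the shape of such an argument, here is a minimal resampling sketch in Python. To be clear, this is not SA's analysis or their data: the effect sizes, simulation counts, and permutation test below are invented for illustration. It simply shows how one can ask, by simulation, how many participants a judgment contrast of a given size needs before it is reliably detected.

```python
# Minimal power-by-resampling sketch (NOT Sprouse & Almeida's actual
# analysis). All numbers here are assumptions made up for illustration:
# ratings are z-scored Gaussians, and the two standardized effect sizes
# stand in for a 'linguistics-style' vs. a 'psycholinguistics-style'
# contrast.
import numpy as np

rng = np.random.default_rng(0)

def detection_rate(effect_size, n, n_experiments=100, n_perms=200, alpha=0.05):
    """Proportion of simulated n-participant-per-condition experiments in
    which a two-condition contrast of the given standardized effect size
    comes out significant under a two-sided permutation test."""
    hits = 0
    for _ in range(n_experiments):
        a = rng.normal(0.0, 1.0, n)           # condition A ratings
        b = rng.normal(effect_size, 1.0, n)   # condition B, shifted by the effect
        observed = b.mean() - a.mean()
        pooled = np.concatenate([a, b])
        # Permutation null: shuffle condition labels, recompute the difference
        null = np.empty(n_perms)
        for i in range(n_perms):
            rng.shuffle(pooled)
            null[i] = pooled[n:].mean() - pooled[:n].mean()
        p = np.mean(np.abs(null) >= abs(observed))
        hits += p < alpha
    return hits / n_experiments

# Hypothetical large vs. small contrast (d values are assumptions, not SA's)
for label, d in [("large effect (d=1.5)", 1.5), ("small effect (d=0.3)", 0.3)]:
    for n in (5, 10, 20, 40, 80):
        rate = detection_rate(d, n)
        print(f"{label}: n={n:3d} -> detected in {rate:.0%} of simulations")
```

A permutation test is used here because it makes no distributional assumptions, in keeping with the resampling spirit of SA's reply. Under these made-up numbers, the qualitative pattern matches the one described above: the large contrast is detected reliably with a handful of participants per condition, while the small contrast remains unreliable even at much larger sample sizes.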
Readers should form their own opinions on this issue, but I urge them to look at this trio of brief commentaries.
Gibson, E., & Fedorenko, E. (2010). Weak quantitative standards in linguistics research. Trends in Cognitive Sciences. DOI: 10.1016/j.tics.2010.03.005