A recent paper in the Journal of Neuroscience by Bonte, Valente, and Formisano (2009) reports the results of an EEG/ERP study in which listeners performed a one-back task on a sequence of vowels (/a/, /i/, /u/) from three different speakers (male and female; different tokens from each speaker).
What’s impressive – and a little bit intimidating – about this paper is that it tries to tackle a whole bunch of issues in a single experiment: the perceptual analysis of vowels, the abstract representation of vowels versus speaker identity, task-driven modulation of cortical responses, the role of oscillations in neural coding and perception … Pretty heady stuff. There is much to like about this paper, particularly the thorough and creative analysis of the electrophysiological data. Anyone using ERP or MEG to study speech will benefit from looking at all the analyses they used. The paper connects well to a recent imaging paper by Formisano and colleagues (Science, 7 November 2008, 322: 817) -- in which, by the way, the same materials were used -- regarding the anatomic representation of speech versus speaker.
My favorite part of the article is the fact that it starts out with a nice model, a neural coding hypothesis regarding how the neurophysiological response profile will change as a function of the interaction between stimulus materials and task. Basically, executing a task (e.g. vowel identification) will realign the phase of the response typically elicited by the stimulus, and the nitty-gritty of the realignment depends on the specifics of the task. This is the kind of model/hypothesis at the interface of systems neuroscience and cognitive neuroscience that I wish we saw more of in the literature. My second favorite part is the thoughtful discussion in which Bonte et al. link their study to the literature on speech, oscillations, top-down effects, and so on. My third favorite part is the fact that their data further support the view that analyzing response phase yields a tremendous amount of additional information, a view which I favor and which has received provocative support in the recent literature (see lots of stuff by Charlie Schroeder and colleagues, a 2007 paper by Luo & Poeppel, and a brand new paper by Kayser et al. in Neuron 2008).
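For readers who want a concrete feel for what "analyzing response phase" means in practice, here is a minimal sketch of inter-trial phase coherence (ITC), one standard way of quantifying the sort of stimulus- or task-driven phase realignment at issue. This is my own toy illustration, not the authors' pipeline; the sampling rate, frequency band, and trial counts are arbitrary assumptions.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def itc(trials, fs, band):
    """Inter-trial phase coherence over time for band-limited single trials.

    trials: array of shape (n_trials, n_samples); fs: sampling rate in Hz;
    band: (low, high) passband in Hz. Returns ITC in [0, 1] per time sample.
    """
    # Band-limit each trial, then extract instantaneous phase via Hilbert transform
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trials, axis=1)
    phase = np.angle(hilbert(filtered, axis=1))
    # ITC = magnitude of the mean unit phase vector across trials:
    # near 1 if trials share phase at that time point, near 0 if phases are random
    return np.abs(np.mean(np.exp(1j * phase), axis=0))

# Toy demo: phase-locked trials yield high ITC; random-phase trials do not,
# even though the two conditions have identical average power in the band.
rng = np.random.default_rng(0)
fs = 250
t = np.arange(0, 1, 1 / fs)
locked = np.array([np.sin(2 * np.pi * 6 * t) + 0.5 * rng.standard_normal(t.size)
                   for _ in range(50)])
random_phase = np.array([np.sin(2 * np.pi * 6 * t + rng.uniform(0, 2 * np.pi))
                         + 0.5 * rng.standard_normal(t.size)
                         for _ in range(50)])
print(itc(locked, fs, (4, 8)).mean())        # high, close to 1
print(itc(random_phase, fs, (4, 8)).mean())  # low, near chance level
```

The point of the demo is exactly the one the phase literature makes: a power-based measure cannot distinguish these two conditions, while a phase-based measure separates them cleanly.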
The technical tour-de-force notwithstanding, I do have some questions that require clarification. My major question concerns the proposed timeline. Two conclusions are highlighted. One is that the initial acoustic-phonetic analysis is reflected in the N1/P2 responses but that abstract representations are really only reflected late (300 ms and later). The data are the data, but I do find this conclusion surprising in light of the mismatch negativity (MMN; ERP) and mismatch field (MMF; MEG) studies that highlight access to abstract representations by the time the MMN peaks (say 150-200 ms). For example, various findings by Näätänen and colleagues (e.g. Nature, 1997) and data from Phillips and colleagues (e.g. J Cog Neurosci 2000; PNAS 2006) suggest abstraction ‘has happened’ by then. Is this different conclusion due to task differences? My second question has to do with how early top-down task effects are revealed. Again, selective attention tasks reveal response amplitude modulation at the N1/N1m and even earlier (e.g. Woldorff). In fact, people working on brainstem responses, such as Nina Kraus or Jack Gandour, see early early early effects. So do task differences make the difference there, too? I would like to understand this part better.
I do like the technical cleverness and experimental simplicity of this study, but I would like to get my head around the timeline more.
M. Bonte, G. Valente, & E. Formisano (2009). Dynamic and Task-Dependent Encoding of Speech and Voice by Phase Reorganization of Cortical Oscillations. Journal of Neuroscience, 29(6), 1699-1706. DOI: 10.1523/JNEUROSCI.3694-08.2009