Friday, March 6, 2009

The electrophysiology of everything -- about vowels …

A recent paper in the Journal of Neuroscience by Bonte, Valente, and Formisano (2009) reports the results of an EEG/ERP study in which listeners performed a one-back task on a sequence of vowels (/a/, /i/, /u/) from three different speakers (male and female; different tokens from each speaker).

What’s impressive – and a little bit intimidating – about this paper is that it tries to tackle a whole bunch of issues in one single experiment: the perceptual analysis of vowels, the abstract representation of vowels versus speaker identity, task-driven modulation of cortical responses, the role of oscillations in neural coding and perception … Pretty heady stuff. There is much to like about this paper, particularly the thorough and creative analysis of the electrophysiological data. Anyone using ERP or MEG to study speech will benefit from looking at all the analyses they used. The paper connects well to a recent imaging paper by Formisano and colleagues in Science (Science 7 November 2008 322: 817) -- in which, by the way, the same materials were used -- regarding the anatomic representation of speech versus speaker.

My favorite part of the article is the fact that it starts out with a nice model, a neural coding hypothesis regarding how the neurophysiological response profile will change as a function of the interaction between stimulus materials and task. Basically, executing a task (e.g., judging vowel identity) will realign the phase of the response typically elicited by the stimulus, and the nitty-gritty of the realignment depends on the specifics of the task. This is the kind of model/hypothesis at the interface of systems neuroscience and cognitive neuroscience that I wish we saw more of in the literature. My second favorite part is the thoughtful discussion in which Bonte et al. link their study to the literature on speech, oscillations, top-down effects, and so on. My third favorite part is the fact that their data further support the view that analyzing response phase yields a tremendous amount of additional information, a view which I favor and which has received provocative support in the recent literature (see lots of stuff by Charlie Schroeder and colleagues, a 2007 paper by Luo & Poeppel, a brand new paper by Kayser et al. in Neuron 2008).
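For readers who want a concrete picture of what "realigning the phase" of a response could look like in data, here is a minimal sketch of one standard phase measure, inter-trial phase coherence. To be clear, this is not the analysis pipeline of Bonte et al.; it runs on synthetic phase values, and the function name, trial counts, and noise level are all made up for illustration.

```python
import numpy as np


def inter_trial_coherence(phases):
    """Inter-trial phase coherence for an array of shape (n_trials, n_times).

    Phases are in radians. Each phase is mapped to a unit vector on the
    complex plane; the length of the across-trial mean vector is returned
    for every time point (0 = phases scattered, 1 = phases identical).
    """
    return np.abs(np.mean(np.exp(1j * phases), axis=0))


# Synthetic data only: all sizes and noise levels below are assumptions.
rng = np.random.default_rng(0)
n_trials, n_times = 200, 50

# Condition A: phase is unrelated across trials (no alignment).
random_phase = rng.uniform(-np.pi, np.pi, size=(n_trials, n_times))

# Condition B: phase clusters tightly around 0 rad on every trial.
aligned_phase = rng.normal(0.0, 0.3, size=(n_trials, n_times))

itc_random = inter_trial_coherence(random_phase)
itc_aligned = inter_trial_coherence(aligned_phase)

print("mean ITC, unaligned trials:", itc_random.mean())
print("mean ITC, aligned trials:  ", itc_aligned.mean())
```

The point of the toy example is that the two conditions can differ sharply in phase consistency even though no amplitude information enters the computation at all, which is why phase can carry information that an amplitude-based analysis would miss.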

The technical tour-de-force notwithstanding, I do have some questions that require clarification. My major question concerns the proposed time line. Two conclusions are highlighted. One is that the initial acoustic-phonetic based analysis is reflected in the N1/P2 responses but that abstract representations are really only reflected late (300 ms and later). The data are the data, but I do find this conclusion surprising in light of the mismatch negativity (MMN; ERP) and mismatch field (MMF; MEG) studies that highlight access to abstract representations by the time the MMN peaks (say 150-200 ms). For example, various findings by Näätänen and colleagues (e.g., Nature, 1997) and data from Phillips and colleagues (e.g., J Cog Neurosci 2000; PNAS 2006) suggest abstraction ‘has happened’ by then. Is this different conclusion due to task differences? My second question has to do with how early top-down task effects are revealed. Again, selective attention tasks reveal response amplitude modulation at the N1/N1m and even earlier (e.g., Woldorff). In fact, people working on brainstem responses such as Nina Kraus or Jack Gandour see early early early effects. So task differences make the difference there, too? I would like to understand this part better.

I do like the technical cleverness and experimental simplicity of this study, but I would like to get my head around the time line more.

M. Bonte, G. Valente, E. Formisano (2009). Dynamic and Task-Dependent Encoding of Speech and Voice by Phase Reorganization of Cortical Oscillations Journal of Neuroscience, 29 (6), 1699-1706 DOI: 10.1523/JNEUROSCI.3694-08.2009


Greg Hickok said...

So what is (are) the main claim(s)? That vowel sound and speaker identity information is coded in oscillation patterns? And that more abstract representations of this information are accessed later?

I read, and liked, the Science paper by this group showing distinct patterns of activation for vowel versus speaker identification, but the conclusions were more methodologically than theoretically important. Is the same roughly true of the J Neurosci paper?

tom said...

It's a very cool paper, and perhaps I'll need to read it a couple more times before I understand it fully, but my first impression is that these findings perhaps have more to say about the role of phase patterns of alpha oscillations in 'encoding for working memory' or 'maintenance' of an abstraction over time rather than in the initial coding of abstract stimulus properties. The interesting task/stimulus interactions reported in the paper start late (~300 ms), and although later time windows are not analysed in depth, it looks from the supplementary info that there might be something interesting happening right up to about 900 ms or so. At earlier periods (i.e. up to about 250 ms) there don't seem to be any significant task effects - not surprising if we accept the evidence from the MMN/F studies cited by David that suggests the initial abstractions take place sometime before 250 ms and are probably obligatory (although possibly modulated to some degree by the presence/absence of stimulus-directed attention).