Wednesday, August 2, 2017

Big data or big theory? Gallant and Hickok discuss.

A few months back, Jack Gallant Tweeted some comments that caught my attention. Here is one:

The problem is that most MRI exp. design is based on behav. psych, which is a poor framework to begin with.

I disagreed on grounds that psychology provides at least one part of a theory that constrains MRI research.

Here is another of Jack's Tweets:

In theory, theory is great. In practice currently, only data-driven models accurately predict brain activity under naturalistic conditions.

To which I responded:

What's your endgame, Jack? Is predicting brain activity all you want to do? Or do u want to understand how brain computes mind?

Jack's answer:

Science's end game is always an elegant, predictive theory. But complex systems often require a data-driven middle game.

The discussion between me and Jack then went offline. After a few exchanges it seemed to me we were converging a little bit and certainly clarifying our positions. I felt the discussion would be interesting to others and asked Jack if I could post it here. He agreed and so here you go! Comments welcome!

JG:

I look at it this way: which would you rather have, (1) a computational model that predicts well but you don't know why, or (2) a model that you understand but it doesn't predict anything? I would say obviously (1) is preferred. If you have an in silico computational model that predicts but you don't understand it, you can study the model instead of the brain. And of course that will be much easier, because you are not limited in the number of experiments that you can do to the model. In contrast, if you have (2) you could be stuck in an irrelevant local minimum, and be wasting your time completely.

Understanding (i.e., a low-dimensional explanation that accurately predicts) is obviously the ultimate goal of science, but you may not get there in the most straightforward path (i.e. through theory-driven approaches).

GH:

Hi Jack, yes, a nice predictive model is great. My point, though, is about what we are trying to predict. Your statement makes it sound like all we are trying to predict is physiology. That's fine for a physiologist, but the point about studying the brain is that it is a system for controlling behavior. We therefore need good data and good theories of brain, behavior, and their relation. I think you agree, but many of your tweets give an anti-theory, anti-cog-sci impression.

JG:

Hey Greg, believe it or not, I think that we agree on everything except priorities. So let me summarize where I think the problem lies and you can correct me. We both think that the brain is some sort of meaty computer that controls behavior. And we both think that behavior is, ultimately, the most interesting thing. However, you seem to think that theory is really useful and important AT THIS TIME for studying the brain and the brain-behavior relationship. And I do not. My reasons for this are that (1) our understanding of how a system like the brain might compute is really poor, because we don't really understand distributed nonlinear dynamical systems like the brain, (2) we are severely data-limited because our brain measurement technology is pretty poor, and (3) other attempts to use theory to predict computational principles of brain function beyond the most peripheral stages of sensory processing have largely failed. Just take vision as an example. There are no good theories of visual function beyond V1 and MT. All the models that work well beyond those areas are data-driven, not theoretical. And note that the SAME PROBLEM arises in computer vision, and in NLP. The models that actually WORK in computer vision are neural nets, which are basically a data-driven universal mapping function. And the models that actually WORK in NLP are neural nets. That is why all of the engineering people have (temporarily) abandoned theory in favor of nets. Now ultimately of course we're going to have to take these data-driven models and extract their principles of computation. But that is a very different problem from starting with overly-strong theory and then ending up in a local minimum.

GH:

Thanks for this. Your view is more clear now. We are talking about slightly different things or at least we are taking different approaches. You seem to be taking an engineering approach with a next-step goal of trying to figure out what predicts neural activity and you hope that once that is done, we can derive some "principles of computation." My concern is that there is no easy way to get from the engineering approach to the principles without doing some serious theoretical work that can inform the data-driven results. So while I like the engineering approach and think it is worthwhile pursuing, I don't think it is going to be able to answer our questions without doing theoretical work in parallel.

7 comments:

William Matchin said...

I study people who are starting to acquire their first language late, and who are clearly quite different from those that have acquired it from birth. Theory tells me that these people can acquire words but not a grammar. What can the modeling approach possibly do to help me understand what is different about these people? I don't see any use in modeling unless it includes a theory (i.e., claims about ontology).

Unknown said...

Interesting discussion. I have to say that I tend to agree with JG on this one. Traditionally, my field (aphasiology) has suffered from far too little data and very grandiose theoretical claims. This really has gotten us (almost) nowhere in understanding the mechanisms (cognitive or physiological) involved in aphasia and, more importantly, for helping people recover from aphasia. At this time, I do think we need larger datasets, which will ultimately help us propose stronger theories that make predictions about behavior and inform rehabilitation.

- Julius

Greg Hickok said...

I don't think anyone is disputing the usefulness of big data and I agree, past theoretical work on aphasia was under-constrained. Jack's position, though, is that there is little role for theory at this stage and we should focus exclusively on building data-driven models that predict new sets of data, even if we don't understand why they do the predicting. My position is we need both big data and big theory working together in lockstep. That's why you brought me into your C-Star project, right Julius? And look how much progress we are making! ;)

Tal Linzen said...

In practice, no model is perfectly predictive; some kinds of errors show that your model is missing a fundamental property of the phenomenon that it's supposed to model, but other kinds of errors don't. There's no way to identify those kinds without a theory of the task (what is a "fundamental property"). In other words, you need theory to construct the test set you use to evaluate your model.

Unknown said...

OK, I guess I better temper my earlier enthusiasm since I strongly believe some theoretical models are very useful/important, especially for understanding normal function. However, there are clearly areas where we severely lack data. E.g. look at agrammatism in aphasia. There are multiple theories that have attempted to explain this impairment, many without much (any?) data support. This is a good example where we probably need lots more data to formulate an informed theory. - Julius

Niels Janssen said...

Thanks, this is an interesting debate. However, I am not sure that it is entirely satisfactory. It seems to me that both JG and GH agree that there should be a theory of behavior, but (1) how to get to that theory, and (2) what the character of that theory should be, are left unclear in my opinion. So, with respect to the first point, I think I read from this that JG endorses a more bottom-up approach (generate lots of brain data, let those data drive theory development), while GH endorses a more top-down approach (generate hypotheses, try to confirm those with targeted data). Is that a correct interpretation?

With respect to the second point, it seems that JG likes to see a more biological theory that explains behavior, while GH likes to see a psychology/cognitive theory in between brain and behavior. Is that a correct interpretation?

A final point is that all of this bottom-up/top-down discussion strongly reminds me of the connectionism debate from back in the day.

Greg Hickok said...

Julius -- Yes, agrammatism is the poster child for more theory than data. Let's fix that!

Niels -- I think your interpretation is roughly correct and you raise an important point about what "theory" means here. I believe we need three kinds of theories. One about how the brain works (i.e., how it computes stuff), one about how the mind works (i.e., the tasks it needs to carry out and how information is represented and transformed to accomplish them), and one about the relation between the first two (i.e., how the brain actually computes the mind's processes). For the latter, I think that an important key will be in the neural architectures. True, the debate is very similar to the old connectionism debate. There is no right way, in my view; we need all the ways to make real progress, which is why I bristle hearing statements to the effect that we need to focus on bottom-up big data and hope a theory emerges from it.