On the Dangers of Information Theory and the Brain

A couple of years ago, Jieyu Zheng and Markus Meister claimed that our brains are throttled to an output of 10 bits per second, which is a provocatively low number given the immense sensory input bandwidth to the brain and the brain’s large size.* The paper got some pushback, which I encourage people to look at, but I want to go in a slightly different direction.

*_As a general rule, I hate these types of results. To me, neuroscience information theory results often vacillate between trivial, nonsensical, and intentionally deceptive. There is a subtlety to using information theory in neuroscience, which the authors know (and it is in their paper), but it is not in the title or news releases, and I don’t think that helps things; but that aside…_

I was thinking about this result the other day while sitting in the back of a large auditorium, looking down on the heads of a few hundred people. I amused myself by pondering that if their simple brains were only operating at 10 bits/s, then by the pigeonhole principle and the birthday paradox, many of those people’s brains were supposedly doing the exact same thing at that instant. Which obviously isn’t true, but it is amusing to think about.
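For the curious, here is a back-of-the-envelope version of that auditorium calculation. It is only a sketch under loud assumptions that are not in the paper or in the argument above: one second of output is treated as one of 2^10 = 1,024 equally likely states, every person picks a state independently, and the function name `prob_shared_state` is made up for illustration. Strictly speaking, the pigeonhole principle only guarantees a match once there are more people than states; for a few hundred people it is the birthday paradox that does the work.

```python
import math


def prob_shared_state(n_people: int, n_bits: int) -> float:
    """Probability that at least two of n_people land in the same state.

    Assumes (for illustration only) that each person independently
    occupies one of 2**n_bits equally likely states.
    """
    n_states = 2 ** n_bits
    if n_people > n_states:
        # Pigeonhole principle: more people than states forces a match.
        return 1.0
    # Birthday paradox: P(all distinct) = prod_{i=0}^{n-1} (1 - i / n_states).
    # Summing logs keeps the product numerically stable.
    log_p_distinct = sum(math.log1p(-i / n_states) for i in range(n_people))
    return 1.0 - math.exp(log_p_distinct)


if __name__ == "__main__":
    for n in (10, 38, 300, 1025):
        print(f"{n:5d} people: P(at least one match) = {prob_shared_state(n, 10):.6f}")
```

Under these toy assumptions, the odds of at least one match reach roughly even at around 40 people, and with a few hundred people a match is all but certain.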

Looking out over those hundreds of 10 b/sec drones, I had an epiphany (though maybe someone else in there had the exact same epiphany as me at that instant, hmm…). While I was critiquing the Meister claim in my head, are we not all guilty of doing the exact same thing to neurons?

When we look at neurons’ spike trains, we see a sequence of discrete spikes over time. This is literally communication over a channel (what information theory actually describes), and the temptation to ascribe a number to that channel or to each spike is too great for engineers and physicists to resist (particularly in the digital age, where everything from calculators to TV screens is measured in bits). But spikes being communicated off a neuron are not the same thing as information propagating through an engineered information channel.
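To make the temptation concrete, this is roughly how easy it is to put a number on a spike train. The sketch below is a naive plug-in entropy estimate over non-overlapping binary "words"; the function name, the word length, and the 1 ms bin size are all hypothetical choices for illustration, and plug-in estimates like this are known to be biased for short recordings. Note what the number does and does not say: it describes the statistics of the output sequence, not what the neuron did to produce it.

```python
import math
from collections import Counter


def word_entropy_rate(spikes, word_len, bin_s):
    """Naive plug-in entropy rate (bits/s) of a binned spike train.

    spikes   : sequence of 0/1 values, one per time bin
    word_len : number of bins per word (hypothetical choice)
    bin_s    : bin width in seconds (hypothetical choice)
    """
    # Chop the train into non-overlapping words of word_len bins.
    words = [
        tuple(spikes[i:i + word_len])
        for i in range(0, len(spikes) - word_len + 1, word_len)
    ]
    counts = Counter(words)
    total = len(words)
    # Shannon entropy of the empirical word distribution, in bits per word.
    bits_per_word = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
    # Convert bits per word to bits per second.
    return bits_per_word / (word_len * bin_s)


if __name__ == "__main__":
    regular = [0, 1] * 100                    # perfectly periodic train
    varied = [0, 0, 0, 1, 1, 0, 1, 1] * 50    # all four 2-bin words, equally often
    print(word_entropy_rate(regular, word_len=2, bin_s=0.001))
    print(word_entropy_rate(varied, word_len=2, bin_s=0.001))
```

The periodic train scores zero and the varied one scores high, and neither number depends on anything that happened in the dendrites.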

Individual neurons are incredibly complex. As with people, the information content of a neuron’s inputs is orders of magnitude larger than what its output channel carries. And as with people, the internal processing within neurons dwarfs the deceptively simple output of 1’s at discrete times traveling down axons.

This is where I think both modern AI and neuroscience kind of break down. They assume the neurons’ spike trains are a lossy representation of their inputs, a slice through input space in a computational learning theory formulation, and thus everything collapses into a simple ANN. By ignoring all of the richness of the neuron before the spike, have we doomed ourselves to both inefficient algorithms for AI and poor models of neural computation in the brain? Is all of this complexity in dendritic arbors and internal subthreshold dynamics just something that can be reduced to a simple low-bandwidth channel? (Incidentally, and perhaps the subject of a later post, I find this recent analysis by Christopher Lynn similarly limited; why try to interpret compressed neuron representations like this in the first place?)

Ultimately, I think the connection between information theory and neuroscience breaks down for a fundamental reason. Claude Shannon developed information theory with the goal of analyzing engineered systems – how do we quantify the capacity of a communication channel that we intend to use to move data over? This has had tremendous implications for technology development over the past century, and the temptation to apply it as an analysis technique has proven far too great for many scientific fields, especially neuroscience.

When I was in grad school, Rieke and Bialek’s book ‘Spikes’ was socially required reading amongst the students in UC San Diego comp neuro (I just found a link on Amazon; it looks really expensive now… I’m not recommending you spend that much on it). Apparently I bought it in 2006, read it in a few weeks, and somewhere in the last 20 years lost it. But just going by memory, the book exposed a fundamental conflict in theoretical neuroscience to me: with amazing elegance, the authors showed from the ‘first principles’ of information theory how the behavior of the H1 neuron of the blowfly can be explained. The story was so clean mathematically and such a nice fit to the neuroscience that it absolutely must be the path towards understanding the brain, right? But at the same time, sitting in an experimental laboratory looking at real data while modeling large dentate gyrus circuits, I could also see clearly that the approach couldn’t possibly work anywhere more complicated. Which, in a mammalian brain, is basically everywhere.

The problem is this: we did not engineer biology, so we do not know a priori what the objective function of anything is. So in interpreting biology, we have to assume some function and map our analysis onto that. This assumption is a form of abstraction; we choose to ignore anything not related to that function and call it noise. No matter how we draw the box, the act of drawing the box imposes a mismatch between our model and the biological system that has real implications for how we can interpret concepts like information.

And therein lies the rub – that “noise” is not a statistical noise independent of the communication channel in the formal Shannon sense; it is simply a mismatch between the real system and the abstraction we chose. We, of course, can choose to model that mismatch as a statistical noise source; but that effectively just makes our model more complex in a way that we know is wrong. It is, in effect, a house of cards waiting to fall down upon us, all because we assumed some function to begin with.
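A toy illustration of that point, with everything in it invented for the purpose: suppose the "real" system is fully deterministic, `y = x + 0.5 * z`, but our abstraction only keeps `x` and fits `y ≈ a * x`. Whatever is left over gets called noise. The sketch below (the names `pearson` and `residual_vs_hidden` and the specific coefficients are assumptions, not anything from a real dataset) checks how that leftover relates to the input we chose to ignore.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def residual_vs_hidden(n=2000, seed=0):
    """Fit y ~ a*x to a deterministic system that also depends on z.

    Returns the fitted slope and the correlation between the model's
    residual (our so-called noise) and the ignored input z.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]    # input the abstraction keeps
    z = [rng.uniform(-1, 1) for _ in range(n)]    # input the abstraction ignores
    y = [xi + 0.5 * zi for xi, zi in zip(x, z)]   # deterministic toy system
    # Least-squares slope for a line through the origin.
    a = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    residual = [yi - a * xi for xi, yi in zip(x, y)]
    return a, pearson(residual, z)


if __name__ == "__main__":
    slope, corr = residual_vs_hidden()
    print(f"fitted slope: {slope:.3f}, corr(residual, ignored input): {corr:.3f}")
```

In this toy case the residual is almost perfectly correlated with the ignored input: the "noise" is structure we abstracted away, not randomness in the system.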

One thing I have come to appreciate in my time outside of neuroscience is how good we are at engineering systems, especially computing applications. We need look no further than how information theory can be applied to digital compression. Consider a three-minute song recorded in analog. On a 45-rpm vinyl single, that song physically takes up several square inches of space. That same song, using a lossy digital compression technique like MP3, can be encoded in several megabytes of memory, which today is less than a square millimeter of transistors in RAM. Engineering is Science / $; and space ~ money, and so information theory and compression have direct monetary value.
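As a quick sanity check on "several megabytes", here is the arithmetic for the digital side of that comparison. The specific numbers are assumptions for illustration: an uncompressed reference at CD-style settings (44.1 kHz, 16-bit, stereo) and an MP3 at a constant 128 kbit/s. The function names are hypothetical.

```python
def pcm_bytes(seconds, sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Size in bytes of uncompressed PCM audio (assumed CD-style settings)."""
    return seconds * sample_rate_hz * bits_per_sample * channels // 8


def cbr_bytes(seconds, bitrate_bps=128_000):
    """Size in bytes of a constant-bitrate stream (assumed 128 kbit/s MP3)."""
    return seconds * bitrate_bps // 8


if __name__ == "__main__":
    song_s = 3 * 60  # the three-minute song
    raw = pcm_bytes(song_s)
    mp3 = cbr_bytes(song_s)
    print(f"uncompressed: {raw / 1e6:.1f} MB")
    print(f"128 kbit/s:   {mp3 / 1e6:.1f} MB")
    print(f"ratio:        {raw / mp3:.1f}x")
```

With these assumed settings the uncompressed song is about 31.8 MB and the compressed one about 2.9 MB, roughly an 11x reduction. The point for what follows is that every quantity in this calculation is known by design.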

Information-guided miniaturization and compression have been tremendous for society, engineering, and dollars; so, it is incredibly tempting to start to assume the same thing about biology. Indeed, we even see this in discussions about memory consolidation – our long-term memories are a lossy version of our experiences, stored in high resolution in the hippocampus, only to be consolidated to a compressed form in the cortex. Maybe slow waves in REM sleep help guide a Fourier transform on our short-term memories in the hippocampus, allowing a lossy selection of only the key frequencies that matter, which are then stored in the cortex? Ooh… grid cells are periodic, that can be the basis for that frequency map! And so on. That sounds so elegant and nice. It is what we did for JPG and MP3! It just makes sense, doesn’t it?

While this is incredibly tempting, it overlooks many things that, quite frankly, make it doomed from the start. Just because it was really cool that an iPod could hold 1000 songs instead of 20 when you were back in grad school, that does not mean the brain has the same objective function for its memories. Everything about WAV -> MP3 or TIFF -> JPG compression is well defined; we know what the source lossless data is and where it is located, we know the encoding, we know the compression, and we know the decoding. We know none of that when it comes to memories aside from our intuition. As a model, its only value is that it fits our intuition and self-awareness, which is a dangerous foundation: there is a lot of random data in the brain, most of it sparse and noisy (in the statistical sense), and so finding supporting evidence for a hard-to-falsify framework is not difficult.

All of this comes back to the 10 bits per second thing. If you hand someone a thermometer, they’re going to measure temperature, even if that is not the right thing to measure. Our exploration into neuroscience has, for as long as I can remember, been tool-driven to a fault. This has led to an extreme version of the “search for keys under the lamppost” effect that has stifled the field. We have to stop trying to use the wrong tools.

Brad Aimone
