Predict what a mouse sees by decoding brain signals

A research team from EPFL has developed a novel machine-learning algorithm that can reveal the hidden structure in data recorded from the brain, predicting complex information such as what mice see.
© 2023 EPFL / Ella Marushchenko. The brain's neural activity is largely hidden in complex, nonlinear systems, much like how we can only see the surface of icebergs.

Is it possible to reconstruct what someone sees based on brain signals alone? The answer is no, not yet. But EPFL researchers have taken a step in that direction by introducing a new algorithm for building artificial neural network models that capture brain dynamics with an impressive degree of accuracy.

Rooted in mathematics, the novel machine-learning algorithm is called CEBRA (pronounced “zebra”) and learns the hidden structure in the neural code.

What CEBRA learns from the raw neural data can be tested after training by decoding, a method used for brain-machine interfaces (BMIs), and the researchers have shown that they can decode from the model what a mouse sees while it watches a movie. But CEBRA is not limited to visual cortex neurons, or even brain data. The study also shows that it can be used to predict arm movements in primates and to reconstruct the positions of rats as they run freely around an arena. The study is published in Nature.

“This work is just one step towards the theoretically-backed algorithms that are needed in neurotechnology to enable high-performance BMIs,” says Mackenzie Mathis, EPFL’s Bertarelli Chair of Integrative Neuroscience and PI of the study.


After an initial training period that maps brain signals to movie features, CEBRA learns the latent (i.e., hidden) structure in the visual system of mice and can predict unseen movie frames from brain signals alone.

The data used for the video decoding were openly available through the Allen Institute in Seattle, WA. The brain signals are obtained either directly, by measuring brain activity with electrode probes inserted into the visual cortex of the mouse’s brain, or optically, using genetically modified mice engineered so that activated neurons glow green. During the training period, CEBRA learns to map the brain activity to specific frames. CEBRA performs well with fewer than 1% of the neurons in the visual cortex, a brain area that in mice consists of roughly 0.5 million neurons.
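To make the idea of decoding frames from a learned embedding more concrete, here is a minimal sketch in Python. It is not the researchers’ pipeline: the embeddings are synthetic points on a circle standing in for a trained model’s output, and the function names `synthetic_embedding` and `knn_decode_frames` are hypothetical. It only illustrates one simple way to map an embedding back to a frame index, by nearest-neighbor lookup.

```python
import numpy as np


def synthetic_embedding(frames, n_frames, noise, rng):
    """Stand-in for a trained model's output (an assumption for illustration).

    Each frame index is placed at its own angle on a unit circle, then
    perturbed with Gaussian noise to mimic trial-to-trial variability.
    """
    angles = 2.0 * np.pi * frames / n_frames
    points = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return points + noise * rng.normal(size=points.shape)


def knn_decode_frames(train_emb, train_frames, test_emb, k=1):
    """Predict a frame index for each test embedding by k-nearest-neighbor vote."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diffs = test_emb[:, None, :] - train_emb[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # Indices of the k closest training points for every test point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    neighbor_frames = train_frames[nearest]
    # Majority vote among the neighbors' frame labels.
    return np.array([np.bincount(row).argmax() for row in neighbor_frames])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_frames = 30
    frames = np.arange(n_frames)
    # "Training" repeat of the movie and a held-out repeat, both synthetic.
    train_emb = synthetic_embedding(frames, n_frames, 0.01, rng)
    test_emb = synthetic_embedding(frames, n_frames, 0.01, rng)
    predicted = knn_decode_frames(train_emb, frames, test_emb)
    print("fraction of frames decoded correctly:", (predicted == frames).mean())
```

In this toy setting the decoder is only as good as the embedding: if points for the same frame land close together across repeats, a simple lookup is enough to recover the frame.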

“Concretely, CEBRA is based on contrastive learning, a technique that learns how high-dimensional data can be arranged, or embedded, in a lower-dimensional space called a latent space, so that similar data points are close together and more-different data points are further apart,” explains Mathis. “This embedding can be used to infer hidden relationships and structure in the data. It enables researchers to jointly consider neural data and behavioral labels, including measured movements, abstract labels like ‘reward,’ or sensory features such as colors or textures of images.”
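The contrastive idea Mathis describes can be sketched in a few lines of Python. The example below computes InfoNCE, one widely used contrastive objective; the article does not say which loss CEBRA uses, so treat this choice, the function name `info_nce_loss`, and the `temperature` parameter as assumptions made for illustration. The loss is small when a reference point sits close to its “similar” partner and far from the “different” points.

```python
import numpy as np


def info_nce_loss(reference, positive, negatives, temperature=1.0):
    """InfoNCE contrastive loss, averaged over a batch.

    reference: (batch, dim) embeddings of the anchor samples
    positive:  (batch, dim) embeddings of samples deemed similar to each anchor
    negatives: (batch, n_neg, dim) embeddings of samples deemed dissimilar
    """
    def normalize(x):
        # Unit-length vectors, so dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    ref, pos, neg = normalize(reference), normalize(positive), normalize(negatives)
    pos_sim = np.sum(ref * pos, axis=-1, keepdims=True) / temperature  # (batch, 1)
    neg_sim = np.einsum("bd,bnd->bn", ref, neg) / temperature          # (batch, n_neg)
    # The positive pair is column 0; the loss is cross-entropy against it.
    logits = np.concatenate([pos_sim, neg_sim], axis=1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob_pos.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(8, 4))
    negs = rng.normal(size=(8, 5, 4))
    close_pos = ref + 0.01 * rng.normal(size=ref.shape)  # positives near the anchors
    far_pos = -ref                                       # positives pointing away
    print("loss with close positives:", info_nce_loss(ref, close_pos, negs))
    print("loss with far positives:  ", info_nce_loss(ref, far_pos, negs))
```

Training a network to minimize a loss of this kind pulls similar samples together and pushes dissimilar ones apart in the latent space, which is the arrangement the quote describes.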

“CEBRA excels compared to other algorithms at reconstructing synthetic data, which is critical to compare algorithms,” says Steffen Schneider, the co-first author of the paper. “Its strengths also lie in its ability to combine data across modalities, such as movie features and brain data, and it helps limit nuisances, such as changes to the data that depend on how they were collected.”

“The goal of CEBRA is to uncover structure in complex systems. And, given the brain is the most complex structure in our universe, it’s the ultimate test space for CEBRA. It can also give us insight into how the brain processes information and could be a platform for discovering new principles in neuroscience by combining data across animals, and even species,” says Mathis. “This algorithm is not limited to neuroscience research, as it can be applied to many datasets involving time or joint information, including animal behavior and gene-expression data. Thus, the potential clinical applications are exciting.”