Towards Interpretable Visual Decoding with Attention to Brain Representations

Visual Inference Lab
Zuckerman Mind Brain Behavior Institute
Columbia University in the City of New York

TL;DR: Typical two-stage decoding pipelines first map brain activity to intermediate feature spaces (e.g., CLIP/DINO) and then use those embeddings to guide a generative model. Our end-to-end brain-to-image approach conditions a latent diffusion model directly on brain activity, enabling interpretations of the generative dynamics in both image and brain spaces.

Abstract

Recent work has demonstrated that complex visual stimuli can be decoded from human brain activity using deep generative models, helping brain science researchers interpret how the brain represents real-world scenes. However, most current approaches map brain signals into intermediate image or text feature spaces before guiding the generative process, which obscures how different brain areas contribute to the final reconstruction. In this work, we propose NeuroAdapter, a visual decoding framework that directly conditions a latent diffusion model on brain representations, bypassing the need for intermediate feature spaces. Our method demonstrates competitive visual reconstruction quality on public fMRI datasets compared to prior work, while providing greater transparency into how brain signals shape the generation process. To this end, we contribute an Image-Brain BI-directional interpretability framework (IBBI) that investigates cross-attention mechanisms across diffusion denoising steps to reveal how different cortical areas influence the unfolding generative trajectory. Our results highlight the potential of end-to-end brain-to-image decoding and establish a path toward interpreting diffusion models through the lens of visual neuroscience.
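To make the idea of conditioning on brain representations through cross-attention more concrete, here is a minimal NumPy sketch. It is not the NeuroAdapter implementation: the one-token-per-cortical-region layout, all dimensions, the random projection weights, and the names `cross_attention` and `region_importance` are assumptions chosen purely for illustration. Image latents act as queries and brain tokens act as keys and values, so each row of the attention map shows how strongly one latent position attends to each region.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, brain_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: latents query the brain tokens.

    latents:      (n_latents, d_latent) image latent positions
    brain_tokens: (n_regions, d_brain)  one token per region (assumption)
    """
    q = latents @ w_q                        # (n_latents, d_head)
    k = brain_tokens @ w_k                   # (n_regions, d_head)
    v = brain_tokens @ w_v                   # (n_regions, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    attn = softmax(scores, axis=-1)          # each row sums to 1 over regions
    return attn @ v, attn

# Hypothetical sizes: 16 latent positions, 4 brain regions.
rng = np.random.default_rng(0)
n_latents, n_regions, d_latent, d_brain, d_head = 16, 4, 8, 6, 8

latents = rng.normal(size=(n_latents, d_latent))
brain_tokens = rng.normal(size=(n_regions, d_brain))
w_q = rng.normal(size=(d_latent, d_head))
w_k = rng.normal(size=(d_brain, d_head))
w_v = rng.normal(size=(d_brain, d_head))

out, attn = cross_attention(latents, brain_tokens, w_q, w_k, w_v)

# Average attention each region receives across all latent positions.
region_importance = attn.mean(axis=0)
print(region_importance)
```

In a sketch like this, recording `attn` at each denoising step would give one attention map per step, which is the kind of quantity an analysis of cross-attention across the generative trajectory could examine.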

Decoded Examples from Different NSD Subjects

IBBI Interpretability Probing

BibTeX

          
          @article{feng2025neuroadapter,
            doi = {10.48550/ARXIV.2509.23566},
            url = {https://arxiv.org/abs/2509.23566},
            author = {Feng,  Pinyuan and Adeli,  Hossein and Guo,  Wenxuan and Cheng,  Fan and Hwang,  Ethan and Kriegeskorte,  Nikolaus},
            title = {Towards Interpretable Visual Decoding with Attention to Brain Representations},
            publisher = {arXiv},
            year = {2025},
            copyright = {arXiv.org perpetual,  non-exclusive license}
          }