Towards Interpretable Visual Decoding with Attention to Brain Representations

Feng, Pinyuan; Adeli, Hossein; Guo, Wenxuan; Cheng, Fan; Hwang, Ethan; Kriegeskorte, Nikolaus

Visual Inference Lab
Zuckerman Mind Brain Behavior Institute

Columbia University in the City of New York

arXiv | Code (to be released soon)

News:

09/30/2025: The full-length version of our paper is now available on arXiv!
09/22/2025: A short version of our paper has been accepted at the Foundation Models for the Brain and Body Workshop @ NeurIPS 2025!
04/21/2025: Our extended abstract has been accepted at CCN 2025!

TL;DR: Typical two-stage decoding pipelines first map brain activity to intermediate feature spaces (e.g., CLIP/DINO) and then use those embeddings to guide a generative model. Our end-to-end brain-to-image approach conditions a latent diffusion model directly on brain activity, which makes the generative dynamics interpretable in both image space and brain space.
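
To make "conditioning directly on brain activity" concrete, here is a minimal NumPy sketch of a single cross-attention layer in which flattened diffusion-latent positions act as queries and brain-derived tokens act as keys and values. This mirrors how text embeddings condition Stable Diffusion's U-Net. It is not NeuroAdapter's implementation: how the method tokenizes fMRI data, its layer sizes, and its number of heads are not described on this page, so the one-token-per-parcel tokenization, the dimensions, the single head, and the random weights below are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def brain_cross_attention(latent_tokens, brain_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: image latents (queries) attend to brain tokens.

    latent_tokens: (Q, d_model) flattened latent positions of the diffusion model
    brain_tokens:  (K, d_brain) hypothetical brain tokens, e.g. one per cortical parcel
    Returns the attended features (Q, d_head) and the attention weights (Q, K).
    """
    q = latent_tokens @ w_q                    # (Q, d_head)
    k = brain_tokens @ w_k                     # (K, d_head)
    v = brain_tokens @ w_v                     # (K, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (Q, K) scaled dot products
    attn = softmax(scores, axis=-1)            # each latent position spreads attention over brain tokens
    return attn @ v, attn

# Illustrative sizes (assumptions): an 8x8 latent grid and 16 brain tokens.
rng = np.random.default_rng(0)
Q, K, d_model, d_brain, d_head = 64, 16, 32, 8, 16
latents = rng.standard_normal((Q, d_model))
brain = rng.standard_normal((K, d_brain))
w_q = rng.standard_normal((d_model, d_head))   # random stand-ins for learned weights
w_k = rng.standard_normal((d_brain, d_head))
w_v = rng.standard_normal((d_brain, d_head))

out, attn = brain_cross_attention(latents, brain, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (64, 16) (64, 16)
```

The returned `attn` matrix is what makes such a pipeline inspectable: each row states how much one image location draws on each brain token, with no intermediate CLIP or DINO embedding in between.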

Abstract

Recent work has demonstrated that complex visual stimuli can be decoded from human brain activity using deep generative models, helping neuroscientists interpret how the brain represents real-world scenes. However, most current approaches first map brain signals into intermediate image or text feature spaces and only then guide the generative process, which obscures how different brain areas contribute to the final reconstruction. In this work, we propose NeuroAdapter, a visual decoding framework that directly conditions a latent diffusion model on brain representations, bypassing the need for intermediate feature spaces. Our method achieves visual reconstruction quality competitive with prior work on public fMRI datasets, while providing greater transparency into how brain signals shape the generation process. To this end, we contribute an Image-Brain BI-directional interpretability framework (IBBI), which investigates cross-attention mechanisms across diffusion denoising steps to reveal how different cortical areas influence the unfolding generative trajectory. Our results highlight the potential of end-to-end brain-to-image decoding and establish a path toward interpreting diffusion models through the lens of visual neuroscience.
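
The bi-directional reading of cross-attention can be illustrated with a hedged sketch. Suppose the attention weights from the latents to the brain tokens are recorded at every denoising step as a tensor of shape (steps, image positions, brain tokens). Summing over the tokens of one region of interest (ROI) and averaging over image positions gives that ROI's share of attention along the generative trajectory (the image-to-brain view). Keeping the image positions instead gives a spatial map of where that ROI is used (the brain-to-image view). The exact IBBI procedure (which layers and heads are recorded, how brain tokens map onto cortical areas, any normalization) is not specified on this page. The function names, the tensor layout, and the assignment of tokens to face, body, scene, and word ROIs below are hypothetical, and random attention stands in for a trained model.

```python
import numpy as np

def roi_attention_trajectory(attn, roi_labels, roi_names):
    """Image -> brain view: mean attention mass on each ROI at each denoising step.

    attn: (T, Q, K) attention weights over K brain tokens, recorded at T steps
          for Q latent positions; each row over K sums to 1.
    roi_labels: (K,) hypothetical ROI name for each brain token.
    Returns a dict mapping ROI name -> (T,) array.
    """
    share = {}
    for name in roi_names:
        mask = roi_labels == name
        # Sum over the ROI's tokens, then average over image positions.
        share[name] = attn[:, :, mask].sum(axis=-1).mean(axis=-1)
    return share

def roi_spatial_map(attn, roi_labels, name, grid):
    """Brain -> image view: attention mass on one ROI at each latent position.

    Returns a (T, H, W) array, assuming positions were flattened row-major.
    """
    mask = roi_labels == name
    steps = attn.shape[0]
    return attn[:, :, mask].sum(axis=-1).reshape(steps, *grid)

# Illustrative stand-in data: 10 steps, an 8x8 latent grid, 16 brain tokens.
rng = np.random.default_rng(0)
T, H, W, K = 10, 8, 8, 16
logits = rng.standard_normal((T, H * W, K))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # softmax over brain tokens

roi_names = ["face", "body", "scene", "word"]
roi_labels = np.repeat(roi_names, K // len(roi_names))  # 4 tokens per ROI (assumption)

traj = roi_attention_trajectory(attn, roi_labels, roi_names)
face_map = roi_spatial_map(attn, roi_labels, "face", (H, W))
print({name: v.round(3) for name, v in traj.items()})
print(face_map.shape)
```

Because every brain token belongs to exactly one ROI in this toy setup, the four trajectories sum to 1 at each step. This makes shifts in relative ROI influence across denoising steps easy to compare.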

Decoded Examples from Different NSD Subjects

50 stimulus-prediction pairs are randomly selected from the test set of NSD Subject 1.

50 stimulus-prediction pairs are randomly selected from the test set of NSD Subject 2.

50 stimulus-prediction pairs are randomly selected from the test set of NSD Subject 5.

50 stimulus-prediction pairs are randomly selected from the test set of NSD Subject 7.

IBBI Interpretability Probing

Face ROI

Body ROI

Scene ROI

Word ROI

BibTeX

@article{feng2025neuroadapter,
  doi = {10.48550/ARXIV.2509.23566},
  url = {https://arxiv.org/abs/2509.23566},
  author = {Feng, Pinyuan and Adeli, Hossein and Guo, Wenxuan and Cheng, Fan and Hwang, Ethan and Kriegeskorte, Nikolaus},
  title = {Towards Interpretable Visual Decoding with Attention to Brain Representations},
  publisher = {arXiv},
  year = {2025},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Related Papers from Our Lab

Transformer brain encoders explain human high-level visual responses

In Silico Mapping of Visual Categorical Selectivity Across the Whole Brain