A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience

===========================================================

Introduction

Cognitive neuroscience is fragmented into specialized models, each tailored to a specific experimental paradigm, which stands in the way of a unified model of cognition in the human brain. This paper introduces TRIBE v2, a tri-modal (video, audio, and language) foundation model that predicts human brain activity across a variety of naturalistic and experimental conditions.

Key Findings

TRIBE v2 accurately predicts high-resolution brain responses for novel stimuli, tasks, and subjects, and it outperforms traditional linear encoding models with several-fold improvements in accuracy.
The model enables in-silico experimentation: when tested on seminal visual and neuro-linguistic paradigms, it recovers a variety of results established by decades of empirical research.
By extracting interpretable latent features, TRIBE v2 reveals the fine-grained topography of multisensory integration.

Methodology

TRIBE v2 is trained on a unified dataset of over 1,000 hours of fMRI across 720 subjects.
The model is evaluated on a variety of naturalistic and experimental conditions, including novel stimuli, tasks, and subjects.
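
For context on the baseline mentioned under Key Findings, a traditional linear encoding model can be sketched as a ridge regression from stimulus features to voxel responses, scored by per-voxel Pearson correlation on held-out stimuli. This is only an illustrative sketch on synthetic data, not the TRIBE v2 architecture or the paper's evaluation pipeline: the function names (`fit_ridge`, `voxelwise_pearson`), the array shapes, the noise level, and the regularization strength `alpha` are all assumptions made for the example.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom


# Synthetic stand-ins (assumed shapes): rows are time points, columns are
# stimulus features (X) or voxels (Y). Real fMRI data would replace these.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 16, 8
W_true = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ W_true + 0.1 * rng.normal(size=(n_test, n_voxels))

# Fit on training stimuli, then score predictions on held-out stimuli.
W = fit_ridge(X_train, Y_train, alpha=1.0)
scores = voxelwise_pearson(Y_test, X_test @ W)
print(scores.shape)
```

Scoring on stimuli that were not used for fitting mirrors the "novel stimuli" condition described above; generalizing to novel subjects or tasks would require holding out those dimensions instead.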

Implications

TRIBE v2 establishes artificial intelligence as a unifying framework for exploring the functional organization of the human brain.
The model could deepen our understanding of cognitive neuroscience and may inform the development of more effective treatments for neurological and psychiatric disorders.

Conclusion

TRIBE v2 provides a unified framework for exploring the functional organization of the human brain. By accurately predicting brain activity and enabling in-silico experimentation, it could deepen our understanding of the brain and may inform the development of more effective treatments for neurological and psychiatric disorders.
