A foundation model of vision, audition, and language for in-silico neuroscience | Research - AI at Meta
A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience
===========================================================
Introduction
Cognitive neuroscience is fragmented into specialized models, each tailored to a specific experimental paradigm, and this fragmentation has prevented a unified model of cognition in the human brain. This paper introduces TRIBE v2, a tri-modal (video, audio, and language) foundation model capable of predicting human brain activity in a variety of naturalistic and experimental conditions.
Key Findings
- TRIBE v2 accurately predicts high-resolution brain responses to novel stimuli, tasks, and subjects, superseding traditional linear encoding models with several-fold improvements in accuracy.
- The model enables in-silico experimentation: tested on seminal visual and neuro-linguistic paradigms, it recovers a variety of results established by decades of empirical research.
- By extracting interpretable latent features, TRIBE v2 reveals the fine-grained topography of multisensory integration.
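For readers unfamiliar with the baseline mentioned above, a traditional linear encoding model can be sketched as a ridge regression from stimulus features to voxel responses, scored by the per-voxel Pearson correlation on held-out data. The sketch below is illustrative only: the data are synthetic, the helper names `fit_ridge` and `voxelwise_pearson` are hypothetical, and nothing here reflects the TRIBE v2 architecture or the paper's exact baseline setup (for instance, it ignores hemodynamic delays).

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    X: (n_samples, n_features) stimulus features.
    Y: (n_samples, n_voxels) measured responses.
    No intercept is fitted, so inputs are assumed to be roughly zero-mean.
    """
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / np.maximum(denom, 1e-12)

# Synthetic stand-in data (assumption: real features would come from a
# pretrained video, audio, or language model, and Y from fMRI recordings).
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 400, 100, 16, 50
W_true = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ W_true + rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ W_true + rng.normal(size=(n_test, n_voxels))

W = fit_ridge(X_train, Y_train, alpha=10.0)
scores = voxelwise_pearson(Y_test, X_test @ W)  # one score per voxel
```

The per-voxel scores are the kind of accuracy measure on which an encoding model is typically compared against alternatives; a nonlinear model would replace `X_test @ W` with its own predictions.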
Methodology
- TRIBE v2 is trained on a unified dataset of over 1,000 hours of fMRI across 720 subjects.
- The model is evaluated on a variety of naturalistic and experimental conditions, including novel stimuli, tasks, and subjects.
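Evaluating on novel subjects requires that no test subject contributes any data to training. A minimal sketch of such a subject-level split follows; the function name `split_by_subject`, the 20% held-out fraction, and the toy subject IDs are assumptions for illustration, not the paper's actual protocol.

```python
import numpy as np

def split_by_subject(subject_ids, held_out_fraction=0.2, seed=0):
    """Return boolean train/test masks such that no subject appears in both.

    subject_ids: one subject label per recording (hypothetical layout).
    """
    subject_ids = np.asarray(subject_ids)
    unique = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique)
    # Hold out at least one subject, even for very small cohorts.
    n_held_out = max(1, int(round(held_out_fraction * len(unique))))
    held_out = shuffled[:n_held_out]
    test_mask = np.isin(subject_ids, held_out)
    return ~test_mask, test_mask, held_out

# Toy example: 10 subjects with 5 recordings each.
subject_ids = np.repeat(np.arange(10), 5)
train_mask, test_mask, held_out = split_by_subject(subject_ids, 0.2, seed=0)
```

Splitting at the subject level, rather than at the recording level, keeps the test score from being inflated by subject-specific patterns seen during training.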
Implications
- TRIBE v2 establishes artificial intelligence as a unifying framework for exploring the functional organization of the human brain.
- The model could advance our understanding of human cognition and may inform the development of more effective treatments for neurological and psychiatric disorders.
Related Work
- Unified Vision–Language Modeling via Concept Space Alignment (Qiu et al., 2026)
- Disentangling the Factors of Convergence between Brains and Computer Vision Models (Raugel et al., 2025)
- Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset (Agrawal et al., 2025)
- Emergence of Language in the Developing Brain (Evanson et al., 2025)
Conclusion
TRIBE v2 provides a unified framework for exploring the functional organization of the human brain. By accurately predicting brain activity and enabling in-silico experimentation, it could advance our understanding of the brain and may inform the development of more effective treatments for neurological and psychiatric disorders.
Submitted by pete.nelson-y28clt1a