Friday, March 27, 2026
Meta Releases TRIBE v2: A Brain Encoding Model That Predicts fMRI Responses Across Video, Audio, and Text Stimuli


Neuroscience has long been a field of divide and conquer. Researchers typically map specific cognitive functions to isolated brain regions—like motion to area V5 or faces to the fusiform gyrus—using models tailored to narrow experimental paradigms. While this has provided deep insights, the resulting landscape is fragmented, lacking a unified framework to explain how the human brain integrates multisensory information.

Meta’s FAIR team has introduced TRIBE v2, a tri-modal foundation model designed to bridge this gap. By aligning the latent representations of state-of-the-art AI architectures with human brain activity, TRIBE v2 predicts high-resolution fMRI responses across diverse naturalistic and experimental conditions.

https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/

The Architecture: Multi-modal Integration

TRIBE v2 does not learn to ‘see’ or ‘hear’ from scratch. Instead, it leverages the representational alignment between deep neural networks and the primate brain. The architecture consists of three frozen foundation models serving as feature extractors, a temporal transformer, and a subject-specific prediction block.

1. Feature Extraction

The model processes stimuli through three specialized encoders:

  • Text: Contextualized embeddings are extracted from LLaMA 3.2-3B. For every word, the model prepends the preceding 1,024 words to provide temporal context, which is then mapped to a 2 Hz grid.
  • Video: The model uses V-JEPA2-Giant to process 64-frame segments spanning the preceding 4 seconds for each time-bin.
  • Audio: Sound is processed through Wav2Vec-BERT 2.0, with representations resampled to 2 Hz to match the stimulus frequency (f_stim).
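
The per-modality features arrive at different native rates, so each stream must be brought onto the shared 2 Hz grid before fusion. A minimal sketch of that resampling step (the function name and the bin-averaging strategy are illustrative assumptions, not the paper's exact implementation):

```python
def resample_to_grid(timestamps, features, rate_hz=2.0, duration_s=10.0):
    """Average timestamped feature vectors into fixed-rate time bins.

    timestamps: event times in seconds; features: equal-length vectors.
    Returns one averaged vector per 1/rate_hz-second bin (zeros if empty).
    """
    n_bins = int(duration_s * rate_hz)
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_bins)]
    counts = [0] * n_bins
    for t, vec in zip(timestamps, features):
        b = min(int(t * rate_hz), n_bins - 1)
        counts[b] += 1
        for i, v in enumerate(vec):
            sums[b][i] += v
    return [
        [s / counts[b] for s in sums[b]] if counts[b] else [0.0] * dim
        for b in range(n_bins)
    ]

# Two word embeddings landing in the same 0.5 s bin get averaged:
grid = resample_to_grid([0.1, 0.3, 1.2], [[1.0], [3.0], [5.0]],
                        rate_hz=2.0, duration_s=2.0)
# grid[0] == [2.0]; grid[2] == [5.0]; empty bins stay zero
```

The same logic applies whether the incoming stream is word embeddings (irregular event times) or audio frames (a higher regular rate).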

2. Temporal Aggregation

The resulting embeddings are compressed into a shared dimension (D = 384) and concatenated to form a multi-modal time series with a model dimension of D_model = 3 × 384 = 1152. This sequence is fed into a Transformer encoder (8 layers, 8 attention heads) that exchanges information across a 100-second window.
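
The dimension bookkeeping for this fusion step follows directly from the numbers above (the concatenation helper here is a placeholder, not the trained projection):

```python
# Each modality is projected to a shared width D, then concatenated per time step.
D = 384                      # shared per-modality dimension
N_MODALITIES = 3             # text, video, audio
D_MODEL = N_MODALITIES * D   # 1152, the transformer's model dimension

def fuse(text_feat, video_feat, audio_feat):
    """Concatenate three D-dim feature vectors into one D_MODEL-dim token."""
    assert len(text_feat) == len(video_feat) == len(audio_feat) == D
    return text_feat + video_feat + audio_feat

token = fuse([0.0] * D, [0.0] * D, [0.0] * D)
assert len(token) == D_MODEL == 1152

# At 2 Hz over the 100-second attention window, the transformer sees:
WINDOW_S, RATE_HZ = 100, 2
seq_len = WINDOW_S * RATE_HZ  # 200 tokens, each of width 1152
```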

3. Subject-Specific Prediction

To predict brain activity, the Transformer outputs are decimated to the 1 Hz fMRI frequency (f_fMRI) and passed through a Subject Block. This block projects the latent representations to 20,484 cortical vertices (fsaverage5 surface) and 8,802 subcortical voxels.
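
A back-of-the-envelope sketch of the shapes this step implies (take-every-other-sample decimation is an assumption; the actual model may use a learned or filtered downsampling):

```python
# Transformer outputs arrive at 2 Hz; fMRI is sampled at 1 Hz.
def decimate_2hz_to_1hz(sequence):
    """Keep every second time step to align 2 Hz features with 1 Hz fMRI frames."""
    return sequence[::2]

features_2hz = list(range(200))          # 100 s of 2 Hz transformer outputs
features_1hz = decimate_2hz_to_1hz(features_2hz)
assert len(features_1hz) == 100          # one latent vector per fMRI frame

# The subject block maps each 1152-dim latent to every brain location:
N_CORTICAL = 20_484      # fsaverage5 cortical vertices
N_SUBCORTICAL = 8_802    # subcortical voxels
N_TARGETS = N_CORTICAL + N_SUBCORTICAL
assert N_TARGETS == 29_286
```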

Data and Scaling Laws

A significant hurdle in brain encoding is data scarcity. TRIBE v2 addresses this by utilizing ‘deep’ datasets for training—where a few subjects are recorded for many hours—and ‘wide’ datasets for evaluation.

  • Training: The model was trained on 451.6 hours of fMRI data from 25 subjects across four naturalistic studies (movies, podcasts, and silent videos).
  • Evaluation: It was evaluated across a broader collection totaling 1,117.7 hours from 720 subjects.

The research team observed a log-linear increase in encoding accuracy as the training data volume increased, with no evidence of a plateau. This suggests that as neuroimaging repositories expand, the predictive power of models like TRIBE v2 will continue to scale.
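
A log-linear trend means encoding accuracy grows roughly linearly in the logarithm of training hours, so every doubling of data buys the same fixed accuracy increment. A toy illustration of such a curve (the coefficients are invented for demonstration; the paper reports the trend, not these numbers):

```python
import math

def loglinear_accuracy(hours, a=0.05, b=0.04):
    """Toy log-linear scaling curve: accuracy = a + b * ln(hours)."""
    return a + b * math.log(hours)

# Each doubling of data adds the same constant increment, b * ln(2),
# which is why the curve shows no plateau as data keeps growing.
gain_100_to_200 = loglinear_accuracy(200) - loglinear_accuracy(100)
gain_200_to_400 = loglinear_accuracy(400) - loglinear_accuracy(200)
assert abs(gain_100_to_200 - gain_200_to_400) < 1e-12
assert abs(gain_100_to_200 - 0.04 * math.log(2)) < 1e-12
```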

Results: Beating the Baselines

TRIBE v2 significantly outperforms traditional Finite Impulse Response (FIR) models, the long-standing gold standard for voxel-wise encoding.

Zero-Shot and Group Performance

One of the model’s most striking capabilities is zero-shot generalization to new subjects. Using an ‘unseen subject’ layer, TRIBE v2 can predict the group-averaged response of a new cohort more accurately than the actual recording of many individual subjects within that cohort. In the high-resolution Human Connectome Project (HCP) 7T dataset, TRIBE v2 achieved a group correlation (R_group) near 0.4, a two-fold improvement over the median subject’s group-predictivity.
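
The group correlation R_group is, in essence, a Pearson correlation between the predicted time course and the group-averaged measured response. A minimal pure-Python version with illustrative data (the paper's exact aggregation across vertices may differ):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def group_average(responses):
    """Average the same time series across subjects (rows = subjects)."""
    n_subjects = len(responses)
    return [sum(col) / n_subjects for col in zip(*responses)]

subjects = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.5, 3.5, 3.0]]
predicted = [1.4, 2.3, 3.1, 3.6]
r = pearson_r(predicted, group_average(subjects))
assert 0.9 < r <= 1.0  # the prediction tracks the group mean closely
```

An individual subject's noisy recording can score a lower R_group than the model's prediction, which is exactly the comparison behind the two-fold figure above.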

Fine-Tuning

When given a small amount of data (at most one hour) for a new participant, fine-tuning TRIBE v2 for just one epoch leads to a two- to four-fold improvement over linear models trained from scratch.

In-Silico Experimentation

The research team argues that TRIBE v2 could be useful for piloting or pre-screening neuroimaging studies. By running virtual experiments on the Individual Brain Charting (IBC) dataset, the model recovered classic functional landmarks:

  • Vision: It accurately localized the fusiform face area (FFA) and parahippocampal place area (PPA).
  • Language: It successfully recovered the temporo-parietal junction (TPJ) for emotional processing and Broca’s area for syntax.

Furthermore, applying Independent Component Analysis (ICA) to the model’s final layer revealed that TRIBE v2 naturally learns five well-known functional networks: primary auditory, language, motion, default mode, and visual.

https://aidemos.atmeta.com/tribev2/

Key Takeaway

  • A Powerhouse Tri-modal Architecture: TRIBE v2 is a foundation model that integrates video, audio, and language by leveraging state-of-the-art encoders like LLaMA 3.2 for text, V-JEPA2 for video, and Wav2Vec-BERT for audio.
  • Log-Linear Scaling Laws: Much like the Large Language Models we use every day, TRIBE v2 follows a log-linear scaling law; its ability to accurately predict brain activity increases steadily as it is fed more fMRI data, with no performance plateau currently in sight.
  • Superior Zero-Shot Generalization: The model can predict the brain responses of unseen subjects in new experimental conditions without any additional training. Remarkably, its zero-shot predictions are often more accurate at estimating group-averaged brain responses than the recordings of individual human subjects themselves.
  • The Dawn of In-Silico Neuroscience: TRIBE v2 enables ‘in-silico’ experimentation, allowing researchers to run virtual neuroscientific tests on a computer. It successfully replicated decades of empirical research by identifying specialized areas like the fusiform face area (FFA) and Broca’s area purely through digital simulation.
  • Emergent Biological Interpretability: Even though it’s a deep learning ‘black box,’ the model’s internal representations naturally organized themselves into five well-known functional networks: primary auditory, language, motion, default mode, and visual.

Check out the Code, Weights and Demo.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
