Research | 2026-05-08

Causal Probing for Internal Visual Representations in Multimodal Large Language Models

arXiv:2605.05593v1. Announce Type: new.

Abstract: Despite the remarkable success of Multimodal Large Language Models (MLLMs) across diverse tasks, the internal mechanisms governing how they encode and ground distinct visual concepts remain poorly understood. To bridge this gap, we propose a causal...

Read Original Article on arXiv cs.AI

Tags: arxiv, papers, multimodal