Policy | 2026-04-23
V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization
Source: arXiv cs.AI
arXiv:2604.20755v1. Abstract: We introduce V-tableR1, a process-supervised reinforcement learning framework that elicits rigorous, verifiable reasoning from multimodal large language models (MLLMs). Current MLLMs trained solely on final outcomes often treat visual reasoning as a...
Tags: arxiv, papers, reasoning, multimodal