Research 2026-04-22
DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling
Source: arXiv cs.AI
arXiv:2604.19544v1 (announce type: new)
Abstract: Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets face three key...
arxiv, papers, multimodal