Research · 2026-04-22

DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling

arXiv:2604.19544v1 (Announce Type: new)

Abstract: Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets face three key...

Read the original article on arXiv (cs.AI)

arxiv · papers · multimodal