Research | 2026-05-08

Same Words, Different Judgments: How Preferences Vary Across Modalities

arXiv:2602.22710v2 (announce type: replace-cross)

Abstract: Preference-based reinforcement learning (PbRL) is the dominant framework for aligning AI systems to human preferences. However, evaluation protocols for such data were designed for text and have not been validated for speech. We present the...

Source: arXiv cs.AI
