BeClaude
Research · 2026-05-05

Towards Disentangled Preference Optimization Dynamics: Suppress the Loser, Preserve the Winner

Source: Arxiv CS.AI

arXiv:2604.18239v3 Announce Type: replace-cross Abstract: Preference optimization is widely used to align large language models (LLMs) with human preferences. However, many margin-based methods also suppress the chosen response when they try to suppress the rejected one, and there is no general way...
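The "suppress the winner while suppressing the loser" failure mode the abstract alludes to can be seen in any purely margin-based objective: because only the *difference* between chosen and rejected log-probabilities enters the loss, the loss can decrease even when the chosen response's likelihood falls. The sketch below uses a DPO-style loss, `-log σ(β·margin)`, as a stand-in; the specific β value and log-probability numbers are illustrative assumptions, not taken from the paper.

```python
import math

def dpo_style_loss(chosen_lp, rejected_lp, ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Margin-based preference loss: -log sigmoid(beta * margin).

    Only the gap between the chosen and rejected (reference-adjusted)
    log-probs matters, not their absolute levels.
    """
    margin = (chosen_lp - ref_chosen_lp) - (rejected_lp - ref_rejected_lp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical reference-model log-probs (assumed, for illustration only).
ref_chosen, ref_rejected = -1.0, -2.0

# Scenario A: policy equals the reference model -> zero margin.
loss_a = dpo_style_loss(-1.0, -2.0, ref_chosen, ref_rejected)   # ≈ 0.693

# Scenario B: BOTH log-probs drop, but the rejected one drops faster,
# so the margin grows and the loss goes DOWN even though the chosen
# response became less likely (-1.0 -> -5.0).
loss_b = dpo_style_loss(-5.0, -10.0, ref_chosen, ref_rejected)  # ≈ 0.513

print(loss_b < loss_a)  # True: loss improves while the winner is suppressed
```

This is the disentanglement problem in miniature: a margin-only gradient has no term that anchors the chosen response's absolute likelihood, which is what motivates methods that suppress the loser while explicitly preserving the winner.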

arxivpapers