BeClaude
Research · 2026-05-05

Towards Disentangled Preference Optimization Dynamics: Suppress the Loser, Preserve the Winner

Source: Arxiv CS.AI

arXiv:2604.18239v3 Announce Type: replace-cross Abstract: Preference optimization is widely used to align large language models (LLMs) with human preferences. However, many margin-based methods also suppress the chosen response when they try to suppress the rejected one, and there is no general way...
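The "suppress the winner while suppressing the loser" failure mode the abstract alludes to can be seen in any purely margin-based objective: because only the *difference* between chosen and rejected log-probabilities enters the loss, the loss can decrease even when the chosen response's likelihood falls. The sketch below uses a DPO-style loss, `-log σ(β·margin)`, as a stand-in; the specific β value and log-probability numbers are illustrative assumptions, not taken from the paper.

```python
import math

def dpo_style_loss(chosen_lp, rejected_lp, ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Margin-based preference loss: -log sigmoid(beta * margin).

    Only the gap between the chosen and rejected (reference-adjusted)
    log-probs matters, not their absolute levels.
    """
    margin = (chosen_lp - ref_chosen_lp) - (rejected_lp - ref_rejected_lp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical reference-model log-probs (assumed, for illustration only).
ref_chosen, ref_rejected = -1.0, -2.0

# Scenario A: policy equals the reference model -> zero margin.
loss_a = dpo_style_loss(-1.0, -2.0, ref_chosen, ref_rejected)   # ≈ 0.693

# Scenario B: BOTH log-probs drop, but the rejected one drops faster,
# so the margin grows and the loss goes DOWN even though the chosen
# response became less likely (-1.0 -> -5.0).
loss_b = dpo_style_loss(-5.0, -10.0, ref_chosen, ref_rejected)  # ≈ 0.513

print(loss_b < loss_a)  # True: loss improves while the winner is suppressed
```

This is the disentanglement problem in miniature: a margin-only gradient has no term that anchors the chosen response's absolute likelihood, which is what motivates methods that suppress the loser while explicitly preserving the winner.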

arxivpapers