Research · 2026-04-28

Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement

arXiv:2604.18239v2 (Announce Type: replace-cross)

Abstract: Preference optimization is widely used to align large language models (LLMs) with human preferences. However, many margin-based objectives suppress the chosen response along with the rejected one, a phenomenon known as likelihood...
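The abstract's point about margin-based objectives suppressing the chosen response can be made concrete with a small sketch. It uses the standard Direct Preference Optimization (DPO) loss as a representative margin-based objective. That choice is an assumption: the truncated abstract does not say which objectives the paper studies, and this is not the paper's proposed method. The function names `softplus` and `dpo_loss`, the `beta` value, and every log-probability below are hypothetical illustration values. Because the loss depends only on the difference between the chosen and rejected log-ratios, it decreases whenever the rejected response is pushed down faster than the chosen one, even if the chosen response's likelihood also falls.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from sequence-level log-probabilities.

    Illustrative sketch (hypothetical helper, beta=0.1 is an assumed value).
    Only the margin between the two policy/reference log-ratios enters the
    loss, so the loss can fall while the chosen log-probability falls too.
    """
    chosen_ratio = logp_chosen - ref_chosen        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_ratio = logp_rejected - ref_rejected  # log pi(y_l|x) - log pi_ref(y_l|x)
    margin = beta * (chosen_ratio - rejected_ratio)
    return softplus(-margin)  # equals -log(sigmoid(margin))


# Hypothetical reference-model log-probabilities for a single prompt.
ref_chosen, ref_rejected = -10.0, -12.0

# At initialization the policy equals the reference, so the margin is 0
# and the loss is log(2).
initial = dpo_loss(-10.0, -12.0, ref_chosen, ref_rejected)

# Hypothetical later step: both responses lose likelihood, but the
# rejected one drops further (by 4 nats versus 1 nat for the chosen one).
later = dpo_loss(-11.0, -16.0, ref_chosen, ref_rejected)

print(f"initial loss: {initial:.4f}")
print(f"later loss:   {later:.4f}")
print("loss decreased:", later < initial)
```

In the second call the chosen log-probability drops from -10.0 to -11.0, yet the loss still goes down, since the objective only rewards widening the margin and does not by itself penalize lowering the chosen response's likelihood.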

Read the original article on arXiv (cs.AI)
