Research · 2026-05-14
Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion
Source: Arxiv CS.AI
arXiv:2605.11679v2 (Announce Type: replace)

Abstract: In the realm of multi-objective alignment for large language models, balancing disparate human preferences often manifests as a zero-sum conflict. Specifically, the intrinsic tension between competing goals dictates that aggressively optimizing...
Tags: arxiv, papers, safety