Research · 2026-05-01
Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations
Source: Arxiv CS.AI
arXiv:2604.27093v1 Announce Type: cross
Abstract: Current LLM safety alignment techniques improve model robustness against adversarial attacks, but overlook whether and how LLMs can recover helpfulness when benign users clarify their intent. We introduce CarryOnBench, the first interactive...