BeClaude
Research · 2026-05-01

Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations

Source: Arxiv CS.AI

arXiv:2604.27093v1 · Announce Type: cross

Abstract: Current LLM safety alignment techniques improve model robustness against adversarial attacks, but overlook whether and how LLMs can recover helpfulness when benign users clarify their intent. We introduce CarryOnBench, the first interactive...

Tags: arxiv · papers · benchmark