BeClaude
Research · 2026-04-28

One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness

Source: arXiv cs.AI

arXiv:2604.13006v2

Abstract: Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness under trivial constraints? We show that simple lexical constraints (banning a single punctuation character or common word)...

arxiv, papers, rag