Research · 2026-04-28
One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness
Source: arXiv cs.AI
arXiv:2604.13006v2 · Announce Type: replace-cross

Abstract: Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness under trivial constraints? We show that simple lexical constraints (banning a single punctuation character or common word)...
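To make the term "lexical constraint" concrete, here is a minimal sketch of a compliance check for the two kinds of bans the abstract names: a single punctuation character or a common word. The function name `violates_ban` and its matching rules (an exact match anywhere for a single non-alphanumeric character; a case-insensitive whole-word match for a word) are illustrative assumptions, not the paper's evaluation procedure.

```python
import re


def violates_ban(response: str, banned: str) -> bool:
    """Hypothetical checker: return True if `response` uses the banned item.

    Assumed rules (not taken from the paper):
    - a single non-alphanumeric character (e.g. ",") counts if it appears anywhere;
    - anything else is treated as a word and matched case-insensitively
      on word boundaries, so banning "the" does not flag "Other".
    """
    if len(banned) == 1 and not banned.isalnum():
        # Punctuation ban: any occurrence is a violation.
        return banned in response
    # Word ban: escape the word so regex metacharacters are taken literally.
    pattern = r"\b" + re.escape(banned) + r"\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    print(violates_ban("Sure, here is a list.", ","))   # True
    print(violates_ban("Sure here is a list.", ","))    # False
    print(violates_ban("The answer is 4.", "the"))      # True
    print(violates_ban("Other answers exist.", "the"))  # False
```

A check like this only tells you whether a response obeyed the ban; judging whether the response stayed helpful while obeying it is a separate question.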