Research 2026-04-30
Option-Order Randomisation Reveals a Distributional Position Attractor in Prompted Sandbagging
Source: Arxiv CS.AI
arXiv:2604.26206v1 (Announce Type: cross)
Abstract: A predecessor pilot (Cacioli, 2026) found that Llama-3-8B implements prompted sandbagging as positional collapse rather than answer avoidance. However, fixed option ordering in MMLU-Pro left open whether this reflected a model-level...
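The idea named in the title, randomising option order so that a preference for a position can be told apart from a preference for particular answer content, can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function names `shuffle_options` and `position_histogram` are hypothetical, and the listing above does not describe the authors' actual procedure.

```python
import random
from collections import Counter


def shuffle_options(options, correct_index, rng):
    """Shuffle answer options and track where the correct one lands.

    Hypothetical helper: returns the shuffled options, the new index of
    the correct option, and the permutation used (perm[new] = old).
    """
    perm = list(range(len(options)))
    rng.shuffle(perm)
    shuffled = [options[old] for old in perm]
    new_correct = perm.index(correct_index)
    return shuffled, new_correct, perm


def position_histogram(chosen_positions, n_options):
    """Fraction of responses that selected each presented position.

    If the chosen position stays concentrated on one slot even though
    the content in that slot changes per item, that points to a
    positional preference rather than a content-based one.
    """
    total = len(chosen_positions)
    if total == 0:
        return [0.0] * n_options
    counts = Counter(chosen_positions)
    return [counts.get(p, 0) / total for p in range(n_options)]


# Example with a seeded generator so the shuffle is reproducible.
rng = random.Random(0)
shuffled, new_correct, perm = shuffle_options(["a", "b", "c", "d"], 2, rng)
hist = position_histogram([0, 0, 1, 3], 4)
```

Seeding a separate `random.Random` per run keeps each permutation reproducible, so a chosen position can always be mapped back to the original option through `perm`.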
arxiv, papers, prompting