Research 2026-05-07
Architectural Observability Collapse in Transformers
Source: arXiv cs.AI
arXiv:2604.24801v2
Announce Type: replace-cross
Abstract: Activation monitoring can catch confident errors in autoregressive transformers only if training preserved an internal decision-quality signal that output confidence does not expose. Monitorability is an architectural property before it is a...
arxiv, papers