Research | 2026-04-20
The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference
Source: arXiv cs.AI
arXiv:2604.15409v1 | Announce Type: cross

Abstract: KV caching is a ubiquitous optimization in autoregressive transformer inference, long presumed to be numerically equivalent to cache-free computation. This assumption fails under standard FP16 precision: the cache-ON and cache-OFF execution paths employ...
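The abstract's core claim is that cached and cache-free decoding, while algebraically identical, need not be bit-identical in FP16 because the two paths round intermediate results in different orders. The following is a toy NumPy sketch of that general phenomenon, not the paper's experiment: it compares a single batched reduction (accumulated in FP32, rounded to FP16 once, standing in for a "cache-OFF"-style fused matmul) against step-by-step accumulation that rounds to FP16 after every token (standing in for a "cache-ON"-style incremental decoder). All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 64, 128  # hypothetical head dim and cached sequence length

# Toy attention output: a weighted sum of T cached value vectors.
w = rng.random(T).astype(np.float16)
w = w / w.sum()                      # softmax-like weights, normalized
V = rng.standard_normal((T, d)).astype(np.float16)

# "cache-OFF"-style path: one batched reduction accumulated in FP32,
# rounded to FP16 a single time at the end.
out_batched = (w.astype(np.float32) @ V.astype(np.float32)).astype(np.float16)

# "cache-ON"-style path: incremental accumulation entirely in FP16,
# so every product and every partial sum is rounded to half precision.
out_incremental = np.zeros(d, dtype=np.float16)
for t in range(T):
    out_incremental = (out_incremental + w[t] * V[t]).astype(np.float16)

# The two results agree in exact arithmetic but not bit-for-bit in FP16.
diff = np.abs(out_batched.astype(np.float32)
              - out_incremental.astype(np.float32)).max()
print(f"max |cache-ON - cache-OFF| = {diff:.2e}")
```

The divergence here comes purely from reduction order and per-step rounding; real inference stacks add further sources (fused kernels, tensor-core accumulation widths), but the mechanism the abstract points at is the same.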