Research | 2026-04-20
The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference
Source: arXiv cs.AI
arXiv:2604.15409v1 | Announce Type: cross

Abstract: KV caching is a ubiquitous optimization in autoregressive transformer inference, long presumed to be numerically equivalent to cache-free computation. This assumption fails under standard FP16 precision: the cache-ON and cache-OFF execution paths employ...
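The abstract's core claim is that cached and cache-free decoding, while algebraically identical, need not be bit-identical in FP16 because the two paths round intermediate results in different orders. The following is a toy NumPy sketch of that general phenomenon, not the paper's experiment: it compares a single batched reduction (accumulated in FP32, rounded to FP16 once, standing in for a "cache-OFF"-style fused matmul) against step-by-step accumulation that rounds to FP16 after every token (standing in for a "cache-ON"-style incremental decoder). All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 64, 128  # hypothetical head dim and cached sequence length

# Toy attention output: a weighted sum of T cached value vectors.
w = rng.random(T).astype(np.float16)
w = w / w.sum()                      # softmax-like weights, normalized
V = rng.standard_normal((T, d)).astype(np.float16)

# "cache-OFF"-style path: one batched reduction accumulated in FP32,
# rounded to FP16 a single time at the end.
out_batched = (w.astype(np.float32) @ V.astype(np.float32)).astype(np.float16)

# "cache-ON"-style path: incremental accumulation entirely in FP16,
# so every product and every partial sum is rounded to half precision.
out_incremental = np.zeros(d, dtype=np.float16)
for t in range(T):
    out_incremental = (out_incremental + w[t] * V[t]).astype(np.float16)

# The two results agree in exact arithmetic but not bit-for-bit in FP16.
diff = np.abs(out_batched.astype(np.float32)
              - out_incremental.astype(np.float32)).max()
print(f"max |cache-ON - cache-OFF| = {diff:.2e}")
```

The divergence here comes purely from reduction order and per-step rounding; real inference stacks add further sources (fused kernels, tensor-core accumulation widths), but the mechanism the abstract points at is the same.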