Research 2026-05-07
HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization
Source: arXiv cs.AI
arXiv:2605.03562v1 (announce type: cross)
Abstract: KV-cache quantizers usually optimize storage-space reconstruction, even though attention reads keys through logits and values through attention-weighted readout. We argue that persistent cache error should be measured in model-visible coordinates....
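To make the storage-space versus model-visible distinction concrete, here is a minimal sketch in plain Python. It is an illustration only and not the paper's method: the uniform min/max quantizer, the helper names (`quantize`, `distortion_report`, `random_vectors`) and the random test data are all assumptions introduced here. It compares elementwise key reconstruction error with the error that attention would actually see, namely the error in the scaled logits and in the attention-weighted value readout.

```python
import math
import random


def quantize(vec, bits=4):
    """Uniform min/max quantization of one vector, returned dequantized.

    Hypothetical stand-in quantizer, not the scheme from the paper.
    """
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # constant vector: avoid dividing by zero
    return [lo + round((x - lo) / scale) * scale for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def readout(weights, values):
    """Attention-weighted sum of value vectors."""
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def distortion_report(q, keys, values, bits=4):
    d = len(q)
    keys_hat = [quantize(k, bits) for k in keys]
    values_hat = [quantize(v, bits) for v in values]

    # Storage-space view: elementwise reconstruction error of the cached keys.
    key_mse = sum(
        (a - b) ** 2 for k, kh in zip(keys, keys_hat) for a, b in zip(k, kh)
    ) / (len(keys) * d)

    # Model-visible view for keys: error in the scaled attention logits.
    logits = [dot(q, k) / math.sqrt(d) for k in keys]
    logits_hat = [dot(q, kh) / math.sqrt(d) for kh in keys_hat]
    logit_mse = sum((a - b) ** 2 for a, b in zip(logits, logits_hat)) / len(keys)

    # Model-visible view for values: error in the attention-weighted readout,
    # using the exact (unquantized) weights to isolate the value error.
    w = softmax(logits)
    out, out_hat = readout(w, values), readout(w, values_hat)
    readout_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(out, out_hat)))

    return {"key_mse": key_mse, "logit_mse": logit_mse, "readout_err": readout_err}


def random_vectors(rng, n, d):
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)  # fixed seed so runs are repeatable
q = random_vectors(rng, 1, 16)[0]
keys = random_vectors(rng, 8, 16)
values = random_vectors(rng, 8, 16)
print(distortion_report(q, keys, values, bits=4))
```

The three numbers need not move together: the logit error depends on how the key error aligns with the query, and the readout error depends on how the value error is averaged by the attention weights, which is the sense in which they are "model-visible" while `key_mse` is not.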