BeClaude
Research · 2026-05-12

When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression

Source: Arxiv CS.AI

arXiv:2605.08234v1 Announce Type: cross Abstract: Long-context LLM inference is bottlenecked by the memory and bandwidth cost of reading large KV caches during decoding. KV compression reduces this cost by keeping only part of the cache, but task accuracy alone does not identify why a selector...
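The abstract's core idea, keeping only part of the KV cache during decoding, can be illustrated with a toy value-aware eviction rule. This is a minimal sketch, not the paper's actual selector: the `evict_kv` helper and the value-norm scoring criterion are illustrative assumptions.

```python
import numpy as np

def evict_kv(keys, values, budget):
    """Keep the `budget` cached KV pairs with the largest value norms.

    A simple value-aware heuristic for illustration only; the paper's
    actual selection criterion is not shown in this snippet.
    """
    scores = np.linalg.norm(values, axis=-1)  # one score per cached token
    keep = np.sort(np.argsort(scores)[-budget:])  # top-`budget`, in token order
    return keys[keep], values[keep]

rng = np.random.default_rng(0)
T, d = 128, 16  # cache length, per-head dimension (arbitrary toy sizes)
keys = rng.normal(size=(T, d))
values = rng.normal(size=(T, d))

k_small, v_small = evict_kv(keys, values, budget=32)
print(k_small.shape, v_small.shape)  # (32, 16) (32, 16)
```

Reading 32 instead of 128 entries per head cuts the decode-time memory traffic proportionally, which is the bandwidth saving the abstract refers to; the open question the paper addresses is diagnosing *why* a given selector helps or hurts accuracy.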

Tags: arxivpapers