Research 2026-04-20

OjaKV: Context-Aware Online Low-Rank KV Cache Compression

Source: Arxiv CS.AI

arXiv:2509.21623v2 (announce type: replace-cross)

Abstract: The expanding long-context capabilities of large language models are constrained by a significant memory bottleneck: the key-value (KV) cache required for autoregressive generation. This bottleneck is substantial; for instance, a...
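The method's name suggests it builds on Oja's rule, a classic online algorithm for tracking the principal subspace of a data stream, which is the natural primitive for "online low-rank" compression of incoming key/value vectors. The sketch below is illustrative only, not the paper's implementation: it shows Oja's subspace update with QR re-orthonormalization on synthetic streaming vectors, with all names and hyperparameters chosen for the demo.

```python
import numpy as np

def oja_subspace_update(W, x, lr=0.05):
    """One streaming Oja step nudging the orthonormal basis W (d x r)
    toward the principal subspace of the incoming vectors x (d,).
    Re-orthonormalizes via QR for numerical stability."""
    y = W.T @ x                                   # project x onto the current basis
    W = W + lr * (np.outer(x, y) - W @ np.outer(y, y))
    Q, _ = np.linalg.qr(W)                        # restore orthonormal columns
    return Q

# Demo: recover a planted rank-2 subspace from a stream of noisy samples.
rng = np.random.default_rng(0)
d, r = 16, 2
basis = np.linalg.qr(rng.normal(size=(d, r)))[0]  # ground-truth subspace
W = np.linalg.qr(rng.normal(size=(d, r)))[0]      # random orthonormal init
for _ in range(3000):
    x = basis @ rng.normal(size=r) + 0.01 * rng.normal(size=d)
    W = oja_subspace_update(W, x)

# Alignment check: singular values of basis^T W approach 1 as the
# learned subspace converges to the planted one.
align = np.linalg.svd(basis.T @ W, compute_uv=False)
print(align)
```

In a KV-cache setting one would, by analogy, maintain such a basis per head and store projected coordinates `W.T @ k` instead of full key vectors; how OjaKV actually handles context-awareness and reconstruction is specified in the paper itself.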

arxivpapers