Research · 2026-05-11
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
Source: Arxiv CS.AI
arXiv:2605.07234v1 | Announce Type: cross
Abstract: Large language models (LLMs) support long-context inference but suffer from substantial memory and runtime overhead due to Key-Value (KV) Cache growth. Existing KV Cache eviction methods primarily rely on local attention weights, neglecting the...
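To make the abstract's premise concrete, here is a minimal sketch of the kind of attention-weight-based eviction it refers to: each cached position is scored by the attention mass it has accumulated, and the lowest-scoring entries are dropped when the cache exceeds a budget. This is a generic illustration of the existing heuristic the paper critiques (function names, shapes, and the scoring rule are illustrative assumptions), not the method proposed in the paper.

```python
import numpy as np

def evict_kv(keys, values, attn_weights, budget):
    """Keep only the `budget` cached positions with the largest
    accumulated attention mass (a generic local-attention heuristic,
    not this paper's method).

    keys, values : (seq_len, head_dim) cached K/V entries
    attn_weights : (num_queries, seq_len) attention over cached keys
    """
    scores = attn_weights.sum(axis=0)       # attention mass per cached position
    keep = np.argsort(scores)[-budget:]     # indices of the top-`budget` positions
    keep = np.sort(keep)                    # preserve original sequence order
    return keys[keep], values[keep]

# Toy example: 6 cached positions, head_dim 4, budget of 3.
rng = np.random.default_rng(0)
K = rng.normal(size=(6, 4))
V = rng.normal(size=(6, 4))
A = rng.dirichlet(np.ones(6), size=5)       # 5 query rows, each summing to 1
K2, V2 = evict_kv(K, V, A, budget=3)
print(K2.shape, V2.shape)                   # (3, 4) (3, 4)
```

Because such scores are computed only from recent local attention, positions that matter later in a long context can be evicted prematurely, which is the limitation the abstract points to.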