Research · 2026-05-07
Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live
Source: Arxiv CS.AI
arXiv:2511.02230v4 · Announce Type: replace-cross

Abstract: KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict a finished request's KV cache when new requests are waiting. This policy breaks down for agentic workloads, which interleave LLM...
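The abstract contrasts immediate eviction of finished requests' KV cache with the paper's time-to-live idea. Below is a minimal, hypothetical sketch (not the paper's implementation) of what a TTL-based retention policy might look like: a finished request's cache blocks stay resident for a grace period so a returning agent turn can reuse them, and eviction under memory pressure prefers expired, then oldest, finished caches. All names (`TTLKVCachePool`, `admit`, `finish`) and the block-counting model are assumptions for illustration.

```python
import time

class TTLKVCachePool:
    """Hypothetical KV-cache pool with a TTL grace period for finished requests."""

    def __init__(self, capacity_blocks, ttl_seconds):
        self.capacity = capacity_blocks
        self.ttl = ttl_seconds
        # request_id -> (blocks, finish_time); finish_time is None while running
        self.entries = {}

    def used(self):
        return sum(blocks for blocks, _ in self.entries.values())

    def finish(self, request_id, now=None):
        # Mark a request finished; its cache stays resident until TTL expiry
        # instead of being evicted immediately.
        blocks, _ = self.entries[request_id]
        self.entries[request_id] = (blocks, time.monotonic() if now is None else now)

    def admit(self, request_id, blocks, now=None):
        # Admit a new request, evicting expired finished caches first,
        # then the oldest finished caches if still short on space.
        now = time.monotonic() if now is None else now
        # 1) drop finished caches whose TTL has expired
        for rid, (_, t) in list(self.entries.items()):
            if t is not None and now - t >= self.ttl:
                del self.entries[rid]
        # 2) if still over capacity, evict finished caches oldest-first
        finished = sorted(
            (rid for rid, (_, t) in self.entries.items() if t is not None),
            key=lambda rid: self.entries[rid][1],
        )
        while self.used() + blocks > self.capacity and finished:
            del self.entries[finished.pop(0)]
        if self.used() + blocks > self.capacity:
            return False  # must wait: live (running) caches fill the pool
        self.entries[request_id] = (blocks, None)  # None = still running
        return True
```

Under this sketch, a multi-turn agent that returns within the TTL finds its cache intact, while a long-absent one loses it only when memory pressure actually demands eviction.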
Tags: arxiv, papers, agents