Research · 2026-05-08
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
Source: Arxiv CS.AI
arXiv:2605.05696v1 Announce Type: cross
Abstract: Agentic LLM workloads put bit-identical tokens at shifted positions every turn, voiding prefix caches at the first byte of divergence. Operators report cache-hit regressions ranging from moderate slowdowns to severe TTFT spikes of 10-16s on...
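The failure mode the abstract describes can be illustrated with a minimal sketch (an assumption for illustration, not Irminsul's method or any serving engine's actual cache): a position-based prefix cache only reuses the longest common prefix, so one token inserted at the front of a turn shifts an otherwise identical suffix and drops the hit length to the divergence point.

```python
# Minimal sketch (illustrative assumption, not the paper's algorithm) of
# why position-based prefix caching fails for agentic workloads: reuse
# covers only the longest shared prefix, so a single token inserted up
# front invalidates everything cached after it.

def common_prefix_len(a, b):
    """Length of the longest shared prefix between two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Turn 1: system prompt + tool output, fully cached.
turn1 = ["<sys>", "You", "are", "an", "agent", "<tool>", "result", "A"]

# Turn 2: one separator token is prepended, so the bit-identical
# suffix now sits at shifted positions.
turn2 = ["<sep>"] + turn1

hit = common_prefix_len(turn1, turn2)
print(f"cache hit: {hit} of {len(turn2)} tokens")
```

Here the cache hit is zero even though every token of turn 1 reappears verbatim, which is the divergence behavior the abstract attributes to agentic multi-turn workloads.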