Research · 2026-05-08
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
Source: Arxiv CS.AI
arXiv:2605.05696v1 Announce Type: cross
Abstract: Agentic LLM workloads put bit-identical tokens at shifted positions every turn, voiding prefix caches at the first byte of divergence. Operators report cache-hit regressions ranging from moderate slowdowns to severe TTFT spikes of 10-16s on...
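The failure mode the abstract describes can be illustrated with a minimal sketch (an assumption for illustration, not Irminsul's method or any serving engine's actual cache): a position-based prefix cache only reuses the longest common prefix, so one token inserted at the front of a turn shifts an otherwise identical suffix and drops the hit length to the divergence point.

```python
# Minimal sketch (illustrative assumption, not the paper's algorithm) of
# why position-based prefix caching fails for agentic workloads: reuse
# covers only the longest shared prefix, so a single token inserted up
# front invalidates everything cached after it.

def common_prefix_len(a, b):
    """Length of the longest shared prefix between two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Turn 1: system prompt + tool output, fully cached.
turn1 = ["<sys>", "You", "are", "an", "agent", "<tool>", "result", "A"]

# Turn 2: one separator token is prepended, so the bit-identical
# suffix now sits at shifted positions.
turn2 = ["<sep>"] + turn1

hit = common_prefix_len(turn1, turn2)
print(f"cache hit: {hit} of {len(turn2)} tokens")
```

Here the cache hit is zero even though every token of turn 1 reappears verbatim, which is the divergence behavior the abstract attributes to agentic multi-turn workloads.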