Research 2026-05-12
KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving
Source: arXiv cs.AI
arXiv:2605.09735v1 Announce Type: cross Abstract: Static-graph LLM decoders provide predictable launches, fixed tensor shapes, and low submission overhead, but online decoding exposes highly irregular KV-cache behavior: request lengths differ, EOS events arrive asynchronously, and logical histories...
arxiv, papers