Research 2026-05-12

KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving

arXiv:2605.09735v1. Announce Type: cross.

Abstract: Static-graph LLM decoders provide predictable launches, fixed tensor shapes, and low submission overhead, but online decoding exposes highly irregular KV-cache behavior: request lengths differ, EOS events arrive asynchronously, and logical histories...

Read Original Article on Arxiv CS.AI
