BeClaude
Research · 2026-05-14

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

Source: arXiv cs.AI

arXiv:2605.13734v1 Announce Type: cross Abstract: LLMs are widely adopted in production, pushing inference systems to their limits. Disaggregated LLM serving (e.g., prefill-decode (PD) separation and KV-state disaggregation) improves scalability and cost efficiency, but it also turns the KV cache into an explicit payload...
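To make the communication cost concrete: once the KV cache is an explicit payload shipped between prefill and decode nodes, compressing it before transfer directly reduces network traffic. The sketch below is a generic illustration, not KVServe's actual method (the abstract is truncated and does not specify one): it applies simple per-tensor int8 quantization to a hypothetical KV block, shrinking the fp32 payload 4x on the wire at the cost of a bounded rounding error.

```python
# Hypothetical sketch (NOT the paper's algorithm): int8 quantization of a
# KV-cache block before sending it from a prefill node to a decode node.
import numpy as np

def quantize_kv(kv: np.ndarray):
    """Quantize an fp32 KV block to int8 with one per-tensor scale."""
    scale = max(float(np.abs(kv).max()) / 127.0, 1e-8)  # avoid zero scale
    q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an fp32 approximation on the decode side."""
    return q.astype(np.float32) * scale

# One layer's KV block, shape (heads, seq_len, head_dim) -- made-up sizes.
rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 128, 64)).astype(np.float32)

q, scale = quantize_kv(kv)
ratio = kv.nbytes / q.nbytes            # fp32 -> int8: 4x smaller payload
err = float(np.abs(dequantize_kv(q, scale) - kv).max())
```

In a real system the scale would travel alongside the int8 tensor as payload metadata, and finer-grained (per-channel or per-head) scales would trade a little extra metadata for lower reconstruction error.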

arxivpapers