Research 2026-05-06
SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving
Source: Arxiv CS.AI
arXiv:2605.01708v1 Announce Type: cross

Abstract: Contemporary systems serving large language models (LLMs) have adopted prefill-decode disaggregation to better load-balance between the compute-bound prefill phase and the memory-bound decode phase. Under this design, prefill workers generate a KV...