Research 2026-05-06
SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving
Source: Arxiv CS.AI
arXiv:2605.01708v1 Announce Type: cross

Abstract: Contemporary systems serving large language models (LLMs) have adopted prefill-decode disaggregation to better load-balance between the compute-bound prefill phase and the memory-bound decode phase. Under this design, prefill workers generate a KV...