2026-05-12
RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache
Source: Arxiv CS.AI
arXiv:2605.08317v1 (cross-listed)

Abstract: Large language models (LLMs) have shown strong performance across diverse tasks, but their inference with long input contexts is bottlenecked by memory size and bandwidth. The Key-Value (KV) cache size grows linearly with sequence length and needs...
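The linear growth noted in the abstract follows from storing one key and one value vector per layer, per attention head, per token. A minimal back-of-envelope sketch, using illustrative model dimensions (roughly 7B-class defaults, not taken from the paper):

```python
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 32,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Approximate KV cache footprint in bytes.

    Assumed dimensions are hypothetical defaults; bytes_per_elem=2
    corresponds to fp16/bf16 storage. The leading factor of 2 counts
    both the key and the value tensor.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

print(kv_cache_bytes(4096) / 2**30)   # → 2.0 (GiB at a 4k context)
print(kv_cache_bytes(32768) / 2**30)  # → 16.0 (8x the tokens, 8x the memory)
```

Quantization shrinks `bytes_per_elem` while eviction shrinks the effective `seq_len`, which is why the paper frames the two as a joint bit-allocation problem.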