2026-05-12

RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache

Source: arXiv cs.AI

arXiv:2605.08317v1 (announce type: cross)

Abstract: Large language models (LLMs) have shown strong performance across diverse tasks, but their inference with long input contexts is bottlenecked by memory size and bandwidth. The Key-Value (KV) cache size grows linearly with sequence length and needs...
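The linear growth the abstract describes is easy to make concrete. The sketch below is illustrative, not from the paper: it computes KV cache size for a generic decoder-only transformer, using assumed Llama-2-7B-like dimensions (32 layers, 32 KV heads, head dimension 128, fp16 values). Lowering `bytes_per_elem` mimics the effect of quantization, one of the two compression axes the paper's title names.

```python
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 32,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Total KV cache size in bytes for one sequence.

    The factor of 2 accounts for storing both keys and values.
    Defaults are assumed Llama-2-7B-like dimensions (fp16), used
    only for illustration.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Cache size doubles with context length: the linear growth
    # that motivates eviction and quantization.
    for seq_len in (4096, 8192, 16384):
        gib = kv_cache_bytes(seq_len) / 2**30
        print(f"seq_len={seq_len:>6}: {gib:.1f} GiB")
    # Quantizing values from fp16 to int8 halves the cache.
    print(kv_cache_bytes(4096, bytes_per_elem=1) / 2**30, "GiB at int8")
```

At these assumed dimensions the fp16 cache reaches 2 GiB at a 4K context and 8 GiB at 16K, per sequence, which is why long-context serving is memory-bound before it is compute-bound.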
