Research 2026-04-23
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
Source: arXiv cs.AI
arXiv:2604.19769v1 Announce Type: cross Abstract: Key-value (KV) caching is critical for efficient inference in large language models (LLMs), yet its memory footprint scales linearly with context length, resulting in a severe scalability bottleneck. Existing approaches largely treat KV states as...
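To make the linear-scaling claim concrete, here is a minimal back-of-the-envelope sketch (not from the paper) estimating KV cache memory for a hypothetical transformer configuration; the layer count, head count, head dimension, and dtype size are illustrative assumptions, not numbers taken from the abstract.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,  # fp16/bf16 assumed
) -> int:
    """Estimate KV cache size: 2 tensors (K and V) per layer,
    each of shape [batch, num_kv_heads, seq_len, head_dim]."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical 7B-class configuration: 32 layers, 32 KV heads, head_dim 128.
for ctx in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(32, 32, 128, ctx) / 2**30
    print(f"context {ctx:>7}: ~{gib:.1f} GiB")
```

Under these assumed dimensions the cache grows from roughly 2 GiB at a 4K context to about 64 GiB at 128K, which is the linear growth with context length the abstract identifies as the scalability bottleneck.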