BeClaude Research · 2026-04-28

DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference

Source: arXiv cs.AI

arXiv:2604.24647v1 | Announce Type: cross

Abstract: Long-context reasoning is a critical capability of large language models (LLMs), enabling applications such as long-document understanding, summarization, and code generation. However, efficient autoregressive inference relies on the key-value (KV) cache...
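The abstract is truncated at the KV cache, but the title states the general technique: prune cached key/value tensors with a budget that varies by layer. Below is a minimal NumPy sketch of that idea, not DepthKV's actual algorithm. The eviction rule (keep the positions that have received the most accumulated attention mass) and the linear depth schedule in `layer_budgets` are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of layer-dependent KV cache pruning. The eviction rule
# and the per-layer budget schedule are assumptions for illustration; this
# is NOT the paper's DepthKV method.
import numpy as np

def prune_kv(keys: np.ndarray, values: np.ndarray,
             attn_mass: np.ndarray, budget: int):
    """Evict cached positions down to `budget`, keeping those that have
    received the most cumulative attention.
    keys/values: (seq, head_dim); attn_mass: (seq,) summed attention
    weights per cached position."""
    seq_len = keys.shape[0]
    if seq_len <= budget:
        return keys, values, attn_mass
    keep = np.sort(np.argsort(attn_mass)[-budget:])  # keep positional order
    return keys[keep], values[keep], attn_mass[keep]

def layer_budgets(num_layers: int, max_keep: int, min_keep: int) -> np.ndarray:
    """Hypothetical schedule: give early layers a larger cache budget and
    shrink it with depth, assuming deeper layers attend to fewer positions
    and so tolerate heavier pruning."""
    return np.linspace(max_keep, min_keep, num_layers).round().astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq, dim, n_layers = 1024, 64, 32
    budgets = layer_budgets(n_layers, max_keep=1024, min_keep=128)
    for layer, budget in enumerate(budgets):
        k = rng.standard_normal((seq, dim))
        v = rng.standard_normal((seq, dim))
        mass = rng.random(seq)  # stand-in for accumulated attention weights
        k, v, mass = prune_kv(k, v, mass, int(budget))
        if layer in (0, n_layers - 1):
            print(f"layer {layer}: cache {k.shape[0]} / {seq} positions")
```

One design note: the kept indices are re-sorted (`np.sort`) before gathering, since position information is already baked into the cached keys and downstream attention expects the surviving entries in their original order.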

arxivpapers