Research | 2026-05-06
SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
Source: arXiv cs.AI
arXiv:2605.02888v2 | Announce Type: cross
Abstract: Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length $\gamma$, which...
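To make the draft-then-verify loop and the role of $\gamma$ concrete, here is a minimal sketch of greedy speculative decoding with a fixed speculation length. It is not the SpecKV method: the excerpt above does not describe how SpecKV adapts $\gamma$. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, the "models" are toy functions over integer tokens, and verification is done token by token here, whereas a real target model would typically score all proposals in one batched forward pass.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: Model, target: Model, gamma: int) -> List[int]:
    """Run one draft-then-verify round and return the tokens accepted in it."""
    # 1. The draft model proposes gamma candidate tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(gamma):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target model verifies the candidates in order (greedy matching).
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. All candidates accepted: the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted


def generate(prefix: List[int], draft: Model, target: Model, gamma: int, max_new: int) -> List[int]:
    """Repeat speculative rounds until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, gamma))
    return out[: len(prefix) + max_new]


def toy_target(ctx: List[int]) -> int:
    # Toy target: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Toy draft: agrees with the target except right after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
```

With greedy verification the output is identical to decoding with the target alone; $\gamma$ only changes how many tokens each round can yield. In this toy setup a round yields at most `gamma + 1` tokens, and fewer when the draft diverges early, which is the trade-off that makes the choice of $\gamma$ matter.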
Tags: arxiv, papers