Research · 2026-05-06

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

arXiv:2605.02888v2 Announce Type: cross Abstract: Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length $\gamma$, which...
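The draft-then-verify loop described in the abstract can be illustrated with a minimal sketch. It is not SpecKV's method: it shows only plain greedy speculative decoding with a fixed speculation length `gamma`, and does not attempt the adaptive, compression-aware selection named in the title. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are toy functions over integer tokens rather than real LLMs.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, gamma: int) -> List[int]:
    """Run one draft-then-verify step and return the newly emitted tokens."""
    # Draft phase: the small model proposes gamma tokens autoregressively.
    proposed: List[int] = []
    for _ in range(gamma):
        proposed.append(draft(prefix + proposed))

    # Verify phase: keep the longest run of proposals the target agrees with.
    # (A real system scores all positions in one batched target pass; this
    # sketch calls the target once per position for clarity.)
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch, then stop
            return accepted
        accepted.append(tok)

    # Every proposal matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prompt: List[int], draft: NextToken, target: NextToken,
             gamma: int, max_new_tokens: int) -> List[int]:
    """Repeat speculative steps until max_new_tokens tokens have been emitted."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft, target, gamma))
    return out[:len(prompt) + max_new_tokens]


def toy_target(prefix: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    """Toy draft model: agrees with the target except right after token 4."""
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10
```

With greedy verification the output matches what the target would produce on its own. The choice of `gamma` changes only how many tokens each step yields: a larger value pays off when the draft is usually right and wastes draft work when it is often wrong.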

Read Original Article on Arxiv CS.AI
