Research · 2026-05-12
Attention Drift: What Autoregressive Speculative Decoding Models Learn
Source: Arxiv CS.AI
arXiv:2605.09992v1 · Announce Type: cross

Abstract: Speculative decoding accelerates LLM inference by drafting future tokens with a small model, but drafter models degrade sharply under template perturbation and long-context inputs. We identify a previously unreported phenomenon we call...
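For context on the technique the abstract names: in speculative decoding, a cheap draft model proposes a short run of tokens autoregressively, and the large target model verifies them in a single pass, keeping the longest agreeing prefix. The sketch below is a minimal greedy-verification toy, not the paper's method; the function name, the token-level models, and the acceptance rule are illustrative assumptions.

```python
# Toy greedy speculative decoding (illustrative sketch, not the paper's
# algorithm). `target` and `draft` stand in for full LLMs: each maps a
# context tuple to its greedy next token.

def greedy_speculative_decode(target, draft, prompt, k=4, max_new=8):
    """Generate up to `max_new` tokens after `prompt`.

    Per round: the draft model proposes `k` tokens cheaply; the target
    model then verifies them, accepting the longest prefix it agrees
    with and substituting its own token at the first disagreement.
    """
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. Draft phase: k candidate tokens from the small model.
        cand, ctx = [], list(out)
        for _ in range(k):
            t = draft(tuple(ctx))
            cand.append(t)
            ctx.append(t)
        # 2. Verify phase: in a real system the target scores all k
        #    positions in one batched forward pass; here we loop.
        accepted, ctx = [], list(out)
        for t in cand:
            tt = target(tuple(ctx))
            if tt == t:
                accepted.append(t)   # target agrees with the drafter
                ctx.append(t)
            else:
                accepted.append(tt)  # target's correction ends the round
                break
        out.extend(accepted)
    return out[:len(prompt) + max_new]
```

With toy models that count modulo 10 (the drafter "drifting" after the token 3), each round accepts several tokens at once when the models agree, and falls back to one corrected target token when they diverge, which is the source of the speedup.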