Research | 2026-05-12
PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
Source: Arxiv CS.AI
arXiv:2605.08632v1 Announce Type: cross Abstract: Speculative decoding accelerates Large Language Model (LLM) inference by using a lightweight draft model to propose candidate tokens that are verified in parallel by the target model. However, existing draft model training objectives are not...
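The draft-then-verify loop that the abstract describes can be illustrated with a minimal sketch. This is the generic greedy form of speculative decoding, not the PARD-2 method itself, whose details are not given in this excerpt. The function names (speculative_step, generate) and the next-token callable interface are assumptions made for illustration only; a real system would verify all drafted positions in one batched forward pass of the target model rather than in a Python loop.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: a model is a function mapping a context to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(
    prefix: List[Token], draft_next: NextToken, target_next: NextToken, k: int
) -> List[Token]:
    """Run one draft-then-verify round and return the tokens to append."""
    # 1. The lightweight draft model proposes k candidate tokens.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft_next(prefix + proposal))

    # 2. The target model checks each candidate. (Assumption: done position by
    #    position here; in practice this is a single parallel forward pass.)
    accepted: List[Token] = []
    for tok in proposal:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every candidate matched, so the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(
    prompt: List[Token],
    draft_next: NextToken,
    target_next: NextToken,
    k: int,
    max_new: int,
) -> List[Token]:
    """Generate max_new tokens by repeating speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prompt) + max_new]
```

With greedy acceptance, the output matches what the target model would produce on its own; the draft model only affects how many tokens are accepted per round, which is why a draft that agrees with the target more often yields a larger speedup.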
arxiv, papers