BeClaude
Research · 2026-05-12

PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding

Source: arXiv cs.AI

arXiv:2605.08632v1 (cross-listed). Abstract: Speculative decoding accelerates Large Language Model (LLM) inference by using a lightweight draft model to propose candidate tokens, which the target model then verifies in parallel. However, existing draft model training objectives are not...
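
The propose-then-verify loop described in the abstract can be illustrated with a minimal sketch of generic greedy speculative decoding. This is not the PARD-2 method, whose details are not given in the excerpt. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are plain functions over integer tokens rather than real LLMs.

```python
from typing import Callable, List

Token = int
# Assumption: a "model" is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft: NextToken,
                     target: NextToken, k: int) -> List[Token]:
    """Run one draft-and-verify round and return the tokens to append."""
    # 1. The lightweight draft model proposes k candidate tokens.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    # 2. The target model checks every position. A real system would score
    #    all positions in one batched forward pass; this toy version loops.
    accepted: List[Token] = []
    for i in range(k):
        expected = target(prefix + proposal[:i])
        if proposal[i] != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(proposal[i])

    # 3. Every candidate matched, so the target contributes one extra token.
    accepted.append(target(prefix + proposal))
    return accepted


def generate(prefix: List[Token], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[Token]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]


def toy_target(prefix: List[Token]) -> Token:
    # Hypothetical target: counts upward modulo 10.
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[Token]) -> Token:
    # Hypothetical draft: agrees with the target except right after token 4.
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([0], toy_draft, toy_target, k=3, max_new=8))
```

Because each rejected candidate is replaced by the target model's own choice, the greedy variant sketched here produces the same sequence as decoding with the target model alone. The speedup comes from accepting several draft tokens per target pass whenever the two models agree.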
