BeClaude
Policy | 2026-04-17

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Source: Arxiv CS.AI

arXiv:2510.09541v3 (announce type: replace-cross)

Abstract: Diffusion large language models (dLLMs) are emerging as an efficient alternative to autoregressive models due to their ability to decode multiple tokens in parallel. However, aligning dLLMs with human preferences or task-specific rewards via...

Tags: arxiv, papers, image-generation