BeClaude
Research, 2026-05-14

Entropy Aware Reward Guidance for Diffusion Language Model Alignment

Source: Arxiv CS.AI

arXiv:2602.05000v2 (announce type: replace-cross)

Abstract: Reward guidance, also known as posterior sampling, is a popular method for test-time adaptation and post-training in continuous diffusion models. In this paper, we study reward guidance for discrete diffusion language models; now, one cannot...

Tags: arxiv, papers, image-generation