Research 2026-05-14
Entropy Aware Reward Guidance for Diffusion Language Model Alignment
Source: arXiv cs.AI
arXiv:2602.05000v2 Abstract: Reward guidance, also known as posterior sampling, is a popular method for test-time adaptation and post-training in continuous diffusion models. In this paper, we study reward guidance for discrete diffusion language models; now, one cannot...
arxiv papers image-generation