Policy 2026-05-14
TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment
Source: arXiv cs.AI
arXiv:2605.10983v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has shown extraordinary potential in aligning diffusion models to downstream tasks, yet most RL-based alignment methods still suffer from significant reward hacking, which degrades generative diversity and quality by inducing visual...
arxiv, papers, image-generation