Research · 2026-04-20

SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models

arXiv:2604.12617v2 (announce type: replace-cross)

Abstract: The post-training pipeline for diffusion models currently has two stages: supervised fine-tuning (SFT) on curated data and reinforcement learning (RL) with reward models. A fundamental gap separates them. SFT optimizes the denoiser only on...

Source: arXiv cs.AI

Tags: arxiv, papers, image-generation