BeClaude
Research · 2026-04-20

SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models

Source: arXiv cs.AI

arXiv:2604.12617v2 (Announce Type: replace-cross)

Abstract: The post-training pipeline for diffusion models currently has two stages: supervised fine-tuning (SFT) on curated data and reinforcement learning (RL) with reward models. A fundamental gap separates them. SFT optimizes the denoiser only on...
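For context on the first stage, SFT of a diffusion model is commonly done with the standard DDPM-style noise-prediction objective. A curated sample x0 is noised to a random timestep t, and the denoiser is trained to recover the injected noise. The sketch below shows that common objective in pure Python. It assumes a linear beta schedule and epsilon-prediction, and the function names (`make_alpha_bars`, `forward_noise`, `sft_denoising_loss`) are hypothetical. It illustrates generic diffusion SFT only and is not SOAR's method, which the truncated abstract does not describe.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of an (assumed) linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def forward_noise(x0, alpha_bar, eps):
    """Sample q(x_t | x0): x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xi + b * ei for xi, ei in zip(x0, eps)]

def sft_denoising_loss(denoiser, curated_batch, alpha_bars, rng):
    """Mean squared error between true and predicted noise, averaged over
    curated samples x0, each noised to a uniformly random timestep."""
    total = 0.0
    for x0 in curated_batch:
        t = rng.randrange(len(alpha_bars))
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
        x_t = forward_noise(x0, alpha_bars[t], eps)
        eps_hat = denoiser(x_t, t)  # hypothetical denoiser(x_t, t) -> noise estimate
        total += sum((e - eh) ** 2 for e, eh in zip(eps, eps_hat)) / len(x0)
    return total / len(curated_batch)
```

A denoiser that always predicts zero noise scores a loss of roughly 1, the variance of the injected Gaussian noise. Training drives the loss below that baseline. Note that every x_t this loss sees comes from the forward-noising process applied to curated data.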

arxiv, papers, image-generation