Policy | 2026-04-20
Reward Weighted Classifier-Free Guidance as Policy Improvement in Autoregressive Models
Source: arXiv cs.AI
arXiv:2604.15577v1 (Announce Type: cross)
Abstract: Consider an autoregressive model that produces outputs x (e.g., answers to questions, molecules), each of which can be summarized by an attribute vector y (e.g., helpfulness vs. harmlessness, or bioavailability vs. lipophilicity). An arbitrary...