Policy 2026-04-20

Reward Weighted Classifier-Free Guidance as Policy Improvement in Autoregressive Models

arXiv:2604.15577v1 (announce type: cross)

Abstract: Consider an auto-regressive model that produces outputs x (e.g., answers to questions, molecules), each of which can be summarized by an attribute vector y (e.g., helpfulness vs. harmlessness, or bio-availability vs. lipophilicity). An arbitrary...

Read Original Article on Arxiv CS.AI
