Research | 2026-04-23

PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning

arXiv:2604.12652v2. Abstract: Reinforcement learning (RL) can improve the prompt-following capability of text-to-image (T2I) models, yet obtaining high-quality reward signals remains challenging: CLIP Score is too coarse-grained, while VLM-based reward models (e.g.,...

Read original article on arXiv cs.AI

Tags: arxiv, papers, prompting, rl, vision