Research 2026-04-30

Entropy Centroids as Intrinsic Rewards for Test-Time Scaling

arXiv:2604.26173v1

Abstract: An effective way to scale up test-time compute of large language models is to sample multiple responses and then select the best one, as in Grok Heavy and Gemini Deep Think. Existing selection methods often rely on external reward models, which...

Source: arXiv cs.AI
