BeClaude
Research 2026-04-30

Entropy Centroids as Intrinsic Rewards for Test-Time Scaling

Source: arXiv cs.AI

arXiv:2604.26173v1 Announce Type: cross

Abstract: An effective way to scale up the test-time compute of large language models is to sample multiple responses and then select the best one, as in Grok Heavy and Gemini Deep Think. Existing selection methods often rely on external reward models, which...
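The sample-then-select pattern mentioned in the abstract can be sketched generically as below. This is a minimal illustration, not the paper's method: the excerpt does not describe how entropy centroids are computed, so the scorer is left as a caller-supplied function. The names `best_of_n` and `toy_score` are hypothetical, and the toy scorer is only a stand-in for a real reward signal.

```python
from typing import Callable, Sequence


def best_of_n(responses: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring response; ties go to the earliest one."""
    if not responses:
        raise ValueError("need at least one sampled response")
    # max() keeps the first maximal element, so ordering breaks ties.
    return max(responses, key=score)


def toy_score(response: str) -> float:
    """Hypothetical stand-in scorer: prefers shorter responses.

    A real system would plug in a reward model or an intrinsic signal here.
    """
    return -float(len(response))


# Pretend these were sampled from a model for the same prompt.
candidates = [
    "The answer is 42.",
    "42",
    "I think it might be 42, possibly.",
]
best = best_of_n(candidates, toy_score)
print(best)
```

Keeping the scorer as a parameter means the selection loop stays the same whether the score comes from an external reward model or from a signal derived from the model's own outputs.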

arxiv papers