Research | 2026-05-12
FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models
Source: arXiv cs.AI
arXiv:2605.10141v1 (Announce Type: new)
Abstract: Recent neural theorem provers use reinforcement learning with verifiable rewards (RLVR), where proof assistants provide binary correctness signals. While verifiable rewards are cheap and scalable without reward hacking issues, they suffer from sparse...
arxiv, papers, benchmark