BeClaude
Research | 2026-05-12

FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models

Source: Arxiv CS.AI

arXiv:2605.10141v1 (Announce Type: new)

Abstract: Recent neural theorem provers use reinforcement learning with verifiable rewards (RLVR), where proof assistants provide binary correctness signals. While verifiable rewards are cheap and scalable without reward hacking issues, they suffer from sparse...
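The sparsity problem with binary signals can be shown in a few lines. The sketch below is an illustration only. It is not the paper's method. A real setup would call a proof assistant such as Lean. Here `toy_verify` is a hypothetical stand-in that accepts any proof ending in "qed". Under a binary reward, a nearly correct proof gets the same 0.0 as an empty one, so a batch where most samples fail carries very little learning signal.

```python
from typing import Callable, Sequence

def binary_reward(proof: str, verify: Callable[[str], bool]) -> float:
    """Verifiable reward: 1.0 if the checker accepts the proof, else 0.0 (no partial credit)."""
    return 1.0 if verify(proof) else 0.0

def reward_density(proofs: Sequence[str], verify: Callable[[str], bool]) -> float:
    """Fraction of sampled proofs that receive a nonzero reward."""
    if not proofs:
        return 0.0
    rewards = [binary_reward(p, verify) for p in proofs]
    return sum(rewards) / len(rewards)

# Hypothetical stand-in for a proof assistant (NOT a real Lean API):
# it accepts only proofs whose text ends in "qed".
def toy_verify(proof: str) -> bool:
    return proof.strip().endswith("qed")

samples = ["intro h; exact h qed", "simp", "sorry", "linarith"]
print([binary_reward(p, toy_verify) for p in samples])  # [1.0, 0.0, 0.0, 0.0]
print(reward_density(samples, toy_verify))  # 0.25
```

Only one of the four samples produces any signal. The checker also cannot tell "simp" (a plausible but incomplete attempt) apart from "sorry". A learned reward model could in principle fill that gap by scoring partial progress.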

Tags: arxiv, papers, benchmark