Research · 2026-05-12

FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models

arXiv:2605.10141v1 (Announce Type: new)

Abstract: Recent neural theorem provers use reinforcement learning with verifiable rewards (RLVR), where proof assistants provide binary correctness signals. While verifiable rewards are cheap and scalable without reward hacking issues, they suffer from sparse...
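The abstract is truncated here, so the following only illustrates the general idea of a binary verifiable reward. It is not the paper's code. This minimal Python sketch assumes the proof checker returns pass/fail, which becomes a 0/1 reward. It also assumes advantages are mean-centered over a group of sampled proofs for the same theorem, a common RLVR choice adopted here as an assumption. `toy_verify`, `binary_reward`, and `group_advantages` are hypothetical names, and the verifier is a stub rather than a real proof-assistant call.

```python
from typing import Callable, List

# Hypothetical verifier type: takes a proof string, returns True if it checks.
# A real RLVR pipeline would invoke a proof assistant here; this is a stub.
Verifier = Callable[[str], bool]


def binary_reward(proof: str, verify: Verifier) -> float:
    """Binary verifiable reward: 1.0 if the proof checks, 0.0 otherwise."""
    return 1.0 if verify(proof) else 0.0


def group_advantages(rewards: List[float]) -> List[float]:
    """Mean-centered advantages over a group of sampled proofs (assumed scheme)."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def toy_verify(proof: str) -> bool:
    """Hypothetical toy checker: pretend only proofs mentioning 'omega' succeed."""
    return "omega" in proof


# A hard theorem: every sampled attempt fails, so every reward is 0.
failed = ["simp", "linarith", "ring", "norm_num"]
failed_rewards = [binary_reward(p, toy_verify) for p in failed]
print(failed_rewards)                    # [0.0, 0.0, 0.0, 0.0]
print(group_advantages(failed_rewards))  # [0.0, 0.0, 0.0, 0.0]

# One success in the group gives a nonzero learning signal.
mixed = ["simp", "omega"]
mixed_rewards = [binary_reward(p, toy_verify) for p in mixed]
print(group_advantages(mixed_rewards))   # [-0.5, 0.5]
```

Under these assumptions, a theorem on which every attempt fails yields all-zero advantages and contributes no gradient. This is one concrete sense in which binary verifiable rewards are sparse.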

Read the original article on arXiv (cs.AI)

Tags: arxiv, papers, benchmark