BeClaude
Research, 2026-05-14

Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism

Source: arXiv cs.AI

arXiv:2605.12524v1 (announce type: cross)

Abstract: We introduce ProofGrid, a benchmark suite for evaluating LLM reasoning through machine-checkable proofs rather than final answers alone. ProofGrid contains 15 tasks spanning proof writing, proof checking, proof masking, and proof gap-filling. Tasks...

Tags: arxiv, papers, reasoning