Research 2026-05-14
Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism
Source: arXiv cs.AI
arXiv:2605.12524v1 Announce Type: cross
Abstract: We introduce ProofGrid, a benchmark suite for evaluating LLM reasoning through machine-checkable proofs rather than final answers alone. ProofGrid contains 15 tasks spanning proof writing, proof checking, proof masking, and proof gap-filling. Tasks...
arxiv, papers, reasoning