Research | 2026-05-12

Beyond Accuracy: Evaluating Strategy Diversity in LLM Mathematical Reasoning

arXiv:2605.09292v1. Abstract: Large language models now achieve high final-answer accuracy on mathematical reasoning benchmarks, but accuracy alone does not capture reasoning flexibility. We introduce a strategy-level evaluation framework instantiated on 80 AMC 10/12 and AIME...

Read original article on arXiv cs.AI

Tags: arxiv, papers, reasoning