BeClaude
Research, 2026-05-12

PolyMATH: A Challenging Multi-modal Mathematical Reasoning Benchmark

Source: Arxiv CS.AI

arXiv:2410.14702v2 (announce type: replace)

Abstract: Multi-modal Large Language Models (MLLMs) exhibit impressive problem-solving abilities in various domains, but their visual comprehension and abstract reasoning skills remain under-evaluated. To this end, we present PolyMATH, a challenging...

Tags: arxiv, papers, reasoning, benchmark