Research | 2026-05-12
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Source: arXiv cs.AI
arXiv:2410.14702v2. Announce Type: replace. Abstract: Multi-modal Large Language Models (MLLMs) exhibit impressive problem-solving abilities in various domains, but their visual comprehension and abstract reasoning skills remain under-evaluated. To this end, we present PolyMATH, a challenging...
Tags: arxiv, papers, reasoning, benchmark