Research 2026-04-22

GeoLaux: A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines

arXiv:2508.06226v2. Abstract: Geometry problem solving (GPS) poses significant challenges for Multimodal Large Language Models (MLLMs) in diagram comprehension, knowledge application, long-step reasoning, and auxiliary line construction. However, current benchmarks lack...

Read the original article on arXiv cs.AI

arxiv, papers, benchmark