BeClaude
Research | 2026-05-07

Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs

Source: arXiv cs.AI

arXiv:2512.09874v2 (announce type: replace-cross)

Abstract: Correctly parsing mathematical formulas from PDFs is critical for training large language models and building scientific knowledge bases from academic literature, yet existing benchmarks either exclude formulas entirely or lack...

arxiv, papers, benchmark