Research | 2026-05-06

EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

arXiv:2509.17677v2

Abstract: Large language models (LLMs) have shown strong performance on mathematical reasoning under well-defined conditions. However, real-world engineering problems involve uncertainty, context, and open-ended settings that extend beyond symbolic...

Read the original article on arXiv cs.AI

arxiv, papers, benchmark