Research 2026-05-06
EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving
Source: arXiv cs.AI
arXiv:2509.17677v2
Abstract: Large language models (LLMs) have shown strong performance on mathematical reasoning under well-defined conditions. However, real-world engineering problems involve uncertainty, context, and open-ended settings that extend beyond symbolic...
arxiv, papers, benchmark