BeClaude
Research 2026-05-01

From Test-taking to Cognitive Scaffolding: A Pedagogical Diagnostic Benchmark for LLMs on English Standardized Tests

Source: arXiv cs.AI

arXiv:2505.17056v2. Abstract: As large language models (LLMs) are increasingly integrated into educational tools, current evaluations on standardized tests predominantly focus on binary outcome accuracy. Instead, an effective AI tutor must exhibit faithful reasoning,...

Tags: arxiv, papers, benchmark