Research | 2026-05-01

From Test-taking to Cognitive Scaffolding: A Pedagogical Diagnostic Benchmark for LLMs on English Standardized Tests

arXiv:2505.17056v2 (Announce Type: replace-cross)

Abstract: As large language models (LLMs) are increasingly integrated into educational tools, current evaluations of LLMs on standardized tests focus predominantly on binary outcome accuracy. However, an effective AI tutor must exhibit faithful reasoning,...

Source: arXiv (cs.AI)
