BeClaude
Research, 2026-04-28

Language Models Might Not Understand You: Evaluating Theory of Mind via Story Prompting

Source: arXiv cs.AI

arXiv:2506.19089v5

Abstract: We introduce StorySim, a programmable framework for synthetically generating stories to evaluate the theory of mind (ToM) and world modeling (WM) capabilities of large language models (LLMs). Unlike prior benchmarks that may suffer from...

arxiv, papers, prompting