Research 2026-04-23
DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories
Source: arXiv cs.AI
arXiv:2604.20443v1 (announce type: cross)
Abstract: Large Language Models (LLMs) have been shown to possess Theory of Mind (ToM) abilities. However, it remains unclear whether this stems from robust reasoning or spurious correlations. We introduce DialToM, a human-verified benchmark built from...
arxiv, papers, benchmark