Research, 2026-04-23

DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

arXiv:2604.20443v1, announce type: cross. Abstract: Large Language Models (LLMs) have been shown to possess Theory of Mind (ToM) abilities. However, it remains unclear whether this stems from robust reasoning or spurious correlations. We introduce DialToM, a human-verified benchmark built from...

Read the original article on arXiv cs.AI

arxiv, papers, benchmark