Research | 2026-04-22
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
Source: arXiv cs.AI
arXiv:2506.06211v2 (Announce Type: replace-cross)
Abstract: Puzzlehunts are a genre of complex, multi-step puzzles that lack well-defined problem definitions. In contrast to conventional reasoning benchmarks, which consist of tasks with clear instructions and constrained environments, puzzlehunts require...
Tags: arxiv, papers, reasoning, benchmark, multimodal