BeClaude
Research, 2026-04-22

PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts

Source: Arxiv CS.AI

arXiv:2506.06211v2 Announce Type: replace-cross

Abstract: Puzzlehunts are a genre of complex, multi-step puzzles lacking well-defined problem definitions. In contrast to conventional reasoning benchmarks consisting of tasks with clear instructions and constrained environments, puzzlehunts require...

Tags: arxiv, papers, reasoning, benchmark, multimodal