Research 2026-05-12
Explicit Reasoning Makes Better Judges: A Systematic Study on Accuracy, Efficiency, and Robustness
Source: arXiv cs.AI
arXiv:2509.13332v2 Announce Type: replace
Abstract: As Large Language Models (LLMs) are increasingly adopted as automated judges in benchmarking and reward modeling, ensuring their reliability, efficiency, and robustness has become critical. In this work, we present a systematic comparison of...
arxiv papers reasoning