BeClaude
Research · 2026-05-06

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong

Source: arXiv cs.AI

arXiv:2501.09775v3 (Announce Type: replace-cross)

Abstract: Multiple Choice Question (MCQ) tests are among the most used methods for evaluating large language models (LLMs). Besides checking the correctness of the selected answer, evaluations often consider the model's confidence through the...

Tags: arxiv, papers, reasoning