Research | 2026-05-06

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong

arXiv:2501.09775v3. Abstract: Multiple Choice Question (MCQ) tests are among the most used methods for evaluating large language models (LLMs). Besides checking the correctness of the selected answer, evaluations often consider the model's confidence through the...

Read Original Article on arXiv cs.AI

arxiv, papers, reasoning