BeClaude
Research 2026-05-11

Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs

Source: arXiv cs.AI

arXiv:2605.07806v1. Announce Type: cross.

Abstract: Large Language Models (LLMs) are increasingly used in settings where reliable self-assessment is critical. Assessing model reliability has evolved from using probabilistic correctness estimates to, more recently, eliciting verbalized confidence....
