Research 2026-05-12
A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering
Source: arXiv cs.AI
arXiv:2605.08432v1
Announce Type: cross
Abstract: Calibration measures whether a model's predicted confidence aligns with its empirical accuracy, and is central to the reliable deployment of large language models (LLMs) in high-stakes domains such as medicine and law. While much recent work focuses...
arxiv, papers