Research · 2026-05-14
Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling
Source: arXiv cs.AI
arXiv:2605.13801v1 Announce Type: cross Abstract: As generative AI models such as large language models (LLMs) become more pervasive, ensuring the safety, robustness, and overall trustworthiness of these systems is paramount. However, AI is currently facing a reproducibility crisis driven by...