Research 2026-04-23
Coverage, Not Averages: Semantic Stratification for Trustworthy Retrieval Evaluation
Source: arXiv cs.AI
arXiv:2604.20763v1. Announce type: cross. Abstract: Retrieval quality is the primary bottleneck for accuracy and robustness in retrieval-augmented generation (RAG). Current evaluation relies on heuristically constructed query sets, which introduce a hidden intrinsic bias. We formalize retrieval...
arxiv papers rag