Research · 2026-05-12
Generating Leakage-Free Benchmarks for Robust RAG Evaluation
Source: arXiv cs.AI
arXiv:2605.08838v1 Abstract: Retrieval-augmented generation (RAG) is widely used to augment large language models (LLMs) with external knowledge. However, many benchmark datasets designed to test RAG performance contain questions that can already be answered from an...
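The leakage problem the abstract describes — benchmark questions a model can already answer from its parametric knowledge, without retrieval — suggests a simple filtering step. This is a minimal sketch of that idea, not the paper's method: it assumes a hypothetical closed-book answering function and correctness check, and keeps only questions the model fails without retrieved context.

```python
def filter_leaked_questions(questions, answer_closed_book, is_correct):
    """Drop questions the model answers correctly with no retrieval.

    questions: list of {"question": str, "answer": str} dicts.
    answer_closed_book: callable taking a question string, returning the
        model's answer without any retrieved context (hypothetical stub here).
    is_correct: callable comparing a prediction to the gold answer.
    """
    kept = []
    for q in questions:
        prediction = answer_closed_book(q["question"])
        if not is_correct(prediction, q["answer"]):
            # The model could not answer closed-book, so this question
            # genuinely exercises retrieval: keep it in the benchmark.
            kept.append(q)
    return kept


# Toy demonstration with a stub "model" that has memorized one fact.
memorized = {"Capital of France?": "Paris"}
answer_stub = lambda q: memorized.get(q, "unknown")
exact_match = lambda pred, gold: pred.strip().lower() == gold.strip().lower()

bench = [
    {"question": "Capital of France?", "answer": "Paris"},        # leaked
    {"question": "Revenue of AcmeCo in Q3?", "answer": "$1.2M"},  # needs retrieval
]
clean = filter_leaked_questions(bench, answer_stub, exact_match)
```

In this toy run only the second question survives, since the stub answers the first one correctly closed-book. A real pipeline would sample multiple completions per question to reduce false negatives from lucky guesses.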
Tags: arxiv, papers, benchmark, rag