BeClaude Research
2026-05-12

Generating Leakage-Free Benchmarks for Robust RAG Evaluation

Source: Arxiv CS.AI

arXiv:2605.08838v1 (announce type: cross)

Abstract: Retrieval-augmented generation (RAG) is widely used to augment large language models (LLMs) with external knowledge. However, many benchmark datasets designed to test RAG performance comprise questions that can already be answered from an...
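The leakage problem the abstract describes can be illustrated with a minimal filtering sketch: probe each benchmark item with the model in closed-book mode (no retrieved context) and drop items it already answers correctly, so the surviving questions genuinely require retrieval. This is an illustrative sketch, not the paper's method; `closed_book_answer_fn` is a hypothetical callable standing in for an LLM query.

```python
def is_leaked(question: str, gold_answer: str, closed_book_answer_fn) -> bool:
    """Flag an item as 'leaked' if the model answers it correctly
    with no retrieved context (i.e., from parametric knowledge alone).
    closed_book_answer_fn is a placeholder for an LLM call."""
    prediction = closed_book_answer_fn(question)
    # Naive exact-match scoring for illustration; real evaluations
    # typically use normalized EM or F1 over answer spans.
    return prediction.strip().lower() == gold_answer.strip().lower()


def filter_leakage(items: list[dict], closed_book_answer_fn) -> list[dict]:
    """Keep only items the closed-book model gets wrong, so the
    remaining benchmark actually exercises the retrieval component."""
    return [
        item
        for item in items
        if not is_leaked(item["question"], item["answer"], closed_book_answer_fn)
    ]
```

Under this sketch, a benchmark's "leakage rate" is simply the fraction of items removed by the filter.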

Tags: arxiv, papers, benchmark, rag