BeClaude Research
2026-05-12

Generating Leakage-Free Benchmarks for Robust RAG Evaluation

Source: Arxiv CS.AI

arXiv:2605.08838v1 (announce type: cross)

Abstract: Retrieval-augmented generation (RAG) is widely used to augment large language models (LLMs) with external knowledge. However, many benchmark datasets designed to test RAG performance comprise questions that can already be answered from an...
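The leakage problem the abstract describes can be illustrated with a minimal filtering sketch: probe each benchmark item with the model in closed-book mode (no retrieved context) and drop items it already answers correctly, so the surviving questions genuinely require retrieval. This is an illustrative sketch, not the paper's method; `closed_book_answer_fn` is a hypothetical callable standing in for an LLM query.

```python
def is_leaked(question: str, gold_answer: str, closed_book_answer_fn) -> bool:
    """Flag an item as 'leaked' if the model answers it correctly
    with no retrieved context (i.e., from parametric knowledge alone).
    closed_book_answer_fn is a placeholder for an LLM call."""
    prediction = closed_book_answer_fn(question)
    # Naive exact-match scoring for illustration; real evaluations
    # typically use normalized EM or F1 over answer spans.
    return prediction.strip().lower() == gold_answer.strip().lower()


def filter_leakage(items: list[dict], closed_book_answer_fn) -> list[dict]:
    """Keep only items the closed-book model gets wrong, so the
    remaining benchmark actually exercises the retrieval component."""
    return [
        item
        for item in items
        if not is_leaked(item["question"], item["answer"], closed_book_answer_fn)
    ]
```

Under this sketch, a benchmark's "leakage rate" is simply the fraction of items removed by the filter.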

Tags: arxiv, papers, benchmark, rag