Research · 2026-04-28
Test of Time: Rethinking Temporal Signal of Benchmark Contamination
Source: arXiv cs.AI
arXiv:2509.00072v3 (Announce Type: replace)

Abstract: Post-cutoff performance decay has been widely interpreted as a temporal signal for benchmark contamination. We critically examine this belief and demonstrate that this temporal signal is highly sensitive to how benchmark questions are constructed. ...
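One simple way to quantify the "post-cutoff performance decay" named in the abstract is to split benchmark items at a model's training cutoff date and compare accuracy on each side. The sketch below does this. It assumes a hypothetical `Item` record holding a creation date and a correctness flag. The names `Item`, `accuracy`, and `post_cutoff_decay` are illustrative and not taken from the paper, and the toy data is invented only to exercise the code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    """One benchmark question (hypothetical schema, for illustration only)."""
    created: date   # date the question was written or first published
    correct: bool   # whether the model answered it correctly


def accuracy(items: list[Item]) -> float:
    """Fraction of items answered correctly; NaN for an empty list."""
    if not items:
        return float("nan")
    return sum(item.correct for item in items) / len(items)


def post_cutoff_decay(items: list[Item], cutoff: date) -> float:
    """Pre-cutoff accuracy minus post-cutoff accuracy.

    A large positive gap is the pattern commonly read as a contamination
    signal: the model scores higher on questions it may have seen in training.
    """
    pre = [item for item in items if item.created <= cutoff]
    post = [item for item in items if item.created > cutoff]
    return accuracy(pre) - accuracy(post)


# Invented toy data: 3 of 4 pre-cutoff items correct, 1 of 4 post-cutoff.
toy_items = [
    Item(date(2023, 1, 10), True),
    Item(date(2023, 2, 14), True),
    Item(date(2023, 3, 3), False),
    Item(date(2023, 4, 20), True),
    Item(date(2023, 7, 1), False),
    Item(date(2023, 8, 9), True),
    Item(date(2023, 9, 30), False),
    Item(date(2023, 10, 12), False),
]

print(post_cutoff_decay(toy_items, cutoff=date(2023, 6, 1)))  # 0.5
```

The abstract argues that this kind of temporal signal is highly sensitive to how benchmark questions are constructed. A positive gap from a comparison like this should therefore not, on its own, be taken as proof of contamination.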
arxiv · papers · benchmark