BeClaude
Research | 2026-04-28

Test of Time: Rethinking Temporal Signal of Benchmark Contamination

Source: arXiv cs.AI

arXiv:2509.00072v3 (announce type: replace)

Abstract: Post-cutoff performance decay has been widely interpreted as a temporal signal for benchmark contamination. We critically examine this belief and demonstrate that this temporal signal is highly sensitive to how benchmark questions are constructed....

Tags: arxiv, papers, benchmark