BeClaude
Research | 2026-04-28

CorpusQA: A 10 Million Token Benchmark for Corpus-Level Analysis and Reasoning

Source: arXiv cs.AI

arXiv:2601.14952v2

Abstract: While large language models now handle million-token contexts, their capacity for reasoning across entire document repositories remains largely untested. Existing benchmarks are inadequate, as they are mostly limited to single long texts or...

Tags: arxiv, papers, reasoning, benchmark