Research2026-04-28

CorpusQA: A 10 Million Token Benchmark for Corpus-Level Analysis and Reasoning

arXiv:2601.14952v2 Announce Type: replace-cross Abstract: While large language models now handle million-token contexts, their capacity for reasoning across entire document repositories remains largely untested. Existing benchmarks are inadequate, as they are mostly limited to single long texts or...

Read Original Article on Arxiv CS.AI

arxivpapersreasoningbenchmark