Research | 2026-05-06
From Laboratory to Real-World Applications: Benchmarking Agentic Code Reasoning at the Repository Level
Source: arXiv cs.AI
arXiv:2601.03731v3 | Announce Type: replace-cross
Abstract: As large language models (LLMs) evolve into autonomous agents, evaluating repository-level reasoning, the ability to maintain logical consistency across massive, real-world, interdependent file systems, has become critical. Current...
arxiv, papers, reasoning, agents, benchmark