Research · 2026-05-11
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
Source: arXiv cs.AI
arXiv:2603.24755v2 (Announce Type: replace-cross)
Abstract: Software development is iterative, yet agentic coding benchmarks hide design issues through their single-shot setup. Recent iterative benchmarks attempt to remedy this but heavily constrain an agent's design decision space, making it...
Tags: arxiv, papers, agents, benchmark