BeClaude
Research 2026-06-24

Critique of Agent Model

Source: Arxiv CS.AI

arXiv:2606.23991v1. Announce Type: new. Abstract: What is an agent? What constitutes agency? With the rise of Large Language Model (LLM) systems marketed as "coding agents", "AI co-scientists", and other "agentic" tools that promise to drive up productivity, and at the same time, "existential"...

What Happened

A new preprint on arXiv (2606.23991v1) takes a critical look at the foundational concept of "agency" as applied to LLM-based systems. The paper questions whether the current crop of tools marketed as "coding agents," "AI co-scientists," and other agentic products genuinely possess agency, or whether the term is being stretched beyond its useful meaning. The authors appear to engage with both the technical definition of agency (typically goal-directed behavior, autonomy, and environmental interaction) and the practical implications of labeling LLM outputs as agentic when they are not.

This is not a critique of LLM capabilities per se, but of the conceptual framework used to describe them. The paper likely argues that many so-called agents are better understood as sophisticated next-token predictors with tool-use capabilities, not entities that form independent goals or exercise genuine autonomy.

Why It Matters

The term "agent" carries significant weight in AI research and product marketing. When a tool is called an "agent," users and developers implicitly assume certain properties: the ability to pursue goals persistently, recover from errors, and make context-sensitive decisions without human intervention. If the research community accepts that current LLM systems do not meet these criteria, then the entire product category of "AI agents" is built on a conceptual error.

This matters for three reasons:

  • Expectation management: Companies are selling "agentic" tools at premium prices, promising autonomous task completion. If the underlying systems lack true agency, users will face reliability issues, unexpected failures, and the need for constant oversight, all of which undermine the promised productivity gains.
  • Safety and alignment: True agency implies the capacity to form and pursue goals independently. If we incorrectly attribute agency to LLMs, we may misallocate safety resources, focusing on "agent alignment" when the real risks are simpler (e.g., hallucination, prompt injection, or tool misuse).
  • Research direction: Funding and research effort follow terminology. If "agency" is misapplied, labs may chase the wrong technical challenges, such as improving "goal-directed behavior" when the bottleneck is actually in reasoning, memory, or grounding.

Implications for AI Practitioners

For engineers building LLM-based systems, this critique is a practical warning. If you are designing a "coding agent" or "research assistant," ask whether your system actually exhibits agency or merely chains together tool calls based on a prompt. The difference matters for architecture: a true agent might need persistent state, reward signals, and error recovery loops; a prompt-chaining system needs robust fallbacks, human-in-the-loop checks, and clear failure modes.
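The prompt-chaining side of that distinction can be sketched in a few lines. The following is a minimal illustration, not anything taken from the paper: `run_chain`, `Outcome`, `StepResult`, and the `approve` callback are hypothetical names, and each "step" is a plain Python callable standing in for a model or tool call. It assumes that robust fallbacks mean bounded retries, that a human-in-the-loop check is a reviewer callback, and that clear failure modes mean an explicit outcome value rather than a silent best guess.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Outcome(Enum):
    SUCCESS = "success"          # every step ran and was approved
    NEEDS_HUMAN = "needs_human"  # a reviewer rejected a step's output
    FAILED = "failed"            # a step kept raising after all retries


@dataclass
class StepResult:
    outcome: Outcome
    value: Optional[str] = None
    log: List[str] = field(default_factory=list)


def run_chain(
    task: str,
    steps: List[Callable[[str], str]],
    approve: Callable[[str, str], bool],
    max_retries: int = 2,
) -> StepResult:
    """Run steps in order, feeding each approved output into the next step.

    The chain never sets its own goals: it follows the fixed list of steps,
    retries a bounded number of times, and stops with an explicit outcome.
    """
    log: List[str] = []
    current = task
    for index, step in enumerate(steps):
        for attempt in range(max_retries + 1):
            try:
                candidate = step(current)
                break  # step succeeded, stop retrying
            except Exception as exc:
                log.append(f"step {index} attempt {attempt} failed: {exc}")
        else:
            # Retries exhausted without a successful call.
            return StepResult(Outcome.FAILED, None, log)

        # Human-in-the-loop check: the reviewer sees input and output.
        if not approve(current, candidate):
            log.append(f"step {index} output rejected by reviewer")
            return StepResult(Outcome.NEEDS_HUMAN, candidate, log)
        current = candidate
    return StepResult(Outcome.SUCCESS, current, log)
```

In this sketch the caller always learns which of the three outcomes occurred, which is the kind of explicit failure mode a prompt-chaining system needs when it cannot be trusted to recover on its own.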

Practitioners should also reconsider how they evaluate these systems. If agency is a spectrum rather than a binary property, then benchmarks should measure not just task completion but also autonomy, persistence, and adaptability. Current leaderboards that report pass rates on static tasks tell us little about agentic behavior.
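One way to make that concrete is to report several scores per evaluation run instead of a single pass rate. The sketch below is an assumption-heavy illustration rather than a metric proposed by the paper: the `Episode` fields and the four definitions are hypothetical operationalizations. Autonomy is taken as the share of steps completed without human intervention, persistence as the share of errors the system recovered from, and adaptability as the pass rate on perturbed task variants.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """One evaluation run of a system on one task (hypothetical schema)."""
    completed: bool           # did the run finish the task?
    steps: int                # total steps taken
    human_interventions: int  # steps where a human had to step in
    errors: int               # errors encountered during the run
    recovered_errors: int     # errors the system recovered from unaided
    perturbed: bool           # was this a modified variant of a known task?


def _rate(numerator: float, denominator: float) -> float:
    """Safe ratio; returns NaN when there is nothing to measure."""
    return numerator / denominator if denominator else float("nan")


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Aggregate pass rate alongside three illustrative agency proxies."""
    total_steps = sum(e.steps for e in episodes)
    total_errors = sum(e.errors for e in episodes)
    interventions = sum(e.human_interventions for e in episodes)
    perturbed = [e for e in episodes if e.perturbed]
    return {
        "pass_rate": _rate(sum(e.completed for e in episodes), len(episodes)),
        "autonomy": _rate(total_steps - interventions, total_steps),
        "persistence": _rate(sum(e.recovered_errors for e in episodes), total_errors),
        "adaptability": _rate(sum(e.completed for e in perturbed), len(perturbed)),
    }
```

Two systems with the same pass rate can differ sharply on the other three numbers, for example when one of them only succeeds with frequent human intervention. That difference is what a static leaderboard hides.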

Finally, this paper serves as a reminder that terminology in AI is often borrowed from philosophy and cognitive science. Using terms like "agency" without rigorous definition risks confusing customers, regulators, and even the engineers building the systems.

Key Takeaways

  • The paper challenges the widespread use of "agent" to describe LLM-based tools and appears to argue that true agency requires properties current systems lack.
  • Mislabeling tools as agents creates inflated expectations, misallocated safety efforts, and distorted research priorities.
  • Practitioners should distinguish between prompt-chaining systems and genuine agents when designing architectures and setting user expectations.
  • The AI field needs clearer, more rigorous definitions of agency to avoid conceptual confusion as agentic tools proliferate.