Research · 2026-07-02

Skills Are Not Islands: Measuring Dependency and Risk in Agent Skill Supply Chains

Originally published by Arxiv CS.AI

arXiv:2607.01136v1. Abstract: Agent skills package reusable operational knowledge for Large Language Model (LLM) agents, yet as they grow in scope, they become dependency-bearing artifacts whose identities, versions, and provenance remain implicit. This opacity already causes...

The quiet chaos of software dependency management (the infamous "left-pad" incident, the Log4j vulnerability) has long been a painful lesson for developers. The AI industry now faces the same problem, but with higher stakes and less visibility. A new arXiv paper (2607.01136) diagnoses it explicitly: as LLM agent systems grow more complex, the "skills" plugged into them are becoming opaque, dependency-laden artifacts with no standardized way to track their lineage, version, or risk profile.

What Happened

The research identifies a critical blind spot in the current agent ecosystem. Agent skills—modular blocks of code, prompts, and tool definitions that give LLMs specific capabilities—are proliferating rapidly. However, unlike traditional software packages (e.g., npm or PyPI libraries), these skills lack formal mechanisms for declaring their dependencies, versioning their components, or documenting their provenance. The paper argues that this opacity creates a "supply chain" problem: a skill that appears to perform a simple web search might secretly depend on a specific, unversioned API wrapper, a particular model behavior from a now-deprecated LLM, or a data pipeline that introduces subtle biases. When that underlying dependency shifts or breaks, the agent’s behavior degrades silently, and debugging becomes a forensic nightmare.
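To make the "supply chain" framing concrete, here is a minimal sketch of how a skill's implicit dependencies could be surfaced by walking its dependency graph. It is an illustration under stated assumptions, not the paper's method: the `Dependency` and `Skill` classes, the skill names, and the `example-llm` model are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Dependency:
    """One thing a skill relies on; version=None means it is not pinned."""
    name: str
    kind: str                      # hypothetical kinds: "skill", "api", "model", "data"
    version: Optional[str] = None


@dataclass
class Skill:
    name: str
    depends_on: list = field(default_factory=list)


def transitive_dependencies(root: str, skills: dict) -> list:
    """Collect everything reachable from `root`, following skill-to-skill edges."""
    seen = {root}
    found = {}
    stack = [root]
    while stack:
        current = skills[stack.pop()]
        for dep in current.depends_on:
            found.setdefault((dep.kind, dep.name), dep)
            # Only other skills have dependencies of their own to expand.
            if dep.kind == "skill" and dep.name in skills and dep.name not in seen:
                seen.add(dep.name)
                stack.append(dep.name)
    return sorted(found.values(), key=lambda d: (d.kind, d.name))


def unpinned(deps: list) -> list:
    """Names of dependencies that declare no version at all."""
    return [f"{d.kind}:{d.name}" for d in deps if d.version is None]


# Hypothetical registry: a "simple" web search skill and the skill it builds on.
SKILLS = {
    "web_search": Skill("web_search", [
        Dependency("http_fetch", "skill", "1.2.0"),
        Dependency("example-llm", "model"),        # implicit: no version recorded
    ]),
    "http_fetch": Skill("http_fetch", [
        Dependency("search-api-wrapper", "api"),   # implicit: no version recorded
    ]),
}

deps = transitive_dependencies("web_search", SKILLS)
print(unpinned(deps))  # ['api:search-api-wrapper', 'model:example-llm']
```

The point of the sketch is that the unversioned API wrapper never appears in `web_search` itself; it only shows up once the graph is expanded one level down.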

Why It Matters

This is not a theoretical future problem; it is happening now. AI practitioners are stitching together agents from open-source repositories, commercial plugins, and custom scripts, often without a manifest file or a lock file. The consequences are threefold:

Reproducibility Collapse: An agent that works perfectly today may fail tomorrow because a skill’s implicit dependency—like a specific version of a vector database client or a particular prompt template—has changed. Without explicit dependency tracking, teams cannot reliably recreate agent behavior.
Security Surface Expansion: Malicious actors can inject compromised skills into shared repositories. Because skill provenance is implicit, a poisoned skill can propagate through an organization’s agent fleet without triggering standard security scans designed for traditional software packages.
Behavioral Drift: Skills often encode assumptions about LLM behavior (e.g., "this model handles chain-of-thought well"). When the underlying model updates, the skill’s effectiveness can change, leading to unpredictable agent outputs that are hard to attribute to the root cause.
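One way to make these failure modes attributable is to record, for every agent run, which components a skill actually used, and then diff two records when behavior changes. The sketch below assumes a hypothetical record layout (`run_record`) and made-up component names and versions; it shows the idea, not an established format.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Short content hash, so an edited prompt template shows up as a changed value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def run_record(run_id: str, skill: str, skill_version: str, components: dict) -> dict:
    """Snapshot of what one agent run depended on (hypothetical layout)."""
    return {
        "run_id": run_id,
        "skill": skill,
        "skill_version": skill_version,
        "components": dict(components),
    }


def diff_components(before: dict, after: dict) -> dict:
    """Map each component that differs to its (old, new) values."""
    old, new = before["components"], after["components"]
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


PROMPT = "Think step by step.\n{question}"

# Two runs of the same skill version; only the underlying model changed.
monday = run_record("run-001", "summarize", "0.3.1", {
    "model": "example-llm-2026-05",
    "vector_db_client": "4.1.0",
    "prompt_template": fingerprint(PROMPT),
})
tuesday = run_record("run-002", "summarize", "0.3.1", {
    "model": "example-llm-2026-06",
    "vector_db_client": "4.1.0",
    "prompt_template": fingerprint(PROMPT),
})

changed = diff_components(monday, tuesday)
print(changed)  # {'model': ('example-llm-2026-05', 'example-llm-2026-06')}
```

Here the skill version is identical across both runs, so a version check on the skill alone would report no change; the diff points at the model update instead.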

Implications for AI Practitioners

For teams building production agent systems, this paper is a warning to adopt software engineering rigor before the cracks become chasms. The immediate practical steps include:

Adopt a Skill Manifest: Treat every skill as a package with a formal manifest (similar to package.json or requirements.txt) that declares its dependencies, version constraints, and provenance metadata.
Implement Lock Files: Pin the exact versions of all transitive dependencies—not just the skill itself, but the models, APIs, and data sources it relies on.
Build Audit Trails: Log which skills were used in which agent runs, with their exact versions, to enable post-hoc debugging and rollback.
Establish Trust Registries: Before integrating a skill from an external source, verify its provenance through cryptographic signatures or curated registries, analogous to how container images are vetted.
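The manifest and lock file steps above can be sketched in a few lines. Everything here is an assumption made for illustration: the manifest fields, the `build_lock` and `verify` helpers, and the example names and URL do not come from the paper or from any existing skill-packaging tool. The sketch also does not check that a resolved version satisfies its constraint; a real resolver would.

```python
import hashlib

# Hypothetical manifest, loosely modeled on package.json / requirements.txt.
MANIFEST = {
    "name": "web_search",
    "version": "1.4.0",
    "provenance": {"source": "https://example.com/skills/web_search"},
    "dependencies": {
        "search-api-wrapper": ">=2.0,<3.0",   # version constraint
        "example-llm": "2026-06",             # model pinned by release label
    },
}


def content_hash(payload: bytes) -> str:
    """Integrity string for the exact bytes that were resolved."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def build_lock(manifest: dict, resolved: dict, artifacts: dict) -> dict:
    """Pin every declared dependency to one exact version plus a content hash."""
    missing = set(manifest["dependencies"]) - set(resolved)
    if missing:
        raise ValueError(f"unresolved dependencies: {sorted(missing)}")
    return {
        "skill": f'{manifest["name"]}=={manifest["version"]}',
        "packages": {
            name: {
                "version": resolved[name],
                "integrity": content_hash(artifacts[name]),
            }
            for name in sorted(manifest["dependencies"])
        },
    }


def verify(lock: dict, artifacts: dict) -> list:
    """Return the names whose current bytes no longer match the lock file."""
    return [
        name
        for name, entry in lock["packages"].items()
        if content_hash(artifacts.get(name, b"")) != entry["integrity"]
    ]


resolved = {"search-api-wrapper": "2.3.1", "example-llm": "2026-06"}
artifacts = {
    "search-api-wrapper": b"def search(q): ...",
    "example-llm": b"model-card: example-llm 2026-06",
}
lock = build_lock(MANIFEST, resolved, artifacts)

print(verify(lock, artifacts))   # []

# Simulate a dependency whose contents changed after the lock was written.
tampered = {**artifacts, "search-api-wrapper": b"def search(q): exfiltrate(q)"}
print(verify(lock, tampered))    # ['search-api-wrapper']
```

A content hash only shows that bytes changed after locking; it says nothing about who published them. Establishing that is the job of the signature and registry step, which this sketch leaves out.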

Key Takeaways

Agent skills are becoming "dependency-bearing artifacts" without formal versioning or provenance tracking, creating a reproducibility and security crisis.
The lack of explicit dependency manifests means agent behavior can silently degrade when underlying components change, making failures hard to trace to a root cause.
AI practitioners should adopt software supply chain best practices for agent skills now (manifests, lock files, audit trails, and provenance verification) to prevent systemic failures.
The industry needs standardized tooling for skill packaging and verification, similar to what npm, pip, and Docker registries provide for traditional software.

