Research · 2026-07-02

ASPIRE: Agentic Skills Discovery for Robotics

Originally published by Arxiv CS.AI

arXiv:2607.00272v1 (announce type: cross). Abstract: Traditional robot programming is challenging: it requires orchestrating multimodal perception, managing physical contact dynamics, and handling diverse configurations and execution failures. We introduce ASPIRE (Agentic Skill Programming through...

What Happened

Researchers have released a preprint introducing ASPIRE (Agentic Skill Programming through Interactive Robot Execution), a framework that reframes robot programming as an agentic skill discovery process. Rather than requiring engineers to manually code every perception-action loop, ASPIRE enables robots to autonomously discover and compose reusable skills through interactive execution. The system leverages large language models and vision-language models to interpret natural language task descriptions, then generates executable robot programs that handle multimodal perception, contact dynamics, and failure recovery without human intervention at runtime.

The core innovation lies in treating skill acquisition as an emergent property of agentic interaction: the robot explores its environment, attempts tasks, learns from failures, and builds a library of transferable skills that can be combined for novel tasks. This moves beyond traditional approaches that either hardcode behaviors or require extensive demonstration data.
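
To make the idea of a reusable, composable skill library concrete, here is a minimal sketch. It is not ASPIRE's implementation: the `SkillLibrary` class, the dictionary-based state, and the `grasp` and `lift` skills are all hypothetical names chosen for illustration, and the preprint's actual skill representation may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a "skill" is modeled as a function from a state dict to a
# new state dict. This is an illustrative simplification only.
State = Dict[str, object]
Skill = Callable[[State], State]


@dataclass
class SkillLibrary:
    skills: Dict[str, Skill] = field(default_factory=dict)

    def add(self, name: str, skill: Skill) -> None:
        """Store a skill so that later tasks can reuse it."""
        self.skills[name] = skill

    def compose(self, names: List[str]) -> Skill:
        """Chain stored skills, in order, into a single new skill."""
        steps = [self.skills[n] for n in names]  # KeyError if a name is unknown

        def composed(state: State) -> State:
            for step in steps:
                state = step(state)
            return state

        return composed


library = SkillLibrary()
library.add("grasp", lambda s: {**s, "holding": True})
# Lifting only changes the height if the object is already held.
library.add("lift", lambda s: {**s, "height": 0.2} if s.get("holding") else s)

pick_up = library.compose(["grasp", "lift"])
library.add("pick_up", pick_up)  # the composite is itself stored for reuse
result = pick_up({"holding": False, "height": 0.0})
```

Storing the composite back into the library is the point of the sketch: a skill built from smaller skills becomes a building block for the next task.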

Why It Matters

This work addresses a fundamental bottleneck in robotics: the labor-intensive nature of programming robots for unstructured environments. Traditional robot programming demands expertise across perception, control, and planning, with each new task requiring significant re-engineering. ASPIRE’s agentic approach could dramatically reduce the time and cost of deploying robots in warehouses, homes, or factories.

The shift from "programmed skills" to "discovered skills" has deeper implications. If robots can autonomously build skill libraries through interaction, the path to general-purpose robotics becomes more plausible. Instead of requiring separate models for grasping, pushing, or assembling, a single agentic system could learn these as needed. This loosely mirrors how foundation models in NLP emerged: by scaling learning from raw data rather than relying on manual annotation.

However, the preprint is preliminary. Key questions remain about sample efficiency (how many failures before skill discovery?), safety during exploration, and whether discovered skills generalize across different robot morphologies and environments. The approach also inherits the brittleness of underlying LLMs and VLMs, which can hallucinate or misinterpret physical constraints.

Implications for AI Practitioners

For robotics engineers, ASPIRE suggests a future where the primary task shifts from writing control code to designing reward functions and safety constraints for agentic exploration. Practitioners should monitor how skill libraries are structured and whether they can be shared across robot platforms.

For AI researchers, this work highlights the convergence of embodied AI and foundation models. The key challenge is no longer perception alone but grounding language and reasoning in physical interaction. Practitioners building agentic systems should note that ASPIRE’s failure recovery loop is a critical design pattern: agents that can detect and recover from errors autonomously are far more robust than those requiring human intervention.
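
The failure recovery pattern described above can be sketched as a bounded detect-repair-retry loop. This is a generic illustration, not the loop from the preprint: `run_with_recovery`, `grasp`, `increase_force`, and the force threshold are hypothetical, and a real system would detect failures from perception and propose repairs with an LLM or VLM rather than a fixed rule.

```python
from typing import Callable, List, Optional, Tuple

# Assumption: an attempt returns (success, error_message); both type
# aliases are hypothetical and exist only for this example.
Attempt = Callable[[dict], Tuple[bool, Optional[str]]]
Repair = Callable[[dict, str], dict]


def run_with_recovery(attempt: Attempt, repair: Repair, params: dict,
                      max_attempts: int = 3) -> Tuple[bool, List[str]]:
    """Try a skill; on failure, let `repair` adjust params before retrying."""
    errors: List[str] = []
    for _ in range(max_attempts):
        ok, error = attempt(params)
        if ok:
            return True, errors
        errors.append(error or "unknown failure")
        params = repair(params, errors[-1])
    return False, errors


# Toy skill: the grasp succeeds only once the gripper force is high enough.
def grasp(params: dict) -> Tuple[bool, Optional[str]]:
    if params["force"] >= 5.0:
        return True, None
    return False, "object slipped"


# Toy repair rule: respond to any failure by raising the force.
def increase_force(params: dict, error: str) -> dict:
    return {**params, "force": params["force"] + 2.0}


succeeded, log = run_with_recovery(grasp, increase_force, {"force": 2.0})
```

Capping the number of attempts matters for physical systems: an unbounded retry loop on a real robot can repeat a damaging action indefinitely.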

For ML engineers, the implication is that skill discovery may become a new benchmark for evaluating agentic systems. Measuring how efficiently an agent builds a reusable skill library from scratch could complement existing benchmarks focused on task completion accuracy.
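
As a rough illustration of what such a measurement might look like, the sketch below computes two candidate metrics from per-task logs. The `Episode` record, both metric names, and the sample numbers are assumptions made up for this example; the preprint does not define them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    attempts: int        # executions tried while solving this task
    new_skills: int      # skills added to the library afterwards
    reused_skills: int   # existing library skills that were called


def discovery_efficiency(episodes: List[Episode]) -> Dict[str, float]:
    """Summarize how cheaply skills are discovered and how often they are reused."""
    total_attempts = sum(e.attempts for e in episodes)
    total_new = sum(e.new_skills for e in episodes)
    total_reused = sum(e.reused_skills for e in episodes)
    used = total_new + total_reused
    return {
        # Lower is better: fewer executions spent per discovered skill.
        "attempts_per_new_skill": (
            total_attempts / total_new if total_new else float("inf")
        ),
        # Higher is better: share of skill uses served by the existing library.
        "reuse_rate": total_reused / used if used else 0.0,
    }


# Illustrative numbers only, not results from the preprint.
metrics = discovery_efficiency(
    [Episode(10, 2, 0), Episode(6, 1, 3), Episode(4, 1, 4)]
)
```

A rising reuse rate across episodes would suggest the library is transferring to new tasks, which task completion accuracy alone does not show.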

Key Takeaways

ASPIRE enables robots to autonomously discover and compose skills through interactive execution, reducing manual programming effort for new tasks.
The framework treats skill acquisition as an emergent property of agentic interaction, moving beyond hardcoded behaviors or demonstration-heavy approaches.
Key risks include sample inefficiency, safety during exploration, and reliance on brittle foundation models for physical reasoning.
For AI practitioners, the failure recovery loop and skill library design are the most transferable architectural patterns from this work.
