Research 2026-07-03

ACID: Action Consistency via Inverse Dynamics for Planning with World Models

Originally published by Arxiv CS.AI

arXiv:2607.02403v1 (announce type: cross). Abstract: Decision-time planning with action-conditioned world models has become a popular paradigm for embodied control. However, the standard planning cost judges a candidate solely by how close its predicted terminal state lies to the goal, leaving the...

What Happened

A new preprint, "ACID: Action Consistency via Inverse Dynamics for Planning with World Models", proposes a change to how AI agents plan actions with learned, action-conditioned world models. It targets a blind spot in standard decision-time planning: the usual planning cost scores a candidate action sequence solely by how close its predicted terminal state lies to the goal, so nothing checks the intermediate transitions that lead there.

The ACID framework adds an "action consistency" constraint. It uses an inverse dynamics model, a separate neural network trained to predict which action was taken given two consecutive states, to score candidate action sequences. If the inverse model cannot plausibly recover the planned actions from the state transitions they are predicted to produce, the plan is penalized, even if it ends at the correct goal state. This prevents the planner from exploiting unrealistic or physically impossible shortcuts that the world model might hallucinate.
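As a rough illustration of the scoring described above, the following Python sketch adds an inverse-dynamics penalty to a terminal-state cost. Everything in it is an assumption made for illustration, not the paper's formulation: the toy `forward_model` and `inverse_model`, the squared-error form of the penalty, and the `weight` parameter are all hypothetical.

```python
import numpy as np

def forward_model(state, action):
    # Hypothetical stand-in for a learned world model: it believes any
    # action, however large, moves a 2-D point by exactly that amount.
    return state + action

def inverse_model(state, next_state):
    # Hypothetical stand-in for an inverse dynamics model fit to real data,
    # in which a single step never moves more than 1.0 along each axis.
    return np.clip(next_state - state, -1.0, 1.0)

def rollout(state, actions):
    # Predict the state sequence the world model expects for these actions.
    states = [state]
    for action in actions:
        states.append(forward_model(states[-1], action))
    return np.stack(states)

def terminal_cost(states, goal):
    # Standard cost: squared distance between the predicted final state and the goal.
    return float(np.sum((states[-1] - goal) ** 2))

def consistency_penalty(states, actions):
    # Assumed penalty: squared gap between each planned action and the action
    # the inverse model infers from the corresponding predicted transition.
    inferred = np.stack(
        [inverse_model(s, s_next) for s, s_next in zip(states[:-1], states[1:])]
    )
    return float(np.sum((inferred - actions) ** 2))

def plan_cost(state, actions, goal, weight=1.0):
    states = rollout(state, actions)
    return terminal_cost(states, goal) + weight * consistency_penalty(states, actions)

start = np.zeros(2)
goal = np.array([3.0, 0.0])
shortcut = np.array([[3.0, 0.0], [0.0, 0.0], [0.0, 0.0]])  # one oversized "teleport" step
feasible = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])  # three in-range steps

# Terminal cost alone cannot tell the two plans apart.
print(plan_cost(start, shortcut, goal, weight=0.0), plan_cost(start, feasible, goal, weight=0.0))  # 0.0 0.0
# With the consistency term, only the shortcut is penalized.
print(plan_cost(start, shortcut, goal), plan_cost(start, feasible, goal))  # 4.0 0.0
```

Both plans reach the goal under the toy world model, so only the added term separates them. How the actual method weights or normalizes such a term is not stated in this summary.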

Why It Matters

This work tackles a subtle but critical failure mode in model-based reinforcement learning and planning. World models are imperfect approximations; they can generate "shortcut" trajectories that look good in latent space but correspond to impossible real-world actions. For example, a robot arm planner might find a sequence that teleports the gripper to the goal position, bypassing joint limits or collision constraints, because the world model's dynamics are not perfectly learned.

By enforcing action consistency, ACID aims to make planning more robust without requiring a perfect world model. The inverse dynamics model acts as a reality check: trained on real transitions, it captures which actions actually produce which state changes, so it can flag transitions that no executed action would explain. The analogy is human common sense, which rejects a plan that "works on paper" but violates physical intuition.

For embodied AI, including robotics, autonomous driving, and manipulation, this is particularly important, because these systems cannot afford to execute plans that exploit model errors. ACID is presented as a computationally lightweight fix: the inverse model is cheap to train and evaluate, and it integrates directly into existing planners such as the cross-entropy method (CEM) or model predictive control (MPC).
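To make the integration with CEM concrete, here is a minimal sketch of a cross-entropy-method planner whose cost includes an action-consistency term. It is a sketch under stated assumptions rather than the paper's planner: the toy `rollout` and `inverse_model`, the squared-error penalty, and all hyperparameters (`horizon`, `pop`, `n_elite`, `weight`) are hypothetical.

```python
import numpy as np

def rollout(state, actions):
    # Hypothetical world model: it believes every action, however large,
    # moves the state by exactly that amount.
    return np.concatenate([state[None], state + np.cumsum(actions, axis=0)])

def inverse_model(states):
    # Hypothetical inverse dynamics model fit to real data, where one step
    # never moves the state more than 1.0 along each axis.
    return np.clip(np.diff(states, axis=0), -1.0, 1.0)

def cost(state, actions, goal, weight):
    states = rollout(state, actions)
    terminal = np.sum((states[-1] - goal) ** 2)
    # Assumed consistency term: planned actions vs. actions inferred from transitions.
    consistency = np.sum((inverse_model(states) - actions) ** 2)
    return terminal + weight * consistency

def cem_plan(state, goal, horizon=4, weight=1.0, iters=30, pop=256, n_elite=32, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, state.shape[0]))
    std = np.full_like(mean, 2.0)
    for _ in range(iters):
        # Sample candidate action sequences around the current distribution.
        samples = mean + std * rng.standard_normal((pop,) + mean.shape)
        costs = np.array([cost(state, a, goal, weight) for a in samples])
        # Refit the sampling distribution to the lowest-cost candidates.
        elite = samples[np.argsort(costs)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean

start, goal = np.zeros(2), np.array([3.0, 0.0])
plan = cem_plan(start, goal)
print("planned actions:\n", plan.round(2))
print("predicted final state:", (start + plan.sum(axis=0)).round(2))
```

The only change relative to a plain CEM loop is the extra term inside `cost`; sampling and elite refitting are untouched, which is what makes this kind of check easy to bolt onto an existing planner.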

Implications for AI Practitioners

First, practitioners should audit their world models for "shortcut" planning behavior. If your planner consistently finds low-cost plans that fail in the real environment, ACID-style consistency checks are a promising diagnostic and corrective tool.

Second, ACID suggests a broader design principle: pair forward models (predicting the next state from the current state and action) with inverse models (predicting the action from a state transition). This bidirectional consistency can act as a natural regularizer that may improve generalization and reduce hallucination in learned dynamics.
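The forward/inverse pairing can be sketched on a toy linear system. The example below fits both models by least squares from the same transitions and measures a cycle error (action, then predicted next state, then recovered action). The dynamics matrices `A` and `B`, the linear model class, and the `biased_forward` model are assumptions for illustration only; the preprint's models are neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth dynamics: s' = A s + B a.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.5, 0.0], [0.1, 1.0]])

# "Real" transition data collected with random actions.
states = rng.standard_normal((500, 2))
actions = rng.uniform(-1.0, 1.0, size=(500, 2))
next_states = states @ A.T + actions @ B.T

# Forward model: (state, action) -> next state.
W_fwd, *_ = np.linalg.lstsq(np.hstack([states, actions]), next_states, rcond=None)
# Inverse model: (state, next state) -> action.
W_inv, *_ = np.linalg.lstsq(np.hstack([states, next_states]), actions, rcond=None)

def forward(state, action):
    return np.concatenate([state, action]) @ W_fwd

def inverse(state, next_state):
    return np.concatenate([state, next_state]) @ W_inv

def biased_forward(state, action):
    # Hypothetical miscalibrated world model: it predicts the outcome of a
    # stronger push than the one actually commanded.
    return forward(state, action + np.array([0.5, 0.0]))

def cycle_error(model, state, action):
    # Distance between the commanded action and the action the inverse
    # model recovers from the model's predicted transition.
    return float(np.linalg.norm(inverse(state, model(state, action)) - action))

s, a = np.array([0.2, -0.4]), np.array([0.3, 0.1])
print("fitted forward model:", cycle_error(forward, s, a))
print("biased forward model:", cycle_error(biased_forward, s, a))
```

In this toy setting the fitted forward model round-trips almost exactly, while the biased one is off by the size of the injected bias. A check like this only catches errors that the inverse model can express as a different action, which is worth keeping in mind when reading the cycle error as a health signal.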

Third, the approach is modular. You can add ACID to existing planning pipelines without retraining the world model. This makes it practical for deployment in systems where retraining is expensive or where the world model is frozen (e.g., from a prior training run).

Finally, ACID hints at a deeper insight: the most reliable plans are those that are explainable in terms of the action space. A plan that passes both forward and inverse consistency checks is more likely to be physically realizable, not just mathematically optimal.

Key Takeaways

ACID corrects a blind spot in planning with world models: it penalizes action sequences that produce implausible state transitions, even if they reach the goal.
The approach uses a lightweight inverse dynamics model to enforce action consistency, making planning more robust to world model imperfections.
For AI practitioners, ACID offers a modular, plug-and-play fix that improves plan reliability without requiring world model retraining.
The work underscores the importance of bidirectional consistency (forward + inverse) as a general principle for trustworthy model-based control.
