Research | 2026-07-02

Personalization as Inverse Planning: Learning Latent Design Intents for Agentic Slide Generation via Structural Denoising

Originally published by Arxiv CS.AI

arXiv:2607.00407v1 Announce Type: new Abstract: Slide design requires personalizing both deck themes and page layouts. Yet, current AI agent-based methods struggle with fine-grained, page-level design. Solely relying on prespecified templates or user verbose instructions, they fail to capture...

What Happened

Researchers have introduced a framework for AI-driven slide generation that reframes personalization as an inverse planning problem. The paper, "Personalization as Inverse Planning," proposes that, instead of requiring users to write exhaustive instructions or pick from rigid templates, an AI agent can infer a user's latent design intent from structural denoising behavior, that is, from how the user iteratively refines a slide layout. The system learns to reverse-engineer the design goal behind each editing action, so it can generate page-level layouts that match individual preferences without explicit, verbose commands.

The core technical contribution is a structural denoising model that treats slide generation as a process of recovering an intended design from a noisy or generic starting point. By training on sequences of user edits, the model learns to map observed modifications to underlying design intents (e.g., "emphasize hierarchy" or "balance visual weight"). This allows the agent to proactively propose layouts that match a user's implicit style, rather than merely filling in a template or following step-by-step instructions.
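As a toy illustration of the inverse-planning idea, and not the paper's actual model, the sketch below uses Bayes' rule to infer which of two hypothetical intents best explains a sequence of observed edits. The intent names echo the examples above; the edit types, the likelihood table, and the function name `infer_intent` are assumptions made up for this example.

```python
import math

# Hypothetical likelihood table (an assumption, not from the paper):
# LIKELIHOOD[intent][edit] = assumed probability that a user holding
# this intent performs this kind of edit.
LIKELIHOOD = {
    "emphasize_hierarchy": {
        "enlarge_title": 0.6, "align_left": 0.3, "add_whitespace": 0.1,
    },
    "balance_visual_weight": {
        "enlarge_title": 0.1, "align_left": 0.3, "add_whitespace": 0.6,
    },
}


def infer_intent(edits, prior=None):
    """Return P(intent | edits), treating edits as independent given the intent."""
    intents = list(LIKELIHOOD)
    if prior is None:
        # Uniform prior over the candidate intents.
        prior = {intent: 1.0 / len(intents) for intent in intents}

    # Accumulate log-probabilities to avoid underflow on long edit sequences.
    log_post = {}
    for intent in intents:
        total = math.log(prior[intent])
        for edit in edits:
            # Unknown edit types get a small floor probability.
            total += math.log(LIKELIHOOD[intent].get(edit, 1e-6))
        log_post[intent] = total

    # Normalize with the max-subtraction trick for numerical stability.
    peak = max(log_post.values())
    unnorm = {intent: math.exp(lp - peak) for intent, lp in log_post.items()}
    z = sum(unnorm.values())
    return {intent: value / z for intent, value in unnorm.items()}


# Observed edits: the user enlarged the title twice, then left-aligned text.
posterior = infer_intent(["enlarge_title", "enlarge_title", "align_left"])
best_intent = max(posterior, key=posterior.get)
```

With this made-up table, the repeated title enlargement pushes most of the posterior mass onto "emphasize_hierarchy". A learned model would replace the hand-written likelihoods with parameters fit to real edit sequences.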

Why It Matters

Current AI slide-generation tools—whether integrated into presentation software or offered as standalone agents—suffer from a fundamental limitation: they either rely on predefined templates that cannot adapt to nuanced user preferences, or they demand that users articulate every design decision in natural language. This creates a friction point where the AI's output feels generic or requires excessive manual correction.

This research matters because it addresses the "intent gap" in creative AI tools. By modeling personalization as inverse planning, the system can capture subtle, page-level design choices that are difficult to express verbally, such as the precise spacing between elements, the relative prominence of text versus visuals, or the use of negative space. For AI practitioners, this represents a shift from instruction-following agents to intent-inferring agents, which could reduce the cognitive load on users.

The approach also has implications beyond slide generation. The concept of learning latent intents from structural denoising could apply to any domain where users iteratively refine a structured output—such as document formatting, UI design, or even code generation. If validated, this framework could become a general paradigm for personalization in generative AI.

Implications for AI Practitioners

Data requirements: The model requires training on sequences of user edits, not just final outputs. Practitioners should consider logging incremental design actions (e.g., drag-and-drop, resize, recolor) rather than only the completed slide.
Latent space design: The success of this approach hinges on defining a meaningful latent space for design intents. Practitioners need to carefully engineer these representations—too coarse, and personalization is lost; too granular, and the model overfits to noise.
Evaluation challenges: Traditional image-generation metrics such as FID (Fréchet Inception Distance) or CLIP score may not capture whether a generated slide truly reflects a user's unexpressed intent. Practitioners should develop user studies or behavioral metrics (e.g., edit distance between generated and user-refined slides) to validate personalization.
Integration with LLMs: This framework could complement large language models by offloading visual design decisions to a specialized inverse planning model, while the LLM handles content generation and narrative structure.
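The data-logging and evaluation points above can be made concrete with a small sketch. It records incremental edits as events and then scores a generated layout against the user-refined one with a simple box-coordinate distance. Everything here is an assumption for illustration: the `EditEvent` and `EditLog` types, the normalized `(x, y, width, height)` box format, and the `layout_distance` metric are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed box format: (x, y, width, height), each normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class EditEvent:
    element_id: str
    action: str  # e.g. "move" or "resize"
    before: Box
    after: Box


@dataclass
class EditLog:
    events: List[EditEvent] = field(default_factory=list)

    def record(self, element_id: str, action: str, before: Box, after: Box) -> None:
        # Store every incremental edit, not just the final slide state.
        self.events.append(EditEvent(element_id, action, before, after))


def apply_log(layout: Dict[str, Box], log: EditLog) -> Dict[str, Box]:
    """Replay the logged edits on a copy of the layout."""
    refined = dict(layout)
    for event in log.events:
        refined[event.element_id] = event.after
    return refined


def layout_distance(generated: Dict[str, Box], refined: Dict[str, Box]) -> float:
    """Mean per-element cost: average absolute coordinate difference for
    shared elements, and a fixed cost of 1.0 for added or deleted elements."""
    ids = set(generated) | set(refined)
    if not ids:
        return 0.0
    total = 0.0
    for element_id in ids:
        if element_id in generated and element_id in refined:
            a, b = generated[element_id], refined[element_id]
            total += sum(abs(p - q) for p, q in zip(a, b)) / 4.0
        else:
            total += 1.0
    return total / len(ids)


generated = {"title": (0.1, 0.1, 0.8, 0.2), "body": (0.1, 0.4, 0.8, 0.5)}
log = EditLog()
log.record("title", "resize", generated["title"], (0.1, 0.05, 0.8, 0.3))
refined = apply_log(generated, log)
score = layout_distance(generated, refined)
```

A lower score means the user had to change less, which is one possible behavioral proxy for how well the generated layout matched their intent. It ignores color, typography, and content, so it would complement a user study rather than replace one.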

Key Takeaways

The paper reframes slide personalization as an inverse planning problem, inferring user design intents from structural denoising behavior rather than explicit instructions.
This approach addresses a critical limitation of current AI agents: the inability to capture fine-grained, page-level design preferences without verbose user input.
The framework is potentially generalizable to other domains involving iterative refinement of structured outputs, such as document design or UI prototyping.
Practitioners should focus on collecting edit-sequence data and designing latent intent spaces that balance expressiveness with generalization.
