Research · 2026-06-30

SAGA: Scene-Aware, Goal-Evolving Agents for Long-Horizon CivRealm Strategy Planning

Originally published by Arxiv CS.AI

arXiv:2606.29932v1. Abstract: Long-horizon strategic planning in complex strategy games demands concurrent reasoning across multiple decision domains under imperfect information and sparse reward. Existing LLM-based agents suffer from three systematic failures: scene blindness...

What Happened

Researchers have introduced SAGA (Scene-Aware, Goal-Evolving Agents), a framework that targets weaknesses of LLM-based agents in long-horizon strategic planning within complex environments such as the CivRealm game. The paper identifies three persistent failure modes in existing LLM agents: scene blindness (inability to process the full spatial and relational context), goal fixation (rigid adherence to initial objectives despite changing conditions), and poor handling of sparse rewards (weak credit assignment across long action sequences).

SAGA’s architecture introduces two key mechanisms. The first is a scene-aware perception module that compresses high-dimensional game state information into structured representations the LLM can reason over. The second is a goal evolution module that adjusts sub-goals based on intermediate feedback and shifting environmental conditions, so that agents do not keep pursuing outdated strategies. The system was evaluated on the CivRealm benchmark, which simulates the complex, multi-domain decision-making of the Civilization game series.

Why It Matters

This work addresses a bottleneck in deploying LLMs for sequential decision-making. Current LLM agents handle isolated reasoning tasks well but often fail to maintain a coherent strategy over hundreds or thousands of steps, which is precisely the scenario faced in robotics, supply chain management, and autonomous systems.

The scene-blindness problem is particularly significant. Text-only LLMs consume token sequences, not native spatial or relational structures. When forced to reason about complex environments through flattened text descriptions, they can miss interdependencies that humans intuitively grasp. SAGA’s structured perception pipeline offers a template for bridging this gap without requiring multimodal models.
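To make the contrast with flattened text concrete, here is a minimal sketch of a structured state encoder. It is not SAGA's perception module and does not use the CivRealm API: the `raw_state` schema, the `SceneSummary` fields, and the `threat_radius` parameter are all assumptions chosen for illustration. The idea is only to surface a relation (which enemy unit is near which city) that a per-tile text dump would leave implicit.

```python
from dataclasses import dataclass, field
import json


@dataclass
class SceneSummary:
    """Compact, structured view of a game state (hypothetical schema)."""
    turn: int
    cities: list = field(default_factory=list)
    threats: list = field(default_factory=list)


def manhattan(a, b):
    """Grid distance between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def summarize_scene(raw_state, threat_radius=3):
    """Extract task-relevant relations instead of dumping every tile as text."""
    summary = SceneSummary(turn=raw_state["turn"])
    for city in raw_state["cities"]:
        summary.cities.append({"name": city["name"], "size": city["size"]})
        for unit in raw_state["enemy_units"]:
            dist = manhattan(city["pos"], unit["pos"])
            if dist <= threat_radius:
                # Make the city/enemy relation explicit for the LLM.
                summary.threats.append(
                    {"city": city["name"], "enemy": unit["type"], "distance": dist}
                )
    return summary


def to_prompt(summary):
    """Serialize the summary as JSON for inclusion in an LLM prompt."""
    return json.dumps(
        {"turn": summary.turn, "cities": summary.cities, "threats": summary.threats},
        sort_keys=True,
    )


# Toy state; all names and coordinates are made up for this example.
raw_state = {
    "turn": 42,
    "cities": [
        {"name": "Alpha", "pos": (2, 2), "size": 5},
        {"name": "Beta", "pos": (10, 10), "size": 3},
    ],
    "enemy_units": [
        {"type": "archer", "pos": (3, 4)},
        {"type": "warrior", "pos": (20, 20)},
    ],
}
summary = summarize_scene(raw_state)
print(to_prompt(summary))
```

In this toy state only the archer near Alpha is reported as a threat, so the prompt carries one explicit relation rather than a full map description.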

The goal evolution mechanism tackles an equally practical issue: LLMs tend to over-commit to initial plans. In dynamic environments, this leads to brittle behavior. SAGA treats goals as mutable, context-dependent variables rather than fixed directives, which mirrors how human experts plan: by continuously re-evaluating priorities against new information.
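One way to picture goals as mutable variables is a small re-scoring step that runs between planning rounds. The sketch below is an illustration under stated assumptions, not the paper's goal evolution module: the `Goal` class, the `feedback` dictionary of priority adjustments, and the `drop_below` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    name: str
    priority: float  # assumed to lie in [0, 1]


def evolve_goals(goals, feedback, drop_below=0.2):
    """Re-score goals from feedback and drop those no longer worth pursuing.

    `feedback` maps a goal name to a priority adjustment (hypothetical signal,
    e.g. derived from the latest observations).
    """
    updated = []
    for goal in goals:
        delta = feedback.get(goal.name, 0.0)
        # Clamp the adjusted priority to [0, 1].
        new_priority = min(1.0, max(0.0, goal.priority + delta))
        if new_priority >= drop_below:
            updated.append(Goal(goal.name, new_priority))
    # Highest-priority goal first.
    return sorted(updated, key=lambda g: g.priority, reverse=True)


goals = [
    Goal("expand_territory", 0.8),
    Goal("build_wonder", 0.5),
    Goal("defend_capital", 0.3),
]
# Example feedback: expansion is blocked, and the capital is under threat.
feedback = {"expand_territory": -0.7, "defend_capital": 0.6}
goals = evolve_goals(goals, feedback)
print([g.name for g in goals])  # ['defend_capital', 'build_wonder']
```

Here the initial top goal is dropped once feedback makes it unattractive, and defense moves to the front. A fixed-plan agent would have kept expanding.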

Implications for AI Practitioners

For developers building LLM-based agents, this research suggests three actionable insights:

First, raw text serialization of environment state is insufficient for complex tasks. Practitioners should invest in structured state encoders that extract task-relevant features before feeding them to the LLM. The overhead is justified if, as the paper argues, it improves reasoning quality.

Second, goal management should be treated as a first-class component, not an afterthought. Implementing explicit goal evolution logic, whether through periodic re-planning triggers or learned adaptation, can prevent agents from wasting compute on obsolete objectives.

Third, the sparse reward problem in long-horizon tasks may be more tractable through architectural changes than through prompt engineering alone. SAGA demonstrates that intermediate goal states can serve as dense reward proxies, enabling better credit assignment without manual reward shaping.
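The third point can be illustrated with a short sketch in which reaching a sub-goal adds a one-time bonus to an otherwise sparse reward stream. This is an assumption-laden toy, not SAGA's method: the `shaped_rewards` function, the `bonus` value, and the hand-written sub-goal predicates are hypothetical stand-ins for goals an agent would generate itself.

```python
def shaped_rewards(trajectory, subgoals, bonus=1.0):
    """Per-step rewards: env reward plus a one-time bonus per sub-goal reached."""
    reached = set()
    rewards = []
    for step in trajectory:
        r = step["env_reward"]
        for name, predicate in subgoals.items():
            # Pay each sub-goal only the first time its predicate holds.
            if name not in reached and predicate(step["state"]):
                reached.add(name)
                r += bonus
        rewards.append(r)
    return rewards


# Hypothetical sub-goals expressed as predicates over a toy state dict.
subgoals = {
    "found_second_city": lambda s: s["cities"] >= 2,
    "research_writing": lambda s: "writing" in s["techs"],
}

# The environment itself rewards only the final step.
trajectory = [
    {"state": {"cities": 1, "techs": []}, "env_reward": 0.0},
    {"state": {"cities": 2, "techs": []}, "env_reward": 0.0},
    {"state": {"cities": 2, "techs": ["writing"]}, "env_reward": 0.0},
    {"state": {"cities": 3, "techs": ["writing"]}, "env_reward": 10.0},
]
print(shaped_rewards(trajectory, subgoals))  # [0.0, 1.0, 1.0, 10.0]
```

The intermediate steps now carry a signal that credit assignment can use. Note that ad hoc bonuses like this can change which behavior is optimal, so the bonus size and the sub-goal definitions need care.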

Key Takeaways

SAGA introduces scene-aware perception and goal evolution mechanisms that address three systematic failures in LLM-based strategic agents: scene blindness, goal fixation, and sparse reward handling.
The framework demonstrates that structured state representation and dynamic goal adaptation are critical for long-horizon planning tasks, with implications beyond gaming to robotics and autonomous systems.
Practitioners should move beyond flat text prompts for environment representation and implement explicit goal management systems to improve agent robustness in dynamic settings.
Intermediate goal states can effectively serve as dense reward signals, reducing the credit assignment problem inherent in sparse-reward long-horizon tasks.

