BeClaude
Research · 2026-07-01

One Retrieval to Cover Them All: Co-occurrence-Aware Knowledge Base Reorganization for Session-Level RAG

Originally published by arXiv cs.AI

arXiv:2606.31156v1 Announce Type: cross Abstract: RAG systems retrieve documents optimized for answering one query at a time. Yet enterprise users arrive with sessions, that is, coherent episodes of related questions that span semantically distant parts of the knowledge base. We show that a single...

What Happened

A new preprint (arXiv:2606.31156v1) tackles a fundamental blind spot in current Retrieval-Augmented Generation (RAG) systems: they are designed for single, isolated queries, not for the multi-turn, thematically coherent sessions that characterize real enterprise use. The authors propose reorganizing knowledge bases around co-occurrence patterns — essentially pre-computing which documents tend to be retrieved together across a session of related questions — so that a single retrieval pass can serve an entire conversational episode, rather than requiring separate lookups for each turn.

The core innovation is a reorganization step that clusters or re-indexes documents based on their historical co-retrieval frequency. When a user begins a session, the system retrieves a compact, session-aware subset of the knowledge base, then answers all subsequent questions within that session from this pre-loaded context. This shifts the retrieval burden from per-query latency to a one-time session initialization.
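The session flow described above can be sketched as follows. This is a toy illustration, not the paper's method: the `SessionContext` class, the `shards` mapping, and the keyword-overlap scoring are all assumptions made for the example (a real system would presumably use embedding similarity). It shows a single retrieval at session start, after which a follow-up question that shares no words with the first one is still answered from the pre-loaded shard.

```python
def tokenize(text):
    """Toy tokenizer: lowercase whitespace split (stand-in for a real encoder)."""
    return set(text.lower().split())


class SessionContext:
    """Hypothetical session-level retriever over pre-built document shards."""

    def __init__(self, shards, docs):
        self.shards = shards          # shard_id -> list of doc ids (assumed precomputed)
        self.docs = docs              # doc_id -> document text
        self.loaded = {}              # documents pre-loaded for the current session
        self.retrieval_calls = 0      # counts calls against the full knowledge base

    def start(self, first_query):
        """One retrieval pass: load the shard that best matches the first query."""
        q = tokenize(first_query)

        def shard_score(shard_id):
            return max(len(q & tokenize(self.docs[d])) for d in self.shards[shard_id])

        self.retrieval_calls += 1
        best = max(self.shards, key=shard_score)
        self.loaded = {d: self.docs[d] for d in self.shards[best]}
        return best

    def lookup(self, query, k=1):
        """Answer later turns from the pre-loaded shard only, without a new retrieval."""
        q = tokenize(query)
        ranked = sorted(
            self.loaded,
            key=lambda d: (-len(q & tokenize(self.loaded[d])), d),
        )
        return ranked[:k]
```

With a shard that groups a VPN document and an MFA document, `start("vpn client will not connect")` loads that shard, and a later `lookup("reset mfa token")` is served from it while `retrieval_calls` stays at 1.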

Why It Matters

This work addresses a practical pain point that has been largely ignored in the RAG literature. Current systems treat each query as independent, leading to three problems:

  • Redundant retrieval costs — the same or overlapping documents are fetched repeatedly across related turns, wasting API calls and compute.
  • Context fragmentation — each retrieval may pull from different parts of the knowledge base, making it hard for the LLM to maintain coherent reasoning across the session.
  • Latency accumulation — in enterprise settings with long user sessions (e.g., troubleshooting, legal research, medical diagnosis), per-query retrieval latency adds up to a poor user experience.

The co-occurrence-aware reorganization is particularly relevant for enterprise RAG deployments where knowledge bases are large, queries are interdependent, and session lengths are long. It suggests a future where RAG systems are not just query-optimized but session-optimized, with retrieval planning happening at the session level rather than the turn level.

Implications for AI Practitioners

For teams building production RAG systems, this work offers a concrete architectural pattern: precompute document co-occurrence matrices from historical query logs, then use them to generate session-specific index shards or retrieval plans. This is not a trivial engineering lift — it requires logging query-to-document mappings across sessions and running offline clustering — but the payoff in reduced latency and improved coherence could be significant.
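A minimal sketch of that offline step, under stated assumptions: the log format (each session is a list of per-query retrieved document id lists), the `min_count` threshold, and the use of connected components as the clustering rule are illustrative choices, not details taken from the paper. The first function counts how often two documents are retrieved in the same session; the second groups documents into shards wherever a pair co-occurs often enough.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(session_logs):
    """Count, per document pair, the number of sessions in which both were retrieved."""
    counts = Counter()
    for session in session_logs:
        # Collapse the session's per-query results into one set of doc ids.
        docs = sorted({d for retrieved in session for d in retrieved})
        for a, b in combinations(docs, 2):
            counts[(a, b)] += 1
    return counts


def build_shards(counts, min_count=2):
    """Group docs into shards: connected components over pairs with count >= min_count."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), c in counts.items():
        if c >= min_count:
            parent[find(a)] = find(b)

    shards = defaultdict(set)
    for doc in list(parent):
        shards[find(doc)].add(doc)
    return sorted((sorted(s) for s in shards.values()), key=lambda s: s[0])
```

Documents that never reach the threshold with any partner are left out of the shards here; a production design would need a policy for them, for example keeping them in a default per-query index.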

Practitioners should also note the trade-off: session-level reorganization introduces a cold start problem for new sessions or novel query combinations not seen in historical data. The abstract excerpt above does not say how the paper handles this. A natural fallback is to revert to per-query retrieval for out-of-distribution sessions, but this adds complexity.
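One way such a fallback could look, as a hedged sketch rather than anything the paper specifies: try the pre-loaded session documents first, and only query the full knowledge base when nothing in the session context matches well enough. The function name `retrieve_with_fallback`, the `min_score` threshold, and the keyword-overlap score are all hypothetical.

```python
def overlap(query, text):
    """Toy relevance score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_with_fallback(query, session_docs, all_docs, min_score=1):
    """Return (doc_id, source), where source is 'session' or 'fallback'.

    session_docs: doc_id -> text for the pre-loaded session context
    all_docs:     doc_id -> text for the full knowledge base
    """

    def best(docs):
        # Sorting the ids first makes tie-breaking deterministic.
        return max(sorted(docs), key=lambda d: overlap(query, docs[d]))

    if session_docs:
        doc_id = best(session_docs)
        if overlap(query, session_docs[doc_id]) >= min_score:
            return doc_id, "session"

    # Out-of-distribution turn: pay for a per-query retrieval.
    return best(all_docs), "fallback"
```

The threshold makes the trade-off explicit: a higher `min_score` sends more turns to the slower per-query path, while a lower one risks answering from a session context that does not actually cover the question.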

Finally, this work signals a broader shift in RAG research from retrieval accuracy to retrieval efficiency and coherence over time. For anyone deploying RAG in customer-facing or high-throughput enterprise settings, session-level retrieval planning should be on the roadmap.

Key Takeaways

  • Current RAG systems are optimized for single queries, not multi-turn sessions, leading to redundant retrieval, context fragmentation, and latency buildup.
  • The proposed solution reorganizes knowledge bases based on co-occurrence patterns so that one retrieval pass serves an entire session.
  • Enterprise practitioners should consider precomputing session-aware index shards from historical query logs to reduce per-turn latency and improve coherence.
  • The approach introduces a cold-start challenge for novel query sequences, requiring fallback mechanisms to maintain robustness.