Abstractions of Queries in Ontology-Based Data Access
arXiv:2606.24618v1 Announce Type: new
Abstract: In ontology-based data access (OBDA), multiple data sources are integrated via mappings to an ontology. We consider an OBDA setting based on existential rules and the certain answer semantics. We address the recent issue of query abstraction, which...
This arXiv paper tackles a fundamental bottleneck in Ontology-Based Data Access (OBDA), a paradigm designed to make querying complex, heterogeneous data sources as intuitive as possible. The core challenge is that OBDA systems rely on an ontology (a formal representation of knowledge) and on mappings that connect raw data to that ontology. When a user poses a query, the system must "rewrite" it into a form that the underlying databases can execute. This rewriting process is computationally expensive, especially under the "certain answer semantics" (i.e., returning only answers that hold in every possible model of the data and the ontology).
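For readers who want the certain answer semantics spelled out, the standard textbook definition can be written as follows. The notation is an assumption of this summary, not taken from the paper: \(\mathcal{K}\) stands for the knowledge base (data, mappings, and ontology together), \(q\) for the query, and \(\mathcal{I}\) ranges over the models of \(\mathcal{K}\).

```latex
\[
  \bar{a} \in \mathrm{cert}(q, \mathcal{K})
  \quad\Longleftrightarrow\quad
  \bar{a} \in q(\mathcal{I}) \ \text{ for every model } \mathcal{I} \models \mathcal{K},
\]
where $\bar{a}$ is a tuple of constants and $q(\mathcal{I})$ is the set of answers to $q$ in $\mathcal{I}$.
```

An answer that appears in only some models is therefore discarded, which is why the system cannot simply evaluate the query over the raw data.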
The paper introduces a formal framework for query abstraction. Instead of rewriting every single user query from scratch, the system learns to identify and store "abstract" representations of recurring query patterns. Think of it as building a library of query templates. When a new query arrives, the system first checks if it matches a known abstraction. If it does, the system can bypass the expensive rewriting step and directly use the pre-computed, optimized execution plan associated with that abstraction.
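The template-library idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: queries are plain strings, "abstraction" is reduced to replacing constants with placeholders, and `AbstractionCache`, `abstract_query`, and the `rewrite` callback are hypothetical names introduced here. It also assumes the rewriting does not depend on the specific constants, which may not hold if the ontology or mappings mention them.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: constants are double-quoted strings or bare numbers.
_CONSTANT = re.compile(r'"[^"]*"|\b\d+(?:\.\d+)?\b')
_PLACEHOLDER = re.compile(r"\$p(\d+)")


def abstract_query(query: str) -> Tuple[str, Tuple[str, ...]]:
    """Replace each constant with a numbered placeholder.

    Returns the whitespace-normalized template and the extracted constants.
    """
    constants: List[str] = []

    def _to_placeholder(match) -> str:
        constants.append(match.group(0))
        return f"$p{len(constants) - 1}"

    template = _CONSTANT.sub(_to_placeholder, query)
    return " ".join(template.split()), tuple(constants)


class AbstractionCache:
    """Caches one rewriting per query template (hypothetical helper)."""

    def __init__(self, rewrite: Callable[[str], str]) -> None:
        self._rewrite = rewrite          # stand-in for the expensive rewriting step
        self._plans: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def plan_for(self, query: str) -> str:
        template, constants = abstract_query(query)
        plan = self._plans.get(template)
        if plan is None:
            self.misses += 1
            plan = self._rewrite(template)   # rewrite once per template
            self._plans[template] = plan
        else:
            self.hits += 1
        # Put the concrete constants back into the cached plan.
        return _PLACEHOLDER.sub(lambda m: constants[int(m.group(1))], plan)


if __name__ == "__main__":
    # The lambda is a placeholder for a real rewriting engine.
    cache = AbstractionCache(lambda template: f"PLAN[{template}]")
    cache.plan_for('SELECT ?p WHERE { ?p :hasCondition "flu" . ?p :inDept "cardiology" }')
    cache.plan_for('SELECT ?p WHERE { ?p :hasCondition "asthma" . ?p :inDept "oncology" }')
    print(cache.hits, cache.misses)
```

The two example queries differ only in their constants, so the second call reuses the plan computed for the first. A real system would need a more careful notion of "matching an abstraction" than string normalization.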
Why this matters. The primary barrier to deploying OBDA at scale is performance. Real-world data sources are often massive and messy, and the rewriting process, while logically sound, can be exponential in the worst case. By introducing abstraction, this work moves OBDA from a purely "reactive" system (rewrite every query) to a "proactive" one (learn and reuse query patterns). This is a significant step toward making OBDA systems viable for real-time analytics and interactive dashboards, where sub-second response times are critical.
Implications for AI practitioners:
- For Knowledge Graph Engineers: This research provides a theoretical justification for building caching and pattern-matching layers into your OBDA stack. If you are building a system that serves many similar queries (e.g., "find all patients with condition X in department Y"), implementing a query abstraction layer could yield substantial performance gains without changing your underlying ontology or mappings.
- For Data Architects: The work reinforces the value of "schema-on-read" approaches. OBDA allows you to keep your data in its native, often normalized, form while exposing a rich, unified view. This research makes that unified view more performant, reducing the temptation to prematurely materialize or denormalize data into a rigid warehouse.
- For AI Systems Engineers: The concept of "abstraction" here is analogous to query compilation in database systems or kernel caching in operating systems. It suggests a design pattern: identify the "hot path" of queries your AI agents or applications generate, abstract them, and pre-optimize them. This is particularly relevant for Retrieval-Augmented Generation (RAG) systems that need to repeatedly query structured data to ground LLM outputs.
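As one way to find the "hot path" mentioned in the last point, a query log can be grouped by template and ranked by frequency. The sketch below is an illustration only: `to_template`, `hot_templates`, the `<const>` marker, and the `min_share` threshold are all hypothetical, and constants are assumed to be double-quoted strings or bare numbers.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: constants are double-quoted strings or bare numbers.
_CONSTANT = re.compile(r'"[^"]*"|\b\d+(?:\.\d+)?\b')


def to_template(query: str) -> str:
    """Mask constants and normalize whitespace so similar queries collide."""
    return " ".join(_CONSTANT.sub("<const>", query).split())


def hot_templates(query_log: Iterable[str], min_share: float = 0.2) -> List[Tuple[str, int]]:
    """Return (template, count) pairs covering at least min_share of the log."""
    counts = Counter(to_template(q) for q in query_log)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() yields templates in descending order of frequency.
    return [(t, n) for t, n in counts.most_common() if n / total >= min_share]


if __name__ == "__main__":
    log = [
        'SELECT ?p WHERE { ?p :hasCondition "flu" }',
        'SELECT ?p WHERE { ?p :hasCondition "asthma" }',
        'SELECT ?p WHERE { ?p :hasCondition "copd" }',
        'SELECT ?d WHERE { ?d :headOf "cardiology" }',
        'SELECT ?p WHERE { ?p :age 42 }',
    ]
    for template, count in hot_templates(log, min_share=0.5):
        print(count, template)
```

Templates that clear the threshold are the natural candidates for pre-optimization; the rest can fall back to ordinary per-query rewriting.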
Key Takeaways
- Core Innovation: The paper formalizes a method to identify and reuse abstract patterns of database queries over an ontology, reducing the computational cost of query rewriting.
- Performance Impact: This approach directly targets the scalability bottleneck of OBDA, moving it closer to viability for real-time, interactive applications.
- Practical Design Pattern: Practitioners should consider implementing a query abstraction or caching layer in their OBDA systems, especially when serving a predictable set of analytical queries.
- Broader Relevance: The principle of "abstraction for reuse" is a powerful design pattern for any AI system that repeatedly queries complex, semantically mapped data sources.