Institute for Informatics
Georg-August-Universität Göttingen

Databases and Information Systems

CoLogNET Workshop on Logic-based Methods for Information Integration,
Vienna, Austria, August 23, 2003.

A Logic-Based Approach to XML Data Integration with "lazy materialization"

Wolfgang May


XPathLog is a Datalog-like extension of XPath for querying, manipulating, and integrating XML data. The querying part extends XPath by binding variables to the XML nodes that are "traversed" when an XPath expression is evaluated. In contrast to other approaches, the XPath syntax and semantics are also used to specify declaratively how the database should be updated: when used in rule heads, XPath filters are interpreted as specifications of elements and properties that should be added to the database. Special operations for data integration include adding cross links (subelement relationships and attributes) between fragments of the original database, declaring synonym relationships between notions from different sources, and XML element fusion.
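The idea of binding variables to traversed nodes can be illustrated with a minimal Python sketch. It does not use XPathLog syntax or the LoPiX system; the sample document, the variable names `C` and `N`, and the helper `bind_countries` are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = ET.fromstring(
    "<world>"
    "<country code='D'><name>Germany</name></country>"
    "<country code='A'><name>Austria</name></country>"
    "</world>"
)

def bind_countries(root):
    """Yield one variable binding per traversed <country> node.

    Mimics, in spirit only, a query that binds C to each country
    element on the path and N to the text of its <name> child.
    """
    for country in root.findall("country"):
        for name in country.findall("name"):
            yield {"C": country, "N": name.text}

bindings = list(bind_countries(DOC))
names = [b["N"] for b in bindings]
```

Each result is a set of variable bindings rather than a plain node list, which is what lets a Datalog-style rule reuse the bound nodes in its head.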

As a data manipulation and integration language, XPathLog was originally based on a graph-based, edge-labeled model called XTreeGraph. The XTreeGraph extends the basic XML data model by modeling multiple overlapping trees, and thus allows existing XML trees to be restructured into a densely connected graph database. XML result trees are then defined as XML tree views by projections from this database. The LoPiX implementation follows a "warehouse approach" in which the integrated XTreeGraph is materialized.

In the present talk, a "lazy materialization" strategy is proposed that does not materialize the complete internal database and that is based on the original XML tree model combined with XLink: data items that are not (yet) changed are integrated into the internal database as references, represented by XLinks. The approach uses our recent proposal for a logical, transparent data model for XLinked data. Only if a referenced data item is actually modified in the course of the integration process is it (partially) loaded into the database; its unchanged fragments are still represented lazily by XLinks.
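The load-on-modification behaviour can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the approach's actual implementation: the class `LazyRef`, the `fetch` function, the in-memory `SOURCES` table, and the example URLs are all hypothetical, and the sketch loads a referenced item as a whole instead of partially.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for remote XML sources, keyed by href.
SOURCES = {
    "http://example.org/vienna.xml":
        "<city><name>Vienna</name><status>old</status></city>",
    "http://example.org/graz.xml":
        "<city><name>Graz</name><status>old</status></city>",
}
fetched = []  # records which hrefs were actually loaded

def fetch(href):
    """Simulate dereferencing a link; a real system would do I/O here."""
    fetched.append(href)
    return SOURCES[href]

class LazyRef:
    """A data item held only as a reference until it is modified."""

    def __init__(self, href):
        self.href = href
        self._node = None  # not materialized yet

    @property
    def loaded(self):
        return self._node is not None

    def materialize(self):
        # Load the referenced item at most once.
        if self._node is None:
            self._node = ET.fromstring(fetch(self.href))
        return self._node

    def set_text(self, path, value):
        # A modification forces the item into the internal database.
        self.materialize().find(path).text = value

# The "internal database" initially holds only references.
db = {href: LazyRef(href) for href in SOURCES}
db["http://example.org/vienna.xml"].set_text("status", "new")
```

After the update, only the modified item has been loaded; the other entry is still a bare reference and has caused no fetch.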

[Slides (postscript)]
[Slides (pdf)]

A journal version has been published in the Journal of Applied Logic (2005).