Journal of Applied Logic 3, pp. 271-307, 2005.

Logic-based XML data integration: A semi-materializing approach

Wolfgang May


We first describe the approach to XML data integration taken in the XPathLog/LoPiX project, which uses a warehouse strategy. We show that neither the DOM model nor the XML Query Data Model is suitable for this task, since the integrated database is not necessarily a tree but a set of overlapping (original and integrated) trees. We solve this problem by using a node-labeled, graph-based data model, called XTreeGraph, for the internal XML database; it represents multiple overlapping XML trees, or tree views.
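As a rough illustration of the overlapping-tree idea, the following Python sketch stores each child list under a tree identifier, so that one element node can belong to several trees at once. It is not the XTreeGraph implementation used in XPathLog/LoPiX; the class `Node`, the functions `view` and `parents`, the tree ids `doc1`, `doc2`, `integrated`, and the country/city data are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: two nodes are equal only if they are the same object
class Node:
    name: str
    text: str = ""
    # hypothetical layout: child lists keyed by tree id, so one node can sit in several trees
    children: dict[str, list[Node]] = field(default_factory=dict)

    def add_child(self, tree: str, child: Node) -> None:
        self.children.setdefault(tree, []).append(child)


def view(node: Node, tree: str) -> str:
    """Serialize the tree with id `tree` below `node`, ignoring edges of other trees."""
    kids = "".join(view(c, tree) for c in node.children.get(tree, []))
    return f"<{node.name}>{node.text}{kids}</{node.name}>"


def parents(nodes: list[Node], target: Node) -> set[tuple[str, str]]:
    """Return every (tree id, parent name) pair under which `target` occurs as a child."""
    return {
        (tree, n.name)
        for n in nodes
        for tree, kids in n.children.items()
        if any(k is target for k in kids)
    }


# Hypothetical data: two source trees and an integrated tree that reuses their nodes.
countries = Node("countries")
germany = Node("country", "Germany")
countries.add_child("doc1", germany)

cities = Node("cities")
berlin = Node("city", "Berlin")
cities.add_child("doc2", berlin)

result = Node("result")
result.add_child("integrated", germany)  # no copy: the same node object is shared
germany.add_child("integrated", berlin)

print(view(countries, "doc1"))     # <countries><country>Germany</country></countries>
print(view(result, "integrated"))  # <result><country>Germany<city>Berlin</city></country></result>
print(sorted(parents([countries, cities, germany, berlin, result], berlin)))
# [('doc2', 'cities'), ('integrated', 'country')]
```

Each view taken on its own is still a tree, but the node `berlin` has one parent in `doc2` and another in `integrated`; a model with a single parent per node, as in DOM, cannot express this without copying.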

In the second part, we return to the standard XML data model while still keeping the idea of overlapping trees by "simulating" it: the data is internally represented as XML in which the "overlaid" result tree is represented by XLink elements that refer to the original sources. By using a logical, transparent data model for XLinks, as investigated in WWW-02, all queries behave as if they were stated against the XTreeGraph. Using links for partial materialization also turns the warehouse approach into a mixed approach that combines the advantages of the warehouse approach and the virtual approach. Again, the approach is illustrated with XPathLog as the data integration language.
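The link-based simulation can be sketched in the same spirit with Python's standard `xml.etree.ElementTree`: the integrated document holds ordinary XML elements where data is materialized and `xlink:href` references where it is not, and the child axis dereferences links transparently. This is a simplified sketch under stated assumptions: hrefs of the form `document#id` stand in for full XPointer addressing, the in-memory `SOURCES` dictionary stands in for fetching remote documents, and `resolve`, `children`, and `descendants` are hypothetical helpers, not the logical XLink model from WWW-02.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xlink:href attribute to this qualified name
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical source documents, addressed by name (stand-in for remote fetching)
SOURCES = {
    "geo.xml": ET.fromstring('<cities><city id="b">Berlin</city></cities>'),
}

# Hypothetical integrated document: <name> is materialized, <ref> is only a link
INTEGRATED = ET.fromstring(
    '<country xmlns:xlink="http://www.w3.org/1999/xlink">'
    "<name>Germany</name>"
    '<ref xlink:href="geo.xml#b"/>'
    "</country>"
)


def resolve(elem):
    """Return the element an xlink:href points to, or `elem` itself if it is no link."""
    href = elem.get(XLINK_HREF)
    if href is None:
        return elem
    doc, _, frag = href.partition("#")  # simplified addressing: document#id
    target = SOURCES[doc].find(f".//*[@id='{frag}']")
    if target is None:
        raise KeyError(f"dangling link: {href}")
    return target


def children(elem):
    """Child axis seen through links: link elements are replaced by their targets."""
    return [resolve(c) for c in elem]


def descendants(elem):
    """Descendant axis through links (assumes the link structure is acyclic)."""
    for c in children(elem):
        yield c
        yield from descendants(c)


print([c.tag for c in children(INTEGRATED)])  # ['name', 'city']
```

In this sketch, choosing per element whether to store a copy or an `xlink:href` is what makes the integrated document only partially materialized: queries see the same `name`/`city` children either way.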

An early version was presented at the CoLogNET Workshop on Logic-based Methods for Information Integration (2003).