Technical Report 131, Institut für Informatik, Universität Freiburg, December 1999.

Information Extraction and Integration with FLORID: The MONDIAL Case Study

Wolfgang May


Accessing and processing the information provided on the Web requires integrating data from different, heterogeneous sources. Languages for this purpose must support querying the Web, extracting information from semistructured data, and restructuring the results. In [LHL+98] we argue that languages supporting deduction and object-orientation are particularly suited to this context, and we propose a formal model for querying the structure and contents of Web data. A main advantage of our approach is that it brings together the above-mentioned issues in a unified, formal framework. The approach is implemented in the FLORID system [HLL98], an implementation of the deductive object-oriented database language F-Logic [KLW95].

This report substantiates the above claims with a case study using FLORID: we show how several information sources on the Web containing political and geographical data are integrated into a geographical database. The case study illustrates the benefit gained from an integrated Web-querying and data-manipulation language that supports a concise and elegant programming style. With a deductive language, the program, which implements both a wrapper and a mediator, can easily be developed by rapid prototyping and refinement: it consists of a skeleton of generic wrapping rules [MHL+99], augmented by refining rules and application-specific rules.