Information Extraction from the Web with FLORID

Database and Artificial Intelligence Group at TU Vienna, Austria, November 5, 1999.

Wolfgang May


FLORID is an implementation of F-Logic developed by the database group at the University of Freiburg (Germany). The talk presents the Web extension of FLORID, which supports wrapping, restructuring, and integrating data from the Web in a unified framework, using F-Logic rules as the single language for both programming and querying. The object-oriented Web model is based on the classes url and webdoc, which represent the skeleton of a relevant Web fragment. The intra-document structure is represented by parse trees that are integrated into the Web skeleton. In the information retrieval task, objects in the extended Web skeleton are identified and restructured into an object-oriented model of the application domain. Wrapping is done by analyzing the F-Logic representation of the parse tree and by matching against Perl regular expressions. The approach is illustrated by two case studies.
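
The general idea of a Web skeleton of URLs and documents, combined with regular-expression wrapping, can be sketched roughly as follows. This is not FLORID or F-Logic code; it is a minimal Python illustration under stated assumptions. The WebDoc class, the in-memory WEB table, the example.org URLs, the page contents, and the extract function are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class WebDoc:
    """A document in the skeleton: its text plus its outgoing links."""
    text: str
    hrefs: dict = field(default_factory=dict)  # link label -> target URL string


# Hypothetical in-memory "Web" (URL string -> document); no network access.
WEB = {
    "http://example.org/countries.html": WebDoc(
        text="<li>Austria: capital Vienna</li><li>Germany: capital Berlin</li>",
        hrefs={"next": "http://example.org/more.html"},
    ),
    "http://example.org/more.html": WebDoc(
        text="<li>France: capital Paris</li>",
    ),
}

# Wrapper pattern (assumed page layout): one list item per country.
PATTERN = re.compile(r"<li>(\w+): capital (\w+)</li>")


def extract(start_url):
    """Walk the skeleton from start_url and restructure matches into
    a small domain model mapping country name to capital."""
    seen, todo, capitals = set(), [start_url], {}
    while todo:
        url = todo.pop()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        doc = WEB[url]
        # Wrapping step: match the document text against the pattern.
        for country, capital in PATTERN.findall(doc.text):
            capitals[country] = capital
        # Skeleton step: follow the document's links to further URLs.
        todo.extend(doc.hrefs.values())
    return capitals


result = extract("http://example.org/countries.html")
```

In this sketch, traversal of the skeleton and pattern matching are interleaved in ordinary procedural code; in the approach described in the abstract, both steps would instead be expressed declaratively as F-Logic rules.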