Institute for Informatics
Georg-August-Universität Göttingen

Databases and Information Systems

Information Extraction from the Web with FLORID

Stony Brook, NY, June 14, 1999.

Wolfgang May

Abstract:

FLORID is an implementation of F-Logic developed by the database group at the University of Freiburg (Germany). This talk presents FLORID's Web extension, which supports wrapping, restructuring, and integrating data from the Web in a unified framework, using F-Logic rules as the single language for both programming and querying. The object-oriented Web model is based on the classes url and webdoc, which represent the skeleton of a relevant Web fragment. The intra-document structure is represented by parse trees that are integrated into the Web skeleton. In the information retrieval task, objects in the extended Web skeleton are identified and restructured into an object-oriented model of the application domain. The wrapping task is carried out by analyzing the F-Logic representation of the parse tree and matching it against Perl regular expressions. The approach is illustrated by a case study that integrates geographical data from different sources.
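As a rough illustration of the wrapping idea only, the following Python sketch mimics a tiny Web skeleton and a regular-expression wrapper. It is not FLORID or F-Logic code: the classes Url and WebDoc merely echo the url and webdoc classes named above, and the pattern, the function wrap_capital, and the sample page are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Url:
    """Hypothetical stand-in for a `url` object (frozen so it can be a dict key)."""
    address: str


@dataclass
class WebDoc:
    """Hypothetical stand-in for a `webdoc` object: the fetched page text
    plus its outgoing links, which form the edges of the Web skeleton."""
    text: str
    links: list = field(default_factory=list)


# Illustrative pattern only; the patterns used in the case study
# are not given in the abstract.
CAPITAL = re.compile(r"Capital:\s*(?P<city>[A-Z][\w -]*)")


def wrap_capital(doc):
    """Extract a capital name from a document's text, or None if absent."""
    match = CAPITAL.search(doc.text)
    return match.group("city").strip() if match else None


# A one-page skeleton with made-up content.
skeleton = {
    Url("http://example.org/germany"): WebDoc("Country: Germany\nCapital: Berlin\n"),
}

# Restructure the wrapped values into a small application-level mapping.
capitals = {url.address: wrap_capital(doc) for url, doc in skeleton.items()}
```

In FLORID itself, the skeleton, the parse trees, and the extraction rules are all expressed as F-Logic rules rather than in a host language, so the sketch shows only the general shape of the task.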

Structure of the talk:

  • The Web Model: What can we do with FLORID for Web data extraction?
  • Implementation: How is this implemented in FLORID?
  • Practice: How is it used?
  • Demonstration: The Mondial Case Study
  • Lessons learnt and further ideas

[Slides]