Institute for Informatics
Georg-August-Universität Göttingen

Databases and Information Systems

Project Seminar
Web Data Integration and Data Management
Summer 2017

Prof. Dr. Wolfgang May may@informatik.uni-goettingen.de
Lars Runge, M.Sc., Sebastian Schrage, M.Sc.

Technical Data

  • Advanced Bachelor or Master/Diploma in Applied Computer Science or Information Systems (Wirtschaftsinformatik)
  • Prerequisites: basic knowledge of, e.g., XML and/or RDF
  • 6 ECTS
  • Number of participants: max. 10
  • Language: German and English are both allowed. Reading English texts/documentation is required.

Contents

A lot of data is available on the Web and in the Semantic Web. Web data is usually provided in human-readable form as Web pages (including forms, the so-called Deep Web), so users cannot process it in a database-style way. Data extraction, e.g. from the CIA World Factbook or from Wikipedia, is thus a never-ending "hot topic". Apart from pattern-based approaches, natural language processing approaches are also used.
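As a rough illustration of what a pattern-based approach looks like, the following minimal sketch extracts database-style rows from an HTML fragment with a regular expression. The fragment, the class names `name` and `pop`, the country names, and the helper `extract_rows` are all invented for this example; real pages such as the CIA World Factbook use different markup and usually need more robust patterns or a proper HTML parser.

```python
import re

# Hypothetical HTML fragment, invented for illustration only
# (not the markup of any real site).
HTML = """
<table>
  <tr><td class="name">Atlantis</td><td class="pop">1,200,000</td></tr>
  <tr><td class="name">Utopia</td><td class="pop">350,000</td></tr>
</table>
"""

# The pattern encodes the assumed page layout: a name cell followed by a
# population cell, separated only by whitespace.
ROW_PATTERN = re.compile(
    r'<td class="name">(?P<name>[^<]+)</td>\s*'
    r'<td class="pop">(?P<pop>[\d,]+)</td>'
)


def extract_rows(html):
    """Return (name, population) tuples for every row matching the pattern."""
    return [
        (m.group("name").strip(), int(m.group("pop").replace(",", "")))
        for m in ROW_PATTERN.finditer(html)
    ]


rows = extract_rows(HTML)
for name, population in rows:
    print(name, population)
```

The result is a list of tuples that can be loaded into a database table. The weakness of this style is visible in the pattern itself: any change to the page layout silently breaks the extraction.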

The Semantic Web (cf. the lecture Semantic Web) attempts to provide, extend, and/or annotate Web data in a machine-readable form. For this, the RDF data format is used, together with the OWL ontology language for describing metadata.
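To make the RDF data format a little more concrete, here is a minimal sketch that writes one record as RDF triples in N-Triples syntax. The namespace `http://example.org/`, the class `Country`, the properties `name` and `population`, and the helper `country_to_ntriples` are assumptions made up for this example; only the `rdf:type` and `xsd:integer` IRIs are standard. The sketch also assumes that the name is a simple token that is safe to use inside an IRI and a literal without escaping.

```python
EX = "http://example.org/"  # hypothetical namespace for this example
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def country_to_ntriples(name, population):
    """Return N-Triples lines describing one (hypothetical) country record.

    Assumes `name` contains no spaces, quotes, or other characters that
    would need escaping in an IRI or a literal.
    """
    subject = f"<{EX}{name}>"
    return [
        f"{subject} <{RDF_TYPE}> <{EX}Country> .",
        f'{subject} <{EX}name> "{name}" .',
        f'{subject} <{EX}population> "{population}"^^<{XSD_INTEGER}> .',
    ]


for line in country_to_ntriples("Atlantis", 1200000):
    print(line)
```

Each line is one subject-predicate-object statement. An OWL ontology would additionally describe the vocabulary itself, for example that `population` relates a `Country` to an integer.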

Form of the Seminar

The seminar aims to give an overview of the state of the art in Web data integration and background data management.

For each topic, the following has to be done:

  • write a tutorial-style paper that gives an overview of an approach,
  • evaluate some tools and write a report (installation, functionality, usability, ...; in German or English),
  • prepare an illustrative medium-size case study using one or more tools (optionally as a comparison),
  • give a presentation covering the tutorial and showing a demo of how to use the tools (about 90 minutes incl. discussion; in German or English).

Time Schedule

  • First meeting at the beginning of the semester:
    Monday 24.4., 14h c.t., SR 2.101, IFI
    Assignment of topics and papers.
  • May/June: preparation of case studies and presentations, individual meetings
  • Registration/deregistration in FlexNow is open until 14.7.2017.
  • Presentations:
    • Monday 17.7., starting at 13:30, two talks:
      • Chenfeng Zhu: Web Scraping via Browser Automation with Selenium
      • Mauricio Alberto Torres Silva: The Google Maps and Google Street Maps APIs
    • Thursday 20.7., starting at 13:30, two talks:
      • Anurag Sundriyal: Deep Learning
      • Stefan Siemer: Deep Learning - Stanford Approach

Note: Papers can be found via DBLP, http://www.dblp.org (originally, DBLP meant "Databases and Logic Programming", but by now it covers all topics in Computer Science), or simply by searching for the paper title with Google (this often yields the PDF directly). A list of other papers by the same authors can then be found via DBLP.