Uni Göttingen
Institute for Informatics
Databases and Information Systems


Semistructured Data and XML
Summer 2022

Prof. Dr. Wolfgang May
Lars Runge, M.Sc., Dr. Sebastian Schrage

Date and Time:

  • Monday 14-16 ct, IFI SR 2.101
  • Wednesday 10-12 ct, IFI SR 2.101
  • This year, DBIS will mainly teach non-live, using pre-recorded videos. There will be some live online meetings with BigBlueButton provided by GWDG; the rooms/meetings can be entered via StudIP. There may also be optional in-person meetings at the IFI.
  • Materials for self-study (in English) will be linked below week by week:
    • revised videos taken from summer term 2020 (as the "original" dates in the filenames indicate)
    • PDF slides
  • Please also read the general and technical information about DBIS virtual teaching.

Lectures and exercises are mixed (see announcements on this page). There will be non-mandatory exercise sheets whose solutions will be discussed as part of the lecture.
All materials and announcements can be found HERE on the "blue DBIS pages".

Module M.Inf.1141, 4 SWS, 6 ECTS.
The module's home is the MSc program in Applied CS. It can also be credited in the BSc program in Applied CS (as "Vertiefung Softwaresysteme und Daten"),
and in several other programs:
BSc/MSc Wirtschaftsinformatik, Mathematik (BSc/MSc), Digital Humanities, Teaching/2-Fach-Bachelor, PhD GAUSS, ...

Course Description

One of the most important factors that led to the overall success of XML is that the "XML world" combines a lot of already known concepts in an optimal way for coping with a broad spectrum of requirements. The course will first review some of these preceding (partially even historic) concepts (network database model, relational databases, object-oriented databases) and the integration of data and metadata (SchemaSQL). Then, the idea of "semistructured data" is introduced by showing early representatives that helped to shape the XML world (F-Logic, OEM).

In the main part, XML is presented as a data model and a markup meta-language, and the current languages of the XML world are systematically investigated and applied: DTD, XPath, XQuery, XSLT, XLink, XML Schema, and SQL/XML.

The lecture uses the geographical sample database "Mondial" in its XML version for illustrations.

For practical exercises, the XML software is installed in the IFI CIP Pool. The software playground page can be found here; the XPath/XQuery/XSLT Web interface is available here.
The sample code fragments can be found in the CIP pool under /afs/informatik.uni-goettingen.de/course/xml-lecture/ .

Dates & Topics

  • Mon 18.4. Easter Monday.
  • Tue 19.4. Start of lecture period SS22.
  • Wed 20.4.: No SSD/XML meeting.
  • Mon 25.4.: live meeting via StudIP->SSD-SS22->Meetings->SSD-2022-04-25-intro-meeting
    Administrativa, Overview, ...
    Note: the recording will appear a bit later in StudIP - I forgot to turn the recording off, so it contains other topics that we discussed later in this room. We have to cut the recording first.
    26.4. 19:00: the recording of the meeting is now available.

    The lecture is intended to have three dimensions:

    • A typical technology course (XML and its languages) with a lot of practical content. Perfect for self-learning and experimenting.
    • XML is a perfect example of many computer science concepts (practical: data, data exchange and interoperability; theoretical: trees, language design).
      By now, most of you already have practical experience with many of them: ant, maven (XML), JSON (a much newer "lightweight" reduction of XML).
      Learning: Always be aware and analytical of the underlying concepts and structures.
    • "History": how did (do, and will do) computer science and IT concepts evolve?
      The "XML world" (and many other things, like most basic Web technologies) was developed as a response to new requirements in the mid/late 1990s. A very interesting time, when many already existing ideas and experiments were improved and combined.
      Learning: see how to analyze and question requirements and existing things/software.

Part 0: Preface

Part I: History, evolution, and comparison of data models until 1995 and requirements for XML

  • This part is not to be seen as a "technical lecture" to learn details of some languages, but to show how ideas and concepts, in this case data management and data models (and the concept of high-level declarative query languages), evolved, and how new requirements (Web, data integration, data interoperability, handling documents+data and metadata) led to XML in the mid/late 1990s.
  • 28.4.-15.5.: No pre-scheduled live meetings.
  • 28.4.: Data Models: about structuring data and the development of query languages.
    Slides: the Relational Model
    Recording: General concepts of data models, the relational model (with its querying concepts) as an example data model.
  • 2./4.5.: The Requirements for "semistructured data" in the mid 90s, history of data models: appropriateness of the data model for modeling data and other requirements of the time.
    Slides: data models
    Recording: the network database model (1960s, pre-declarative querying) and the object-oriented database model (late 1980s)
    Recording: the object-oriented database model (late 1980s, OQL, OIF, CORBA)
  • Some references to read about database history (optional):
  • 9./11.5.: "History" continued - requirements and academic prototypes of the early 1990s:
    SchemaSQL (an extremely powerful, yet syntactically minimal "opening" of SQL to metadata) and early semistructured data models (Tsimmis/OEM and F-Logic):
    Slides: early semistructured data models
    Recording: SchemaSQL
    Recording: Tsimmis/OEM (part 1)
    Recording: Tsimmis (part 2), F-Logic and the situation pre-XML
  • Monday, May 16th 14:15: Live online meeting
    to get some feedback, answer questions, and to discuss/give a roadmap for the rest of the lecture.
    From then on, the course gets "productive" and continues with XML and the languages of the XML world.
  • Exam date: August 23rd, between 9:00 and 13:00.
    Preferred: with ILIAS in the E-Exams room (MZG "Blue Tower" Central Campus, room 1.116) if the Corona statistics allow it, otherwise as an online ILIAS exam.

Part II: XML concepts and technology - aspects of XML as a data structure in Computer Science

    (Exercise: if you have knowledge about JSON, compare JSON with the concepts discussed in Part I, and with the following aspects of XML)

  • 16.5.: XML: data model, language, DTDs etc.
    Slides: XML basics
    Recording: XML basics
  • 18.5.: XML: DTDs etc. (cont'd)
    Recording: DTD, the xmllint tool
  • Exercise Sheet 1
    (XML basics, parsing, grammar aspects)
    If there are questions etc., the RocketChat dbis channel can be used (participants are also encouraged to answer each other's questions).
  • 23.5.: XML parsing ...
    Recording: XML parsing, XHTML (and parsing)
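
As a companion to the parsing topics above, here is a minimal sketch of a well-formedness check. It uses Python's standard library instead of the xmllint tool from the lecture, and the sample documents and the function name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents: the second one never closes <city>.
good = '<country name="Germany"><city>Berlin</city></country>'
bad = '<country name="Germany"><city>Berlin</country>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

Note that this only checks well-formedness. Validation against a DTD is a separate step that ElementTree does not perform; the lecture uses xmllint for that.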

Part III: Languages of the XML world: XPath, XQuery, XSLT ...


The exam is a "written" exam, carried out online using the ILIAS system.
Tuesday, August 23rd, 2022, 10:15-12:45 (time details see below)
Online at home (or anywhere) with ILIAS. Due to the high Corona incidence, we chose not to hold it in person in the E-learning room.

  • The exam is an "open-book-exam", i.e., you can use whatever documentation you want, whenever you want (but it is intended that you should not need it much, except maybe for looking up syntax - DBIS exams are not "learning" exams, but competence exams).
    Recommended in case of an exam in the E-Learning room without your own computer: print those slides that you want, just to have them and to have a better feeling. Put notes wherever you need them. List keywords and page numbers that you need on the first page.
  • Strongly recommended: prepare a "cheat sheet" (German: Spickzettel) where you put everything that you may want to look up quickly. This preparation also helps to become aware of the material.
    Details of the syntax of the pre-XML history section are not relevant. That section is for understanding the concepts and the problems.
  • Like in a "paper exam", solutions that may not work (completely) can be submitted and will receive appropriate partial points.
  • Case "Online Ilias-at-home/anywhere"
    • Official information about online exams: German / English
    • This time, the process is again a little bit better (as one participant gave me a hint on how to go through the whole exam at the beginning, like in a synchronous in-presence-exam):
    • We will use the IDENT feature (with photo). Enter IDENT via FlexNow (the IDENT feature opens at 10:15). You must identify yourself by uploading a photo of your face and your study ID card (and you will find an information text, basically the same as here [not yet the password]) before entering Ilias. (Note: when entering, the IDENT system usually starts in German, so you have to switch manually to English there.)
    • Enter StudIP, go to the special "course" "LV-Nr. 990060; MAY - Datum: 23.08.2022, 09:00 - 13:00".
      • In this course, under "Meetings", the BBB meeting for the exam is accessible. There, we all meet. The meeting officially begins at 10:30 (we will be there at about 10:10).
        Then, I will go through the whole exam, read it, give some comments and answer first questions - like in a synchronous in-presence-exam. This is expected to take about 15 minutes, until about 10:45.
        After this, the ILIAS password will be published.
      • Also in this course, under "Learning Modules" -> "Course in Ilias" you can then log into ILIAS using the above password.
        From then on, Ilias is open for 120 minutes, ending at about 12:45.
    • You should have xmllint (for XML-DTD validation; it has better error messages than saxon) and saxon (for XQuery and XSLT), or whatever XML software you want to use, installed on your computer.
  • Case "In Presence": a computer-based exam (as in the "C programming course" in our BSc) with the ILIAS system in the E-Exams room (Central Campus, building MZG "Blue Tower", room 1.116).
    Not our case this year. Details commented out.

Communication before and during the exam (online at home scenario)

  • There is a RocketChat private channel (you should have received an invitation).
    Languages in this channel: English in general; German is also allowed.
  • There is a special FlexNow-generated "course" in StudIP "LV-Nr. 990060; MAY - Datum: 23.08.2022, 09:00 - 13:00" (all participants who registered in FlexNow for the MSc module should already be registered automatically) with a single BBB meeting "Exam 23.8." that all participants should join.
    We will first read through the whole exam (10:30) and we can communicate in case of important corrections/hints/questions.
    When you enter, take a microphone (in case it is needed later) and mute it.
    Raise hand, and/or use the above RC (this beeps) in case of a question. You will then be contacted by direct chat, maybe to another BBB room.
  • Fallback: the RocketChat dbis channel.

Exam Preparation

  • You need to design a small but useful XML instance from the given text (which will contain sample data as in the earlier exams). Note that for queries in the "all x such that for all y ..." style, it is helpful to have such x,y in the instance.
  • Have a plan (and some practice) for editing an XML file quickly by copying and pasting elements. This can be much faster than in a paper exam where the chatty XML stuff must be written several times. Choose short element names and attribute names. You may also abbreviate text content like person names to initials.
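
The two hints above can be illustrated with a small sketch. It is written in Python with the standard library rather than in XQuery, and the instance, the short element/attribute names (w, c, t, n, p) and the function name are all hypothetical. The query is of the "all x such that for all y ..." kind: all c elements whose t children all have p above a threshold. The instance deliberately contains one x that satisfies the condition, one that violates it, and one without any y.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini instance with deliberately short names:
# w = world, c = country, t = city, n = name, p = population.
DOC = """
<w>
  <c n="A"><t n="a1" p="500"/><t n="a2" p="700"/></c>
  <c n="B"><t n="b1" p="900"/><t n="b2" p="50"/></c>
  <c n="C"/>
</w>
"""

def all_children_above(root, threshold):
    """Return the n of every c whose t children all have p > threshold."""
    return [
        c.get("n")
        for c in root.findall("c")
        if all(int(t.get("p")) > threshold for t in c.findall("t"))
    ]

root = ET.fromstring(DOC)
result = all_children_above(root, 100)
print(result)  # ['A', 'C']
```

C appears in the result because a "for all" condition over an empty set of cities is trivially true. Having such a case in the instance makes it easy to notice whether a query handles it as intended.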

Training example exams

The 2022 Exam

  • [27.8. 00:30] The grading of the SSD/XML exam is finished. You should be able to see your grades and the comments to your solutions in Ilias (enter it as before via StudIP). The grades are not yet in FlexNow.
    Grades: passed with 45P or more, 3.7 with 50P, ... (one grade step every 5P), 1.0 with 90P or more.
  • In case of any questions, there is an open post-exam review meeting (German: Klausureinsicht) on Monday, Aug 29th, 11:00-12:00, in the same BBB room as we used in the exam (enter it again via StudIP's special exam "LV-Nr. 990060 ..."). A commented reference solution will be provided soon on the lecture's Web page ... tomorrow (=today). You can also ask questions in the RocketChat or by mail, and we can have a meeting later.
  • In the winter term 2022/23, the Praktikum/Lab course XML WS2022/23 will take place (as an online course).
  • Exam 2022 without solutions (in English) (sample data slightly modified)
  • Exam 2022 with solutions (in English) (sample data slightly modified)
  • The grades should now be visible in FlexNow.
    Short statistical overview:
    4x 1.0, 1x 1.7, 1x 2.0, 1x 2.3, 1x 3.0, 1x 4.0.
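
The grading scheme stated above (passed with 45P, one grade step every 5P, 1.0 with 90P or more) can be written down as a small sketch. The intermediate grade values are an assumption: they are filled in from the usual German grade scale and are not listed in the announcement. The function name grade and returning None for a failed exam are illustrative choices as well.

```python
# Grade steps from 45P (pass) up to 90P (best grade), one step every 5 points.
# Only "pass at 45P", "3.7 at 50P" and "1.0 at 90P" are stated in the announcement;
# the values in between are assumed from the usual German grade scale.
GRADE_STEPS = [4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0]

def grade(points):
    """Return the grade for a point total, or None if the exam is not passed."""
    if points < 45:
        return None
    step = int((points - 45) // 5)
    # Everything from 90P upwards maps to the last entry, 1.0.
    return GRADE_STEPS[min(step, len(GRADE_STEPS) - 1)]

print(grade(45), grade(50), grade(90))  # 4.0 3.7 1.0
```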