Institute for Informatics
Georg-August-Universität Göttingen

Databases and Information Systems

Semistructured Data and XML
Summer 2021

Prof. Dr. Wolfgang May
Lars Runge, M.Sc., Sebastian Schrage, M.Sc.

Date and Time:

  • Monday 14-16 ct, IFI SR 2.101
  • Wednesday 10-12 ct, IFI SR 2.101
  • This year, DBIS will mainly use non-live teaching with pre-recorded videos. There will be some live online meetings with BigBlueButton provided by GWDG; the rooms/meetings can be entered via StudIP.
  • Materials for self-study (in English) will be linked below week by week:
    • revised videos taken from summer term 2020 (as the "original" dates in the filenames indicate),
    • PDF slides
  • Please also read the general and technical information about DBIS virtual teaching.

Lecture and exercises are mixed (see announcements on this page). There will be non-mandatory exercise sheets whose solutions will be discussed as part of the lecture.
All materials and announcements can be found HERE on the "blue DBIS pages".

Module M.Inf.1141, 4 SWS, 6 ECTS.
The module's home is the MSc program in Applied CS. It can also be credited in the BSc program in Applied CS (as "Vertiefung Softwaresysteme und Daten"),
and in several other programs:
BSc/MSc Wirtschaftsinformatik, Mathematik (BSc/MSc), Teaching/2-Fach-Bachelor, PhD GAUSS, ...

Course Description

One of the most important facts that led to the overall success of XML is that the "XML world" combines a lot of already known concepts in an optimal way for coping with a broad spectrum of requirements. The course will first review some of these preceding (partially even historic) concepts (network database model, relational databases, object-oriented databases) and the integration of data and metadata (SchemaSQL). Then, the idea of "semistructured data" is introduced by showing early representatives that helped to shape the XML world (F-Logic, OEM).

In the main part, XML is presented as a data model and a markup meta-language, and the current languages of the XML world and their concepts are systematically investigated and applied: DTD, XPath, XQuery, XSLT, XLink, XML Schema, and SQL/XML.

The lecture uses the geographical sample database "Mondial" in its XML version for illustrations.

For practical exercises, the XML software is installed in the IFI CIP Pool. The software playground page can be found here; the XPath/XQuery/XSLT Web interface is available here.
The sample code fragments can be found in the CIP pool under /afs/informatik.uni-goettingen.de/course/xml-lecture/ .

Dates & Topics

  • Mon 12.4. No lecture. There is the institute's MSc welcome meeting in the afternoon.
  • Wed 14.4. and Mon 19.4.: No SSD/XML meetings
  • Wed 21.4.: live meeting via StudIP->Meetings->SSD-2021-04-21
    Administrativa, Overview, ...

    The lecture is intended to have three dimensions:

    • A typical technology course (XML and its languages) with much practical content. Perfect for self-learning and experimenting.
    • XML is a perfect example of many computer science concepts (practical: data, data exchange, and interoperability; theoretical: trees, language design).
      By now, most of you already have practical experience with many of them: ant, maven (XML), JSON (a form of a much newer "lightweight" reduction of XML).
      Learning: Always be aware and analytical of the underlying concepts and structures.
    • "History": how did (do, and will do) computer science and IT concepts evolve?
      The "XML world" (and many other things, like most basic Web technologies) were developed as a response to new requirements in the mid/late 1990s. A very interesting time, where many already existing ideas and experiments were improved and combined.
      Learning: see how to analyze and question requirements and existing things/software.

    Just a quick and dirty introduction to XML as a document and data format, and to querying it with our Web interface:
    document and data trees [ss16]
    Recording: Document trees (XHTML, like this very document) and data trees (with an introduction to Mondial in XML)
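
The idea of XML as a data tree that can be navigated and queried can be tried out with a few lines of Python. This is only a minimal sketch using the standard library: the element names are loosely inspired by the country/city idea of Mondial, but the structure, the names (Xland, Xtown, ...), and the numbers are invented for illustration and are not the actual Mondial schema or data.

```python
import xml.etree.ElementTree as ET

# Invented toy instance; NOT the real Mondial structure or data.
DOC = """
<mondial>
  <country car_code="X">
    <name>Xland</name>
    <city><name>Xtown</name><population>5000</population></city>
    <city><name>Xville</name><population>300</population></city>
  </country>
  <country car_code="Y">
    <name>Yland</name>
    <city><name>Yport</name><population>1200</population></city>
  </country>
</mondial>
"""

root = ET.fromstring(DOC)


def outline(node, depth=0):
    """Return the element tree as indented lines, one line per element."""
    text = (node.text or "").strip()
    lines = ["  " * depth + node.tag + (f": {text}" if text else "")]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


# Path-style navigation: the names of all countries.
country_names = [n.text for n in root.findall("country/name")]

# A simple selection: names of all cities with more than 1000 inhabitants.
large_cities = [
    c.findtext("name")
    for c in root.iter("city")
    if int(c.findtext("population")) > 1000
]

print("\n".join(outline(root)))
print(country_names)
print(large_cities)
```

ElementTree supports only a small subset of XPath; the full XPath/XQuery languages treated in the lecture are considerably more expressive.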

Part I: History, evolution, and comparison of data models until 1995 and requirements for XML

  • 26.4.-15.5.: No pre-scheduled live meetings.
    Materials for self-studying (videos taken from summer term 2020 as the dates in the filenames indicate):
  • 26.4./28.4.
  • 3.5.: Data Models: about structuring data and the development of query languages.
    Slides: the Relational Model
    Recording: General concepts of data models, the relational model (with its querying concepts) as an example data model.
  • 5./10.5.: The requirements for "semistructured data" in the mid-1990s, history of data models: appropriateness of the data model for modeling data and other requirements of the time.
    Slides: data models
    Recording: the network database model (1960s, pre-declarative querying) and the object-oriented database model (late 1980s)
    Recording: the object-oriented database model (late 1980s, OQL, OIF, Corba)
  • Some references to read about database history (optional):
  • 12./17.5.: "History" continued - requirements and academic prototypes of the early 1990s:
    SchemaSQL (an extremely powerful, yet syntactically minimal "opening" of SQL to metadata) and early semistructured data models (Tsimmis/OEM and F-Logic):
    Slides: early semistructured data models
    Recording: SchemaSQL
    Recording: Tsimmis/OEM (part 1)
    Recording: Tsimmis (part 2), F-Logic and the situation pre-XML
  • Monday, May 17th 14:15: Live online meeting
    to get some feedback (maybe some ad-hoc polls), answer questions, and to discuss/give a roadmap for the rest of the lecture.
    From then on, the course gets "productive" and continues with XML and the languages of the XML world.

Part II: XML concepts and technology - aspects of XML as a data structure in Computer Science

  • According to the poll in the meeting, the exam has now been scheduled. The plan is to have a written exam (if there is only a very small number of participants, we could switch to oral exams) on Tuesday, August 17th, sometime between 9:00 and 13:00.
    Depending on the Corona situation, it will take place either as an at-home exam (Ilias or download-upload), or in the E-Learning room (then it will be Ilias-based), or in any other computer pool. The plan is that you can choose to use your own laptop (Ilias in the browser) or the E-Learning/pool computer (for those who do not have a laptop).
    The exam is project-like, i.e., there is a project description for which XML and a DTD have to be created/completed, then XPath/XQuery/XSLT are applied, and there are some text-style questions on the materials of the lecture.
    We will link earlier exams for training on this web page (you can already find them on the Web pages of previous years, but they do not yet belong "here").
  • (Exercise: if you have knowledge about JSON, compare JSON with the concepts discussed in Part I, and with the following aspects of XML)

  • 19.5.: XML: data model, language, DTDs etc.
    Slides: XML basics
    Recording: XML basics
  • 24.5.: according to the schedule, this is a holiday
  • 26.5.: XML: DTDs etc. (cont'd)
    Recording: DTD, the xmllint tool
    notes
  • 31.5.: XML parsing ...
    Recording: XML parsing, XHTML (and parsing)
    Exercise Sheet 1
    (XML basics, parsing, grammar aspects)

Part III: Languages of the XML world: XPath, XQuery, XSLT ...

Exams

The exam is a "written" exam, carried out using the ILIAS system.
Tuesday, August 17th, 2021, 10:45-13:15 (time details see below)
You can individually choose between two variants: online at home (or anywhere) with ILIAS, or, if the Corona incidence is not too high, in presence in the E-Learning room with your own computer or with the Ilias computers (and with one or two of us).

  • The exam is an "open-book exam", i.e., you can use whatever documentation you want, whenever you want (but it is intended that you should not need it much, except maybe for looking up syntax - DBIS exams are not "learning" exams, but competence exams).
    Recommended in case of an exam in the E-Learning room without your own computer: print those slides that you want, just to have them and to have a better feeling. Put notes wherever you need them. List the keywords and page numbers that you need on the first page.
  • Strongly recommended: prepare a "cheat sheet" (German: Spickzettel) where you put everything that you may want to look up quickly. This preparation also helps you become familiar with the material.
    Details of the syntax of the pre-XML history section are not relevant. That section is for understanding the concepts and the problems.
  • As in a "paper exam", solutions that do not work (completely) can be submitted and will be graded with appropriate partial points.
  • PREFERRED: Case "Online Ilias-at-home/anywhere"
    • Official information about online exams: German / English
    • We will use the IDENT feature (with photo). Enter IDENT via FlexNow (the IDENT feature opens at 10:55) - you must identify yourself (and find the PASSWORD for Ilias there) before entering Ilias. (Note: when entering, the IDENT system usually starts in German, so you have to switch manually to English there.)
    • Enter the Ilias system via StudIP: in StudIP, go to the special "course" "LV-Nr. 990060; MAY - Datum: 17.08.2021, ca. 09:00 - 13:00", then to "Learning Modules" -> "Course in Ilias" (when moving from StudIP to Ilias, you need the above-mentioned PASSWORD from IDENT). In this course, the BBB meeting (see below) is also accessible under "Meetings".
      Ilias is open from 11:00 until 13:15; during this interval, everybody has an individual working time of 120 minutes. This means there is no synchronous starting procedure by us (I am sorry for that, because usually the exams start with reading through the whole material once, where some aspects can be emphasized better than in written material).
    • In that case, you should have xmllint (for XML-DTD validation; it has better error messages than saxon) and saxon (for XQuery and XSLT), or whatever XML software you want to use, installed on your computer.
  • NOT PREFERRED: Case "In Presence": a computer-based exam (as in the "C programming course" in our BSc) with the ILIAS system in the E-Exams room (Central Campus, building MZG "Blue Tower", room 1.116).
    This variant is again split into two subvariants:
    1. You can bring and use your own laptop; then it is like "online-at-home" (using Ilias in your browser, and all software, editor, etc. that you like), using the WLAN there, with the only difference that communication with us might be easier.
    2. Use the computers (these may be specially prepared laptops) with a predefined "safe exam browser" Ilias setting (based on Windows):
      • xmllint and saxon are installed there (note that xmllint provides better support for the debugging of DTDs than saxon's validation). Both are called via the command line (cmd), as usual for the E-exams under Windows. For this and for the editor (Notepad++), buttons have been added to the "Safe Exam Browser" that is used in e-exams.
      • Video about handling Notepad++
      • For the SSD/XML exam, the English language variants of Notepad++ and ILIAS are activated, but you can switch this software to German.
      • Notepad++ provides restricted syntax highlighting and completion for XML; it should recognize XML by the file extension (e.g., "bla.xml").
      • In the E-Exams room, the keyboards have GERMAN layout. The symbols [,],{,},\ etc. depicted on the lower right of the keys are invoked with "AltGr" and the respective key.
      • International keyboards: checked, not available.
        Each participant can switch the keyboard layout (software) individually. Note that the hardware will then still show QWERTZ...
      • Technical details:
        • The (Windows!) screen then shows 3 windows: photo 1, photo 2. You can move and resize them (all not very surprising):
        • (1) (white) the Windows Notepad++ editor.
        • (2) (black) Windows command shell, in working directory "...\Desktop\saxon". On the first picture, you see (hard to read) that this was prepared for testing with mondial-europe.xml, mondial.dtd, a stylesheet and a query file.
          For the exam, it contains (pre-prepared) files exam.xml (with the XML prologue and reference to exam.dtd, so you don't have to learn this syntax stuff), exam.dtd (empty), exam.xsl (with a <xsl:stylesheet xmlns:xsl="..."> ... </xsl:stylesheet> frame).
          The saxon.jar is there, and executable files xmllint.bat, (saxonValid.bat), saxonXQ.bat and saxonXSL.bat are given.
        • (3) (which shows the place number "53" there) the "safe exam browser" where the "ILIAS" exam system runs.
          When you arrive at your place, do not yet log in. This will be done for all at the same time.
          After opening it, you can resize the answer fields.
        • This general E-exam info sheet gives some abstract overview. It will be there at the exam. (same in German)
        • Internet access is disabled in the "safe exam browser" used in the E-exams room. In general, only very restricted applications are available. Basically, there are the three above-mentioned windows. Note that only one Notepad++ instance is possible, but you can open and see several files in subwindows (tabs) or windows (e.g. XML and DTD), as demonstrated in this video.
        • Work on the files exam.xml, exam.dtd, query1.xq, ..., exam.xsl using the editor, call saxon and xmllint in the command shell, and at the end copy the files into the ILIAS answer fields.
          This is what we finally receive for grading.
      • (General) information about Exams in the E-Learning Room (click upper right for German language)
      • organizational procedure:
        • E-Exams room: Central Campus, building MZG "Blue Tower", room 1.116
        • use the big stairs in the entrance hall; the waiting zones and procedure are explained there
        • everybody will receive a user name/number and a password shortly before the exam.
        • when you enter the room, there is an identity check booth where you have to "check in". Be there at about 10:45.
        • wear a mask until you sit down at your place. Note that only healthy persons are allowed to participate.
      • Exam:
        • The teaching staff sits in the middle of the room, and we see all participants from behind. So it is possible to contact us by raising your hand and maybe a short "hello!".
        • We plan not to come to your places. There is a chat functionality, but it can only be used after raising your hand, because we must then explicitly open the chat channel to a certain person (from a computer inside a glass box, so the person chatting will not be able to see you). It is also possible to share the screen, so in case you have a problem (mainly small ones, like the typical programming "I don't see why this syntax is wrong" problems), contact us by raising your hand and/or with a short hello (we will see how this works).

Communication before and during the exam

  • There is a RocketChat channel (public; it is not yet possible to configure a channel restricted to the participants).
    Languages in this channel: in general English; in urgent cases (e.g. questions during the exam) German is also allowed.
  • New feature: There is a special FlexNow-generated "course" in StudIP "LV-Nr. 990060; MAY - Datum: 17.08.2021, ca. 09:00 - 13:00" (you should be registered automatically) with a single BBB meeting "SSD-XML-Exam-17-08-2021", which all participants should join.
    We will not talk much (as participation is asynchronous), only in case of important corrections/hints/questions.
    When you enter, join with a microphone (in case it is needed later) and mute it.
    Raise your hand and/or use the above RC channel (this beeps) in case of a question. You will then be taken to a breakout room.

Exam Preparation

  • You need to design a small but useful XML instance from the given text (which will contain sample data as in the earlier exams). Note that for queries in the "all x such that for all y ..." style, it is helpful to have such x, y in the instance.
  • Have a plan and some "experience" in how to edit an XML file quickly by copying and pasting elements. This can be much faster than in a paper exam, where the chatty XML markup must be written several times. Choose short element names and attribute names. You may also abbreviate text contents like person names to initials.
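
To illustrate why the instance should contain suitable x and y for "for all" queries, here is a minimal sketch in Python (standard library only). The scenario, the short element and attribute names (s, e, id, n, c, ok), and the data are invented for this illustration and are not taken from any actual exam. The instance deliberately contains one x that satisfies the condition for all its y, one x that fails for some y, and one x without any y (which satisfies the condition vacuously).

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-instance: students (s) with exam results (e).
# Short names and initials keep typing effort low, as recommended above.
DOC = """
<uni>
  <s id="s1" n="A.B."><e c="db" ok="y"/><e c="xml" ok="y"/></s>
  <s id="s2" n="C.D."><e c="db" ok="y"/><e c="xml" ok="n"/></s>
  <s id="s3" n="E.F."/>
</uni>
"""

root = ET.fromstring(DOC)


def passed_all(student):
    """True if every exam of this student is passed (vacuously true if none)."""
    return all(e.get("ok") == "y" for e in student.findall("e"))


# "All students x such that for all exams y of x: y is passed."
result = [s.get("id") for s in root.findall("s") if passed_all(s)]
print(result)
```

With only s1 in the instance, a wrong query (for example one that tests "some exam passed" instead of "all exams passed") would return the same answer as the correct one; s2 and s3 are what make the difference visible. In XPath 2.0/XQuery, a condition of this kind can be written with a quantified expression of the form every $e in e satisfies $e/@ok = "y".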

Training example exams

The 2021 Exam