Semistructured Data and XML
Prof. Dr. Wolfgang May
Lars Runge, M.Sc.,
Sebastian Schrage, M.Sc.
Date and Time:
- Monday 14-16 ct,
IFI SR 2.101
- Wednesday 10-12 ct,
IFI SR 2.101
- Virtual Meetings: We will use BigBlueButton provided by GWDG;
the rooms/meetings can be entered via StudIP. The recordings can
also be found there (they cannot be exported or edited).
Please also read the general and technical information
about DBIS virtual teaching.
Lecture and exercises are mixed (see announcements on this page). There will be non-mandatory
exercise sheets whose solutions will be discussed as part of the lecture.
Module M.Inf.1141, 4 SWS, 6 ECTS.
The module's home is the MSc studies in
Applied CS. It can also be credited in the BSc studies in Applied CS
(as "Vertiefung Softwaresysteme und Daten"),
and in several other studies:
BSc/MSc Wirtschaftsinformatik, Mathematik (BSc/MSc), Teaching/2-Fach-Bachelor, PhD GAUSS, ...
One of the most important factors behind the overall success of XML
is that the "XML world" combines many already known concepts in a way
that copes well with a broad spectrum of requirements.
The course will first review some of these preceding (partially even historic)
concepts (network database model, relational databases, object-oriented
databases) and the integration of data and metadata (SchemaSQL). Then,
the idea of "semistructured data" is introduced by showing early
representatives that helped to shape the XML world (F-Logic, OEM).
In the main part, XML is presented as a data model and a markup meta-language,
and the current languages of the XML world are systematically
investigated and applied: DTD, XPath, XQuery, XSLT, XLink, XML Schema,
The lecture uses the geographical sample database "Mondial"
in its XML version for illustrations.
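To give a first impression of "XML as a data model", here is a minimal sketch in Python (standard library only). The XML fragment is a hypothetical, heavily simplified document in the spirit of Mondial; its element and attribute names are illustrative assumptions, not the actual Mondial schema. Python's ElementTree supports only a small subset of XPath, so this is a rough approximation of what the lecture does with full XPath/XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment in the spirit of Mondial.
# Element and attribute names are illustrative, not the real schema.
DOC = """
<mondial>
  <country car_code="D">
    <name>Germany</name>
    <city><name>Berlin</name></city>
    <city><name>Hamburg</name></city>
  </country>
  <country car_code="F">
    <name>France</name>
    <city><name>Paris</name></city>
  </country>
</mondial>
"""

root = ET.fromstring(DOC)

# Path expression with an attribute predicate:
# names of all cities of the country whose car_code is "D".
german_cities = [
    n.text for n in root.findall("./country[@car_code='D']/city/name")
]

# Descendant step: every <name> element below the root, in document order.
all_names = [n.text for n in root.findall(".//name")]

print(german_cities)  # ['Berlin', 'Hamburg']
print(all_names)      # ['Germany', 'Berlin', 'Hamburg', 'France', 'Paris']
```

The same tree holds both "data" (names) and "metadata" (element and attribute names), which is the property the historical part of the course leads up to.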
For practical exercises, the XML software is installed in the IFI CIP
The software playground page can be found
the XPath/XQuery/XSLT Web interface is available
The sample code fragments can be found in the CIP pool under
Dates & Topics
- Mon 12.4. No lecture. There is the institute's MSc welcome meeting in the afternoon.
- Wed 14.4. and Mon 19.4.: No SSD/XML meetings
- As stated in the information
about DBIS virtual teaching, this year DBIS will mainly use non-live teaching with pre-recorded lectures;
there will be a small number of live meetings.
- For newcomers, we will link a recording on the technical/online teaching issues here
(cleaned recording from last year)
- Wed 21.4.: live meeting via StudIP->Meetings->SSD-2021-04-21
Administrativa, Overview, ...
The lecture is intended to have three dimensions:
- A typical technology course (XML and its languages) with a lot of practical content.
Perfect for self-learning and experimenting.
- XML is a perfect example of many computer science concepts (practical: data,
data exchange and interoperability; theoretical: trees, language design).
By now, most of
you already have practical experience with many of them: ant, maven (XML), JSON (a
much newer "lightweight" reduction of XML).
Learning: Always be aware of, and analytical about, the underlying concepts and structures.
- "History": how did (do, and will do) computer science and IT concepts evolve?
The "XML world" (and many other things, like most basic Web technologies) was
developed as a response to new requirements in the mid/late 1990s.
A very interesting time, in which many already existing ideas and experiments were improved.
Learning: see how to analyze and question requirements and existing things/software.
A quick and dirty introduction to XML as a document and data format, and to
querying it with our Web interface:
document and data trees [ss16]
(document trees (XHTML, like this very document) and data trees (with an introduction to Mondial in XML))
Part I: History, evolution, and comparison of data models until 1995 and requirements for XML
- 26.4.-15.5.: No pre-scheduled live meetings.
Materials for self-studying (videos taken from summer term 2020 as the dates in the filenames indicate):
3.5.: Data Models: about structuring data and the development of query languages.
Slides: the Relational Model
Recording: General concepts
of data models, the relational model (with its querying concepts) as an example data model.
- 5./10.5.: The Requirements for "semistructured data" in the mid 90s, history of data models:
appropriateness of the data model for modeling data and other requirements of the time.
Slides: data models
Recording: the network database model (1960s,
pre-declarative querying) and the object-oriented database model (late 1980s)
Recording: the object-oriented database model
(late 1980s, OQL, OIF, CORBA)
- Some references to read about database history (optional):
- 12./17.5.: "History" continued - requirements and academic prototypes of the early 1990s:
SchemaSQL (an extremely powerful, yet syntactically minimal "opening" of SQL to metadata) and
early semistructured data models (Tsimmis/OEM and F-Logic):
Slides: early semistructured data models
Recording: Tsimmis/OEM (part 1)
Recording: Tsimmis (part 2), F-Logic and the situation pre-XML
17th 14:15: Live online meeting
to get some feedback (maybe some ad-hoc polls), answer questions, and to discuss/give a roadmap for the
rest of the lecture.
From then on, the course gets "productive" and continues with XML and the languages of the XML world.
Part II: XML concepts and technology - aspects of XML as a data structure in Computer Science
According to the poll in the meeting, the exam has now been scheduled.
The plan is to have a written exam (if there is only a very small number of participants,
we could switch to oral exams) on Tuesday, August 17th,
sometime between 9 and 13h.
Depending on the Corona situation, either as an at-home exam (Ilias or download-upload),
or in the E-Learning room (then it will be Ilias-based) or in any other computer pool.
The plan is that you can choose to use your own laptop (Ilias in the browser) or
the E-Learning/Pool computer (for those who do not have a laptop).
The exam is project-like, i.e., a project description for which XML and DTD have
to be created/completed, and then XPath/XQuery/XSLT are applied, plus some text-style
questions on the materials of the lecture.
We will link earlier exams for training on this web page (you can already find them
on the Web pages of previous years, but they do not yet belong "here").
(Exercise: if you have knowledge about JSON, compare JSON with the concepts discussed in Part I, and
with the following aspects of XML)
- 19.5.: XML: data model, language, DTDs etc.
Slides: XML basics
Recording: XML basics
- 24.5.: according to the schedule, this is a holiday
- 26.5.: XML: DTDs etc. (cont'd)
Recording: DTD, the xmllint tool
- 31.5.: XML parsing ...
Recording: XML parsing, XHTML (and parsing)
Exercise Sheet 1
(XML basics, parsing, grammar aspects)
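For experimenting with the parsing topic on your own machine, the following is a minimal sketch using Python's standard library. This is an assumption for illustration only; the lecture itself uses xmllint. Note that ElementTree is a non-validating parser: it checks well-formedness only and does not validate against a DTD. The helper name is_well_formed is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<country><name>Germany</name></country>"
bad = "<country><name>Germany</country></name>"  # tags are not properly nested

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False

# Event-based parsing: the parser reports start and end tags in the
# order in which they occur in the document.
events = [
    (event, elem.tag)
    for event, elem in ET.iterparse(
        io.BytesIO(good.encode("utf-8")), events=("start", "end")
    )
]
print(events)
```

The event list makes the tree structure visible as a properly nested sequence of start/end pairs, which is the grammar aspect behind well-formedness.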
Part III: Languages of the XML world: XPath, XQuery, XSLT ...
- 2.6.: XPath: navigation and addressing language for XML
Recording: XPath I
- 7.6.: XPath (cont'd)
Recall slide from last time: XML Axes for XPath
XPath position functions (local) with graphics
Recording: XPath II
- Exercise Sheet 2: XPath
If there are questions etc.,
the RocketChat dbis channel can be used
(participants are also encouraged to answer each other's questions).
- 9.6.: XPath (cont'd)
Recording: XPath III
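The "local" nature of XPath position predicates can be tried out with a small sketch in Python. The sample document is hypothetical, and Python's ElementTree implements only a limited XPath subset (child and descendant steps, simple predicates), so for the full set of axes the Web interface or Saxon is still needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from Mondial.
root = ET.fromstring(
    "<country><name>Germany</name>"
    "<city><name>Berlin</name></city>"
    "<city><name>Hamburg</name></city>"
    "<city><name>Munich</name></city></country>"
)

# Position predicates count among the siblings selected by the step.
first_city = root.find("./city[1]/name").text
last_city = root.find("./city[last()]/name").text

# Positions are local to each parent: .//name[1] selects the first
# <name> child of EVERY element, not only the first <name> overall.
local_firsts = [n.text for n in root.findall(".//name[1]")]

# Predicate on a child's text: the <city> whose <name> is "Hamburg".
hamburg = root.find("./city[name='Hamburg']")

print(first_city, last_city)  # Berlin Munich
print(local_firsts)           # ['Germany', 'Berlin', 'Hamburg', 'Munich']
print(hamburg.tag)            # city
```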
Monday, June 14th, 14:15:
No live meeting
(I have an oral SemWeb exam).
Discussion of Exercise Sheet 1:
Solutions to Exercise Sheet 1. If some of the solutions should be
discussed or presented, or you have questions, send me a mail.
XPath (conclusions), XML Query Languages: History/Evolution - XQL, XML-QL
Recording: XPath IV: Conclusions
XPath tree navigation sketch (pdf)
Recording: XQL, XML-QL
Monday, June 21st: 14:15: Live online meeting
to get some feedback, answer questions, and to discuss/give a roadmap for the
rest of the lecture.
Recording: XQuery (I)
Notes on XML query language design (rule metaconcept, index-based eval)
When experimenting with XQuery, using Saxon from the command line (on
the IFI CIP Pool computers, or installed on your own computer)
gives better error messages than the Web service.
- 23.6.: XQuery (cont'd)
Exercise Sheet 3 (XQuery)
- TO BE EXTENDED
- 16.7.2021 End of lecture period.
The exam will (most probably) be a written exam.
Depending on the situation, either classically in presence (paper-based or Ilias-based),
or online at home (with ILIAS or download-upload).
Current plan: Tuesday, August 17th, sometime between 9 and 13h