This is an old revision of the document!
INFO-H-417 : Database Systems Architecture
If you are following this course, it is mandatory that you fill out
this form by
friday, 25 september at the latest. This enables the course instructors to contact you in the future, as well as form an idea on your background.
In contrast to a typical introductory course in database systems where
one learns to design and query relational databases, the goal of this
course is to get a fundamental insight into the implementation aspects
of systems designed to manage and process large amounts of data. Our
objective in this respect is two-fold. (1) To gain the background
required to design and implement future data management and processing
systems and (2) to gain an understanding of how performance of
practical data management systems can be tweaked.
In particular, we take a look under the hood of relational
database management systems, with a focus on query and transaction processing. The focus on relational database management systems is motivated by the fact that the algorithms and architectures underlying relational databases have strongly influenced the design of
contemporary data processing and management systems: graph databases, in-memory database systems, stream databases, and even NoSQL systems.
With respect to query processing, we study the whole workflow of how a
typical relational database management system optimizes and executes
SQL queries. This entails an in-depth study of: (1) translating the SQL query into a “logical query plan”; (2) optimizing the logical query plan; (3) how each logical operator can be algorithmically implemented on the physical (disk) level, and how secondary-memory index structures can be used to speed up these algorithms; and (4) the translation of the logical query plan into a physical query plan using cost-based plan estimation.
With respect to transaction processing we study how a typical
relational database management systems ensures recovery from errors
and controls concurrent access to the data. Topics studied in transaction processing include logging, serializability, concurrency control, and their combination.
-
Assistant:
Stefan Eppe (Campus Solbosh, Building U, 4th floor, room UB4.131)
Organisation
Course Material
Method of Evaluation
Students are evaluated on both a project to be developed during the semester, and a written exam. The project work contributes 6/20 points to the overall score, and the written exam contributes the remaining 14/20 points. Participation in both the project work and the written exam are mandatory requirements for passing the course.
COURSE TRAJECTORY
Lecture 1: Course Introduction and Translation of SQL into the Relational Algebra
During lecture 1 we refresh the basic background knowledge on relational database management systems (relations, relational algebra, SQL). To re-acquaint yourself with the relevant background knowledge, you are expected to read thoroughly chapter 1, chapter 2 (only sections 2.2 and 2.4), chapter 5 (only sections 5.1 and 5.2) and 6 from the handbook TCB.
During lecture 1 (
slides), we present an overview of the architecture of a query compiler (see chapter 16, sections 16.1, 16.3.1 and 16.3.2 in the book) and study the translation of SQL into the extended relational algebra (see
course notes for the full translation algorithm).
You are expected to solve exercise 1 of the
translation exercises (pdf) by the exercise session of friday 25 september. Exercise 2 gives extra exercise possibilities.
The
solutions of the translation exercises are available.
Lecture 2: Optimization of Logical Query Plans
During lecture 2 (
slides), we study optimization of the select-project-join expressions (see
course notes for the full optimization algorithm and its correctness). In addition, *next week* we will discuss popular heuristics for improving general logical query plans: see chapter 16, section 16.3.3 in the book which discusses the simple (but very important) logical optimizations of “pushing” selections and projections; and recognizing joins, based on the algebraic equalities from section 16.2.
You are expected to solve exercises 1.1, 3.1, 4.1, 5, 6, and 8.1 of the
optimization exercises by the exercise session of friday, october 2. You are strongly advised to also try exercises 2, 3, and 4, although these exercises will not be corrected during that session. The remaining exercises give extra exercise possibilities.
The
solutions of the optimization exercises are online.
Reading Assignment 1: Physical data organisation
We take an intermezzo in the compilation of SQL to logical query plans, and consider how a DBMS physically organizes its data on disk. (
slides). The details may be found in chapter 13 of the book and constitute the first reading assignment, which is to be performed instead of the lecture of friday, october 2.
Lecture 3: Index Structures
During Lecture 3 (
slides part I,
slides part II) we study sparse and dense index structures, and BTrees. The details are found in chapter 14, sections 14.1 and 14.2 in the book.
Lecture 4: More Index Structures
Project Assignment: External Memory Algorithms
In this assignment you are asked to implement an external-memory merge-sort algorithm, and examine its performance under different parameters. Be sure to visit the
project's page for full details and modalities.
Lecture 5: Multidimensional Index Structures
During lecture 5 (
slides) we will study various index structures that deal gracefully with multidimensional search keys. The details may be found in sections 14.4-14.6, pages 661-688 (in the international edition) or pages 649-676 (in the new international edition) of the book.
Lab 1: The I/O Model of Computation and Physical Layout
* During Lab 1 (handouts) we looked at the architecture of a database prototype, and in particular at how a relation stored on disk can be read one block at a time. The code for this lab is available as a git repository.