This is an old revision of the document!
INFO-H-417 : Database Systems Architecture
The course schedule has changed. Lectures will be taught on friday: 10h-12h (theory) and 14h-15h (exercises) instead of monday from 14h-17h. Please check the online calendar regularly for correct information. The change will be effective from (and including) week 6 (oct 22- oct 26)
In contrast to a typical introductory course in database systems where one learns to design and query relational databases, the goal of this course is to get a fundamental insight into the implementation aspects of database systems. In particular, we take a look under the hood of relational database management systems, with a focus on query and transaction processing. By having an in-depth understanding of the query-optimization-and-execution pipeline, one becomes more proficient in administring DBMSs, and hand-optimizing SQL queries for fast execution.
With respect to query processing, we study the whole workflow of how a typical relational database management system optimizes and executes SQL queries. This entails an in-depth study of: (1) translating the SQL query into a “logical query plan”; (2) optimizing the logical query plan; (3) how each logical operator can be algorithmically implemented on the physical (disk) level, and how secondary-memory index structures can be used to speed up these algorithms; and (4) the translation of the logical query plan into a physical query plan using cost-based plan estimation.
Topics studied in transaction processing include logging, serializability, concurrency control, and their combination.
Organisation
Course Material
Method of Evaluation
Students are evaluated on both a project to be developed during the semester, and a written exam. The project work contributes 6/20 points to the overall score, and the written exam contributes the remaining 14/20 points. Participation in both the project work and the written exam are mandatory requirements for passing the course.
COURSE TRAJECTORY
Lecture 1: Course Introduction and Translation of SQL into the Relational Algebra
During lecture 1 we refreshed the basic background knowledge on relational database management systems (relations, relational algebra, SQL). To re-acquaint yourself with the the relevant background knowledge, you are expected to read thoroughly chapter 1, chapter 2 (only sections 2.2 and 2.4), chapter 5 (only sections 5.1 and 5.2) and 6 from the handbook TCB.
During lecture 1 (
slides), we presented an overview of the architecture of a query compiler (see chapter 16, sections 16.1, 16.3.1 and 16.3.2 in the book) and studied the the translation of SQL into the extended relational algebra (see
course notes for the full translation algorithm).
Lecture 2: Optimization of Logical Query Plans
During lecture 2 (
slides), we studied optimization of the select-project-join expressions (see
course notes for the full optimization algorithm and its correctness). In addition, we discussed popular heuristics for improving general logical query plans: see chapter 16, section 16.3.3 in the book which discusses the simple (but very important) logical optimizations of “pushing” selections and projections; and recognizing joins, based on the algebraic equalities from section 16.2.
You are expected to solve exercises 1.1, 3.1, 4.1, 5, 6, and 8.1 of the
optimization exercises (
solutions) by the lecture of october 1st. You are strongly advised to also try exercises 2, 3, and 4, although these exercises will not be corrected during that lecture. The remaining exercises give extra exercise possibilities.
Reading Assignment 1: Physical data organisation
We take an intermezzo in the compilation of SQL to logical query plans, and consider how a DBMS physically organizes its data on disk. (
slides). The details may be found in chapter 13 of the book and constitute the first reading assignment.
Lecture 3: Index Structures
Lecture 4: More Index Structures
During Lecture 4 (
slides part I,
slides part II) we have studied index structures based on hashing, and discussed the typical architecture of a DBMS. The details are found in section 14.3 and 15.7 in the book.
Lab 1: The I/O Model of Computation and Physical Layout
During Lab 1 (
handouts,
solutions) we looked at the architecture of a database prototype, and in particular at how a relation stored on disk can be read one block at a time. The code for this lab is available as a
git repository.
Project Assignment: External Memory Algorithms
In this assignment you are asked to implement an external-memory merge-sort algorithm (such as the one described in Section 15.4.1 of TCB), and examine its performance under different parameters. As a warm-up exercise you will need to explore several different ways
to read data from, and write data to secondary memory. The overall goal of the assignment is to get real-world experience with the performance of external-memory algorithms. Be sure to read the full assignment for full details.
Small comments to the above assignment, in response to student comments, have been made on 31/10/2012. The changes with respect to the original assignment are marked in blue in the document above.
The assignment should be solved in groups of 2 (if we have an odd number, one group of 3 students will be allowed). You are asked to send, per group, the names of the group members to Mr. Francois Picalausa (fpicalau@ulb.ac.be) by October 22 at the latest. You can check that your group is registered on this page.
If you cannot find a partner, please indicate so by sending an email to Mr. Picalausa, who will hook you up with a partner.
This project contributes 6/20 to the overall grade, and the
written exam contributes the remaining 14/20 points. Your solution should be pushed to the Mercurial repository (see full assignment) no later than 5 january 2013. You get a penalty of -1/6 points for each day that your solution is delayed.
Note that the deadline has been extended from 22 december 2012 to saturday 5 january 2013.
Lecture 5: Multidimensional Index Structures
During lecture 5 (
slides) we have studied various index structures that deal gracefully with multidimensional search keys. The details may be found in sections 14.4-14.6, pages 661-688 of the book.
Lab 2: The BTree Index Structure
During Lab 2 (
handouts,
solutions) we instantiated a BTree in the database prototype that we used during Lab 1, and studied how it can be used to improve query processing. We also delved into the internals of the BTree implementation to understand its layout on disk. The code for this lab is available as a
git repository.
Lecture 6: Physical Operators
During lecture 6 (
slides), we studied the basic algorithms for physical operators, together with their cost analysis. The details (including some algorithms not discussed during the lecture) can be found in chapter 15 (sections 15.1 until 15.6). The interested reader is also advised to read Section 15.7 (although this will not be interrogated during the exam).
You are expected to solve exercise 1 of the
physical operators exercises (
solutions) by the exercise session of November 16. You are strongly advised to also try exercise 2 although that exercises will not be corrected during that response lecture.
Lecture 7: Cost-Based Plan Selection
During lecture 7 (
slides), we studied the problems that occur when translating a logical query plan into an (optimal) physical query plan (i.e., estimating the sizes of subresults and join ordering). The details can be found in Chapter 16 (Sections 16.4 until 16.8).
Lecture 8: Coping with System Failures
During lecture 8
slides we gave a short introduction to transaction processing in general, and dove into coping with systems failures in particular. The details may be found in chapter 17 of the book.
You are expected to solve exercises 17.2.2, 17.2.4, 17.2.5, 17.2.7, 17.3.2, 17.3.3, 17.3.4, 17.3.5, 17.4.2, 17.4.3, 17.4.4, and 17.4.5 in the book by exercise session of Monday, December 3.
Lecture 9: Concurrency control
* During lecture 9 ({{:teaching:infoh417:slides-lect9.pdf|slides}}), we studied the problems that can occur when a database runs many transactions concurrently and the different kinds of scheduling algorithms that can prevent these problems. The details can be found Chapter 18 (section 18.7 is excluded).
You are expected to solve exercises 18.2.5(b,c,e), 18.8.1(b,d), 18.8.3 (a,c), and 18.9.1 (c,f) by the response lecture of Monday, December 3.