INFO-H-417 : Database Systems Architecture

The written exam of INFOH417 is planned on monday, 16 january 2017 from 14h00-17h30 in room S.UB5.230. The exam is closed book. As mentioned in class; you are allowed to use a calculator, provided that this is a bare-bones calculator without programming capabilities. If this is a programmable calculator, you are only allowed to use it after resetting its memory on site. Use of smartphones or other devices with communication capabilities are not allowed.

The written exam of INFOH417 is planned on monday, 16 january 2017 from 14h00-17h30 in room S.UB5.230. The exam is closed book.

To help you in preparing for the exam, two prior exams are now available: prior exam 1 and prior exam 2

The project gropus are posted (see project page)

GENERAL INFORMATION

In contrast to a typical introductory course in database systems where one learns to design and query relational databases, the goal of this course is to get a fundamental insight into the implementation aspects of systems designed to manage and process large amounts of data. Our objective in this respect is two-fold. (1) To gain the background required to design and implement future data management and processing systems and (2) to gain an understanding of how performance of practical data management systems can be tweaked.

In particular, we take a look under the hood of relational database management systems, with a focus on query and transaction processing. The focus on relational database management systems is motivated by the fact that the algorithms and architectures underlying relational databases have strongly influenced the design of contemporary data processing and management systems: graph databases, in-memory database systems, stream databases, and even NoSQL systems.

With respect to query processing, we study the whole workflow of how a typical relational database management system optimizes and executes SQL queries. This entails an in-depth study of: (1) translating the SQL query into a “logical query plan”; (2) optimizing the logical query plan; (3) how each logical operator can be algorithmically implemented on the physical (disk) level, and how secondary-memory index structures can be used to speed up these algorithms; and (4) the translation of the logical query plan into a physical query plan using cost-based plan estimation.

With respect to transaction processing we study how a typical relational database management systems ensures recovery from errors and controls concurrent access to the data. Topics studied in transaction processing include logging, serializability, concurrency control, and their combination.

Contacts

Organisation

  • The course is taught during the first semester
  • The course schedule is available on-line
  • The list of competences that will be taught during the course and interrogated during the exam is available in the course plan.

Course Material

The course uses the book "Database Systems: The Complete Book (second, international edition)" by H. Garcia-Molina, J. D. Ullman, and J. Widom (ISBN-13: 978-0131354289), complemented by course notes made available on this website.

Method of Evaluation

Students are evaluated on both a project to be developed during the semester, and a written exam. The project work contributes 6/20 points to the overall score, and the written exam contributes the remaining 14/20 points. Participation in both the project work and the written exam are mandatory requirements for passing the course.

COURSE TRAJECTORY

Lecture 1: Course Introduction and Translation of SQL into the Relational Algebra

  • During lecture 1 we refresh the basic background knowledge on relational database management systems (relations, relational algebra, SQL). To re-acquaint yourself with the relevant background knowledge, you are expected to read thoroughly chapter 1, chapter 2 (only sections 2.2 and 2.4), chapter 5 (only sections 5.1 and 5.2) and 6 from the handbook TCB.
  • During lecture 1 (slides), we present an overview of the architecture of a query compiler (see chapter 16, sections 16.1, 16.3.1 and 16.3.2 in the book) and study the translation of SQL into the extended relational algebra (see course notes for the full translation algorithm).
  • You are expected to solve exercise 1 of the translation exercises (pdf) by the exercise session of friday 30 september. Exercise 2 gives extra exercise possibilities, but will not be corrected in class.
  • The solutions of the translation exercises are available.

Lecture 2: Optimization of Logical Query Plans

  • During lecture 2 (slides), we study optimization of the select-project-join expressions (see course notes for the full optimization algorithm and its correctness). In addition, we discuss popular heuristics for improving general logical query plans: see chapter 16, section 16.3.3 in the book which discusses the simple (but very important) logical optimizations of “pushing” selections and projections; and recognizing joins, based on the algebraic equalities from section 16.2.
  • You are expected to solve exercises 1.1, 3.1, 4.1, 5, 6, and 8.1 of the optimization exercises by the exercise session of friday, october 2. You are strongly advised to also try exercises 2, 3, and 4, although these exercises will not be corrected during that session. The remaining exercises give extra exercise possibilities.
  • The solutions of the optimization exercises are online.

Reading Assignment 1: Physical data organisation

  • We take an intermezzo in the compilation of SQL to logical query plans, and consider how a DBMS physically organizes its data on disk. (slides). The details may be found in chapter 13 of the book and constitute the first reading assignment.

Lecture 3: Index Structures

  • During Lecture 3 (slides part I, slides part II) we study sparse and dense index structures, and BTrees. The details are found in chapter 14, sections 14.1 and 14.2 in the book.

Lecture 4: More Index Structures

Project Assignment: External Memory Algorithms

  • In this assignment you are asked to implement an external-memory merge-sort algorithm, and examine its performance under different parameters. Be sure to visit the project's page for full details and modalities.

Reading Assignment 2 and lecture 5: Multidimensional Index Structures

  • There is no class on friday 28 October. Instead you are asked to read sections 14.4-14.6, pages 661-688 (in the international edition) or pages 649-676 (in the new international edition) of the book. This concerns various index structures that deal gracefully with multidimensional search keys (slides).

Lecture 6: Physical Operators

  • In lecture 6 (slides), we study the basic algorithms for physical operators, together with their cost analysis. The details (including some algorithms not discussed during the lecture) can be found in chapter 15 sections 15.1 until 15.6, pages 701-746 (in the international edition) or pages 689-734 (in the new international edition). The interested reader is also advised to read Section 15.7 (although this will not be interrogated during the exam).

Lecture 7: Cost-Based Plan Selection

  • During lecture 7 (slides), we will study the problems that occur when translating a logical query plan into an (optimal) physical query plan (i.e., estimating the sizes of subresults and join ordering). The details can be found in Chapter 16 (Sections 16.4 until 16.8, pages 792-839 in the international edition or pages 780-827 in the new international edition).

Lecture 8: Coping with System Failures

  • During lecture 8 slides part 1 slides part 1 we gave a short introduction to transaction processing in general, and dove into coping with systems failures in particular. The details may be found in chapter 17 of the book.
  • You are expected to solve exercises 17.2.2, 17.2.4, 17.2.5, 17.2.7, 17.3.2, 17.3.3, 17.3.4, 17.3.5, 17.4.2, 17.4.3, 17.4.4, and 17.4.5 in the book by exercise session of Friday, December 9. The solutions are online.

Lecture 9: Concurrency control

  • During lecture 9 (slides), we studied the problems that can occur when a database runs many transactions concurrently and the different kinds of scheduling algorithms that can prevent these problems. The details can be found Chapter 18 (sections 18.5, 18.6. 18.7 is excluded as is section 18.9 on validation-based schedulers).
  • You are expected to solve exercises 18.2.4(b,c,e), 18.8.1(b,d), 18.8.2 (a,c) by the response lecture of Friday, December 16. Given that we have not studied validation based schedulers, you do not need to prepare exercise 18.9.1 (c,f). The solutions are online.

Revision session

Durig the lecture & exercise session of Friday, december 16 you have the opportunity to ask questions on the theory and exercises. In particular, if you would like to have a certain part of a lecture re-explained, or an exercise re-corrected, then this is possible provided that you let prof. Vansummeren know your question by monday, december 12 at the latest.

This last lecture also features a demo that illustrates how the theory on query optimization that we have studied is applied in practical database management systems.

 
teaching/infoh417.txt · Last modified: 2017/01/12 09:26 by svsummer