INFO-H-419: Data Warehouses

Lecturer

Volume

  • Theory 24h - Exercises 24h - Project 12h
  • 5 ECTS

Study Programme

  • Master in Computer Science and Engineering [MA-IRIF]
  • Master in Computer Sciences [INFO]
  • Erasmus Mundus Master in Big Data Management and Analytics (BDMA)

Schedule

The course is given during the first semester.

  • Lectures on Tuesdays from 2 pm to 4 pm in room S.UA4.218
  • Exercises on Fridays from 4 pm to 6 pm in room S.UB4.130

Grading

  • Group project (30%)
  • Written exam (70%)
    • The exam is open book: notes and books may be used. Laptops and other electronic devices are not allowed.

Course Summary

Relational and object-oriented databases are mainly suited for operational settings, in which many small transactions query and write to the database and consistency (in the presence of potentially conflicting transactions) is of utmost importance. The situation is very different in analytical processing, where historical data is analyzed and aggregated in many different ways. Such queries differ significantly from typical transactional queries in the relational model:

  • Analytical queries typically touch a larger part of the database and run longer than transactional queries;
  • Analytical queries involve aggregations (min, max, avg, …) over large subgroups of the data;
  • When analyzing data, it is convenient to view it as multi-dimensional.
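As a minimal illustration of such an aggregation query (a sketch using Python's built-in sqlite3 module and a hypothetical sales table — the table and column names are not from the course material):

```python
import sqlite3

# In-memory database with a hypothetical sales table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EU", 2017, 100.0), ("EU", 2017, 50.0), ("EU", 2018, 150.0),
     ("US", 2017, 200.0), ("US", 2018, 250.0), ("US", 2018, 50.0)],
)

# An analytical query: aggregate over subgroups of the data,
# grouping on the 'region' and 'year' dimensions
for row in conn.execute(
    "SELECT region, year, SUM(amount), AVG(amount) "
    "FROM sales GROUP BY region, year ORDER BY region, year"
):
    print(row)  # e.g. ('EU', 2017, 150.0, 75.0)
```

A transactional query would instead read or update a handful of individual rows; the query above scans the whole table and summarizes each (region, year) group.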


For these reasons, data to be analyzed is typically collected into a data warehouse with Online Analytical Processing (OLAP) support. "Online" here refers to the fact that answers to the queries should not take too long to compute. Collecting the data is often referred to as Extract-Transform-Load (ETL). The data in the data warehouse needs to be organized in a way that enables the analytical queries to be executed efficiently; for the relational model, star and snowflake schemas are popular designs. Next to OLAP on top of a relational database (ROLAP), native OLAP solutions based on multidimensional structures (MOLAP) also exist. To further improve query-answering efficiency, some query results can be materialized in advance, and new indexing techniques have been developed.
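A minimal sketch of a star schema, again using sqlite3 with hypothetical table and column names: a central fact table holds numeric measures and foreign keys into the dimension tables, and analytical queries join the fact table to the dimensions and aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables: descriptive attributes of each analysis axis
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, "
             "name TEXT, category TEXT)")
conn.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, "
             "year INTEGER, quarter INTEGER)")

# Fact table: foreign keys to the dimensions plus numeric measures
conn.execute("""CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    quantity   INTEGER,
    revenue    REAL)""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Laptop", "IT"), (2, "Phone", "IT"),
                  (3, "Desk", "Furniture")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(10, 2018, 1), (11, 2018, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 10, 2, 2000.0), (2, 10, 5, 2500.0),
                  (3, 11, 1, 300.0), (1, 11, 1, 1000.0)])

# A typical OLAP query: roll revenue up by category and quarter
query = """
SELECT p.category, d.quarter, SUM(f.revenue)
FROM fact_sales f
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_date d    ON f.date_id = d.date_id
GROUP BY p.category, d.quarter
ORDER BY p.category, d.quarter
"""
for row in conn.execute(query):
    print(row)  # e.g. ('IT', 1, 4500.0)
```

In a snowflake schema the dimension tables would themselves be normalized (e.g. a separate category table referenced from dim_product); the star schema keeps each dimension denormalized in a single table.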

In the course, the main concepts of multidimensional databases will be covered and illustrated using the SQL Server tools. Complementary to the course, IBM and Teradata will give invited lectures.

Books

Extra books

The following materials were used to prepare the course, but are not required reading:

Prerequisites

  • Database System Concepts (6th ed.) by Abraham Silberschatz, Henri Korth, and S. Sudarshan. McGraw-Hill, 2011.
    • ER-modeling: Chapter 7
    • Keys and functional dependencies: Section 8.3.1
    • BCNF: 8.3.2

Course Slides

Software

All software used in the course is available in the computer labs. Students who wish to install the software on their own computers can obtain free copies. Succinct instructions for acquiring the software are included below; should additional help be required, you can contact the sysadmin of the department: Arthur Lesuisse alesuiss@ulb.ac.be

  • MS SQL Server Tools: can be downloaded for free from http://www.academicshop.be/msdnaa/ . Register on this page with your ULB email address and 'order' the free MSDNAA subscription. After verification, you will receive login credentials to download quite a few software packages for free. Select the SQL Server 2014 Enterprise edition.
  • Indyco Builder can be downloaded from http://www.indyco.com/ . License keys for all students will be added soon.

Exercises

Group assignment

The assignment is carried out in groups of 3 to 4 people. Before you can submit assignment part I, you have to register in a group. The link to register a group is included below. Please select your group on or before 25/10/2018.

The assignment consists of two parts:

  • Part I: Create a conceptual model and translate to a logical schema (deadline 15/11/2018)
  • Part II: (deadline 20/12/2018)
    • Creating ETL scripts for updating the database in SSIS,
    • Predicting how the size of the data warehouse will grow over time,
    • Deploying a data cube on top of the data warehouse and creating a report.

Assignment part I will be available on 25/10; assignment part II will become available right after the submission deadline of part I. The submission deadlines for both parts are strict.

The assignment counts for 30% of your total grade. This may seem low; however, putting effort into the assignment will help you achieve a better understanding of the course material, which in turn will result in a better score on the written exam that accounts for the remaining 70% of the grade.

Examinations from Previous Years

 
teaching/infoh419.1535791311.txt.gz · Last modified: 2018/09/01 10:41 by ezimanyi