
  * [[http://cs.ulb.ac.be/members/esteban/|Esteban Zimányi]]
  * <ezimanyi@ulb.ac.be>
 ===== Volume =====

  * Master in Computer Sciences [INFO]
  * Erasmus Mundus Master in Big Data Management and Analytics (BDMA)
 ===== Schedule =====
The course is given during the first semester.
  * Lectures on Tuesdays from 2 pm to 4 pm in room S.UA4.218
  * Exercises on Fridays from 2 pm to 4 pm in room S.UB4.130
 ===== Grading =====
  * Group project (30%)
  * Written exam (70%)
    * The exam is open book: notes and books may be used. Laptops and other electronic devices are not allowed.

In the course, the main concepts of multidimensional databases will be covered and illustrated using the SQL Server tools. Complementing the course, IBM and Teradata will give invited lectures.
===== Books ===== 
  * [[https://link.springer.com/978-3-662-65167-4|Data Warehouse Systems: Design and Implementation]], second edition, by Alejandro A. Vaisman and Esteban Zimányi. Springer, 2022.
  * [[http://www.morganclaypool.com/doi/abs/10.2200/s00299ed1v01y201009dtm009|Multidimensional Databases and Data Warehousing]] by Christian S. Jensen, Torben Bach Pedersen, and Christian Thomsen. Morgan & Claypool Publishers.
  * [[http://www.mcgraw-hill.co.uk/html/0071610391.html|Data Warehouse Design: Modern Principles and Methodologies]] by Matteo Golfarelli and Stefano Rizzi. McGraw-Hill, 2009.

==== Extra books ====
The following materials have been used to construct the course material, but are not required reading for the course:
  * [[https://www.springer.com/9783540744047|Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications]] by Elzbieta Malinowski and Esteban Zimányi. Springer, 2008.
  * [[https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-lifecycle-toolkit/|The Data Warehouse Lifecycle Toolkit]] (2nd ed.) by Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy, and Bob Becker. Wiley, 2008.
  * [[https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/|The Data Warehouse Toolkit]] (3rd ed.) by Ralph Kimball and Margy Ross. Wiley, 2013.
  * [[https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-etl-toolkit/|The Data Warehouse ETL Toolkit]] by Ralph Kimball and Joe Caserta. Wiley, 2004.
  * [[https://www.wiley.com/en-be/Building+the+Data+Warehouse%2C+4th+Edition-p-9780471774235|Building the Data Warehouse]] (4th ed.) by William H. Inmon. Wiley, 2005.
===== Prerequisites =====
  * [[https://www.mheducation.com/highered/product/database-system-concepts-silberschatz-korth/M0073523321.html|Database System Concepts]] (6th ed.) by Abraham Silberschatz, Henry F. Korth, and S. Sudarshan. McGraw-Hill, 2011.
    * ER modeling: Chapter 7
    * Keys and functional dependencies: Section 8.3.1
    * BCNF: Section 8.3.2
 ===== Course Slides =====
  * {{teaching:infoh419:dw00-refresher.pdf|Refresher Databases}}
  * {{teaching:infoh419:dw01-introduction.pdf|Introduction}}
    * {{teaching:infoh419:database_explosion_report.pdf|Database explosion report}}
    * {{teaching:infoh419:database_explosion.pdf|Database explosion}}
  * {{teaching:infoh419:dw02-dfm.pdf|Dimensional Fact Model}}
  * {{teaching:infoh419:dw03-logicalmodel.pdf|Logical Model}}
  * {{teaching:infoh419:dw04-dimensionchanges.pdf|Dimension Changes}}
  * {{teaching:infoh419:dw05-etl.pdf|ETL}}
  * {{teaching:infoh419:dw06-viewmaterialization.pdf|View Materialization}}
  * {{teaching:infoh419:dw07-indexing.pdf|Indexing}}
  * {{teaching:infoh419:dw08-aggregatecomputation.pdf|Aggregate Computation}}
  * {{teaching:infoh419:dw09-conclusion.pdf|Conclusion}}
 ===== Software =====
All software used in the course is available in the computer labs. Students who want a personal copy on their own computers can obtain the software for free. Brief instructions for acquiring it are given below; if you need additional help, contact the sysadmin of the department: Arthur Lesuisse <alesuiss@ulb.ac.be>

  * MS SQL Server tools can be downloaded for free. Register at http://www.academicshop.be/msdnaa/ with your ULB email address and 'order' the free msdnaa. After verification, you will receive login credentials to download a number of software packages for free. Select the SQL Server 2014 Enterprise edition.
  * Indyco Builder can be downloaded from http://www.indyco.com/ . License keys for all students will be added soon.
===== Exercises =====
  * [[teaching:infoh419:TP|Exercises web page]]
===== Group Project =====
[[http://www.tpc.org|TPC]] is a non-profit corporation that defines transaction processing and database benchmarks and disseminates objective, verifiable TPC performance data to the industry. Regarding data warehouses, two TPC benchmarks are relevant:
  * [[http://www.tpc.org/tpcds/|TPC-DS]], the Decision Support Benchmark, which models the decision support functions of a retail product supplier.
  * [[http://www.tpc.org/tpcdi/|TPC-DI]], the Data Integration Benchmark, which models a typical ETL process that loads a data warehouse.
The course project consists of two parts:
  * Part I: Implement the TPC-DS benchmark (deadline 1/11/2021)
  * Part II: Implement the TPC-DI benchmark (deadline 24/12/2021)
You are free to choose the tools on which the two benchmarks will be implemented. For example, the TPC-DS benchmark could be implemented on SQL Server Analysis Services, Pentaho Analysis Services (aka Mondrian), etc. Similarly, the TPC-DI benchmark could be implemented on SQL Server Integration Services, Pentaho Data Integration, Talend Data Studio, SQL scripts, etc., which then load the data warehouse on a DBMS such as SQL Server, Oracle, PostgreSQL, etc.
Furthermore, both benchmarks must be implemented with several scale factors, which determine the size of the resulting data warehouse. You DO NOT need to use the scale factors mentioned in the TPC requirements. The pedagogical objective is that you learn how to properly perform a benchmark. Therefore, you need to estimate the biggest scale factor that fits on your own computer: this will be your reference scale factor, say 1.0. You will then need three smaller scale factors, e.g., 0.1, 0.2, and 0.5 of the full size, in order to see how performance evolves.
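The idea of measuring the same workload at several scale factors can be illustrated with a minimal sketch. It is not part of the TPC specifications or of the project requirements: the table ''sales'', the query, the value of ''REFERENCE_ROWS'', and all function names are hypothetical, and SQLite is used only because it ships with Python. A real project would use the TPC data generators and the chosen DBMS instead.

```python
import sqlite3
import time

# Hypothetical number of fact rows at the reference scale factor 1.0.
REFERENCE_ROWS = 20_000
# Reference scale factor plus three smaller ones, as suggested in the text.
SCALE_FACTORS = [0.1, 0.2, 0.5, 1.0]


def build_warehouse(scale_factor):
    """Create an in-memory toy fact table whose size grows with scale_factor."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (item_id INTEGER, store_id INTEGER, amount REAL)"
    )
    n_rows = round(REFERENCE_ROWS * scale_factor)
    rows = ((i % 100, i % 10, float(i % 7)) for i in range(n_rows))
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn, n_rows


def time_query(conn, sql, repetitions=3):
    """Return the best wall-clock time in seconds over several runs of sql."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
    return min(timings)


def run_benchmark():
    """Run the same query at every scale factor and collect the timings."""
    query = "SELECT store_id, SUM(amount) FROM sales GROUP BY store_id"
    results = []
    for sf in SCALE_FACTORS:
        conn, n_rows = build_warehouse(sf)
        elapsed = time_query(conn, query)
        conn.close()
        results.append((sf, n_rows, elapsed))
    return results


if __name__ == "__main__":
    for sf, n_rows, elapsed in run_benchmark():
        print(f"scale factor {sf:>4}: {n_rows:>6} rows, {elapsed:.6f} s")
```

Repeating each query and keeping the best time is one simple way to reduce noise; reporting the average or median over the runs is an equally reasonable choice, as long as the report states which one was used.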
The project is carried out in groups of 3-4 persons, and the groups stay the same for both parts. Before you can submit Part I of the project, you have to register in a group. To do so, please send an email to the lecturer with the information about your group by 1/10/2020 at the latest. The submission deadlines for Parts I and II are strict.
The deliverables expected for each part of the project are the following:
  * A report in PDF explaining the essential aspects of your implementation, and
  * A zip file containing the code of your implementation, with all the instructions the lecturer needs to replicate it on standard computing infrastructure.
The project evaluation counts for 30% of your total grade. This may seem undervalued; however, putting effort into the project will definitely help you gain a better understanding of the course material, which will result in a better score in the written exam, which accounts for 70% of the grade.
===== Groups of the current year =====
  * SQL Server: Nicole Zafalón, Diogo Rapas, Andrés Espinal, Adam Broniewski
  * PostgreSQL: Niccolò Morabito, CHUN HAN LI, Víctor Diví, Filip Sotiroski
  * MySQL: Valada kylynnyk, Yanjian Zhang, Zhicheng Lou, Kainaat Amjid
  * Oracle: El Achouchi Iliass, Belgada Wassim, Ajouaou Soufiane
  * SQLite: Laamiri Achraf, Mareghni Nidhal, Kuete Kamta Frank Jordan
  * MariaDB: Tejaswini Dhupad, Himanshu Choudhary, Kamdem Tagne Thomas Borel, Sergio Postigo
  * Spark SQL: Yi Wu, Hang Yu, Zhiyang Guo, Mohammad Zain Abbas
  * DB2/Airflow: Md Jamiur Rahman Rifat, Khushnur Binte Jahangir, Asha Said Seif, Pietro Ferrazzi
  * Microsoft Azure SQL: Davide Rendina, Marita Hernandez, Luiz Fonseca, Zyrako Musaj
  * Citus: Nazgul K. Rakhimzhanova, Mohammad Ismail Tirmizi, Maël Touret, Wassim Kezai
  * AWS Aurora: Hind Bakkali, Gaëlle Frauenkron, Mahmut Asım Onat, Salma Salmani
  * Google BigQuery: Soufian El Bakkali Tamara, Maciej Piekarski, David Silberwasser, Sami Abdul Sater
  * Impala: Yahya Bakkali, Amirmohammad Fallahi, Maxime Hauwaert, Alexandre Libert
===== Examinations from Previous Years =====
  * Academic year 2016-2017
    * {{:teaching:infoh419:dw-exam-2017-january-solution.pdf|First session}}
  * Academic year 2015-2016
    * {{:teaching:infoh419:dw-exam-2016-january.pdf|First session}}
  * Academic year 2014-2015
    * {{:teaching:infoh419:dw-exam-2015-january.pdf|First session}}
  * Academic year 2013-2014
    * {{:teaching:infoh419:dw-exam-2014-january-solution.pdf|First session}}
    * {{:teaching:infoh419:dw-exam-2014-july.pdf|Second session}}
  * Academic year 2012-2013
    * {{:teaching:infoh419:dw-exam-2013-january-solution.pdf|First session}}
    * {{:teaching:infoh419:dw-exam-2013-july.pdf|Second session}}
