Differences

This shows you the differences between two versions of the page.

teaching:infoh419 [2018/09/17 18:32]
ezimanyi [Group assignment]
teaching:infoh419 [2022/09/19 15:10]
ezimanyi [Grading]
Line 20: Line 20:
  
 The course is given during the first semester ​ The course is given during the first semester ​
-  * Lectures on Tuesdays ​from 2 pm to pm at the room S.UA4.218 +  * Lectures on Mondays ​from 10 am to 12 pm at the room S.K.3.401 
-  * Exercises on Fridays ​from pm to pm at the room S.UB4.130+  * Exercises on Tuesdays ​from pm to pm at the room S.P4.1.17
  
 ===== Grading ===== ===== Grading =====
   * Group project (30%)   * Group project (30%)
   * Written exam (70%)   * Written exam (70%)
-    * the exam is open book; notes and books can be used. Laptops and other electronic devices are not allowed.+    * the exam is open book; notes and books can be used. Laptops and other electronic devices are **not** allowed. Please prepare your paper material in advance.
 ===== Course Summary ===== ===== Course Summary =====
 Relational and object-oriented databases are mainly suited for operational settings in which there are many small transactions querying and writing to the database. Consistency of the database (in the presence of potentially conflicting transactions) is of utmost importance. Much different is the situation in analytical processing where historical data is analyzed and aggregated in many different ways. Such queries differ significantly from the typical transactional queries in the relational model: Relational and object-oriented databases are mainly suited for operational settings in which there are many small transactions querying and writing to the database. Consistency of the database (in the presence of potentially conflicting transactions) is of utmost importance. Much different is the situation in analytical processing where historical data is analyzed and aggregated in many different ways. Such queries differ significantly from the typical transactional queries in the relational model:
Line 38: Line 38:
  
 ===== Books ===== ===== Books =====
-  * [[https://www.springer.com/​9783642546549|Data Warehouse Systems: Design and Implementation]] ​by Alejandro A. Vaisman and Esteban Zimányi. Springer, ​2014.+  * [[https://link.springer.com/​978-3-662-65167-4|Data Warehouse Systems: Design and Implementation]], second edition, ​Alejandro A. Vaisman and Esteban Zimányi. Springer, ​2022.
   * [[http://​www.morganclaypool.com/​doi/​abs/​10.2200/​s00299ed1v01y201009dtm009|Multidimensional Databases and Data Warehousing]] by Cristian S. Jensen, Torben Bach Pedersen, and Christian Thomsen. Morgan & Claypool Publishers.   * [[http://​www.morganclaypool.com/​doi/​abs/​10.2200/​s00299ed1v01y201009dtm009|Multidimensional Databases and Data Warehousing]] by Cristian S. Jensen, Torben Bach Pedersen, and Christian Thomsen. Morgan & Claypool Publishers.
   * [[http://​www.mcgraw-hill.co.uk/​html/​0071610391.html|Data Warehouse Design: Modern Principles and Methodologies]] by Matteo Golfarelli and Stefano Rizzi. McGraw-Hill,​ 2009   * [[http://​www.mcgraw-hill.co.uk/​html/​0071610391.html|Data Warehouse Design: Modern Principles and Methodologies]] by Matteo Golfarelli and Stefano Rizzi. McGraw-Hill,​ 2009
Line 62: Line 62:
   * {{teaching:​infoh419:​dw00-refresher.pdf|Refresher Databases}}   * {{teaching:​infoh419:​dw00-refresher.pdf|Refresher Databases}}
   * {{teaching:​infoh419:​dw01-introduction.pdf|Introduction}}   * {{teaching:​infoh419:​dw01-introduction.pdf|Introduction}}
-  * {{teaching:​infoh419:​dw02-cubes.pdf|Cubes}} 
     * {{teaching:​infoh419:​database_explosion_report.pdf|Database explosion report}}     * {{teaching:​infoh419:​database_explosion_report.pdf|Database explosion report}}
     * {{teaching:​infoh419:​database_explosion.pdf|Database explosion}}     * {{teaching:​infoh419:​database_explosion.pdf|Database explosion}}
-  * {{teaching:​infoh419:​dw03-dfm.pdf|Dimension Fact Model}} +  * {{teaching:​infoh419:​dw02-dfm.pdf|Dimension Fact Model}} 
-  * {{teaching:​infoh419:​dw04-logicalmodel.pdf|Logical Model}} +  * {{teaching:​infoh419:​dw03-logicalmodel.pdf|Logical Model}} 
-  * {{teaching:​infoh419:​dw05-dimensionchanges.pdf|Dimension Changes}} +  * {{teaching:​infoh419:​dw04-dimensionchanges.pdf|Dimension Changes}} 
-  * {{teaching:​infoh419:​dw06-etl.pdf|ETL}} +  * {{teaching:​infoh419:​dw05-etl.pdf|ETL}} 
-  * {{teaching:​infoh419:​dw07-viewmaterialization.pdf|View Materialization}} +  * {{teaching:​infoh419:​dw06-viewmaterialization.pdf|View Materialization}} 
-  * {{teaching:​infoh419:​dw08-indexing.pdf|Indexing}} +  * {{teaching:​infoh419:​dw07-indexing.pdf|Indexing}} 
-  * {{teaching:​infoh419:​dw09-aggregatecomputation.pdf|Aggregate Computation}} +  * {{teaching:​infoh419:​dw08-aggregatecomputation.pdf|Aggregate Computation}} 
-  * {{teaching:​infoh419:​dw10-conclusion.pdf|Conclusion}}+  * {{teaching:​infoh419:​dw09-conclusion.pdf|Conclusion}} ​
  
  
Line 91: Line 90:
 ===== Group Project ===== ===== Group Project =====
  
-The project is carried out in groups of 3 to 4 people. Before you can submit assignment part I, you will have to register in a group. For registering a group send an email to the lecturer. Please to select your group before or on 1/10/2018.+[[http://www.tpc.org|TPC]] is a non-profit corporation that defines transaction processing and database benchmarks and disseminates objective, verifiable TPC performance data to the industry. Regarding data warehouses, two TPC benchmarks are relevant: 
 +  * [[http://​www.tpc.org/​tpcds/​|TPC-DS]], ​the Decision Support Benchmark, which models the decision support functions of a retail product supplier 
 +  * [[http://www.tpc.org/​tpcdi/​|TPC-DI]],​ the Data Integration Support Benchmark, which models a typical ETL process that loads a data warehouse.
  
-The project consists of 2 parts:+The project of the course consists of 2 parts: 
 +  * Part I: Implement the TPC-DS benchmark (deadline 1/​11/​2022) 
 +  * Part II: Implement the TPC-DI benchmark (deadline 24/​12/​2022) 
 +You are free to choose the tools with which the two benchmarks will be implemented. For example, the TPC-DS benchmark could be implemented on SQL Server Analysis Services, Pentaho Analysis Services (aka Mondrian), etc. Similarly, the TPC-DI benchmark could be implemented on SQL Server Integration Services, Pentaho Data Integration, Talend Data Studio, SQL scripts, etc., which then load the data warehouse on a DBMS such as SQL Server, Oracle, PostgreSQL, etc. 
  
-  * Part I: Implement the TPC-DS benchmark (deadline 1/11/2018) +Furthermore, both benchmarks must be implemented with several scale factors, which determine the size of the resulting data warehouse. You DO NOT need to use the scale factors mentioned in the TPC requirements. The pedagogical objective is that you learn how to properly perform a benchmark. Therefore, you need to estimate the biggest scale factor that fits on your own computer; this will be your reference scale factor, say 1.0. You will then need 3 smaller scale factors, e.g., 0.1, 0.2, and 0.5 of the full size, in order to see the evolution of the performance.
-  * Part II: Implement the TPC-DI benchmark (deadline 20/12/2018)
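
The scale-factor procedure described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names (`scale_factors`, `time_workload`, `benchmark`), the number of repetitions, and the use of the median are choices made for this example, not part of the TPC specifications or of the course requirements. `make_workload` stands for a hypothetical callable that returns the load or query job to run at a given scale factor.

```python
import time
from statistics import median

# Relative sizes suggested in the text: three smaller runs plus the full reference size.
RELATIVE_SIZES = (0.1, 0.2, 0.5, 1.0)


def scale_factors(reference, relative_sizes=RELATIVE_SIZES):
    """Return the absolute scale factors derived from the reference scale factor."""
    if reference <= 0:
        raise ValueError("the reference scale factor must be positive")
    # Rounding only hides floating-point noise such as 0.30000000000000004.
    return [round(reference * r, 6) for r in relative_sizes]


def time_workload(run, repetitions=3):
    """Run a zero-argument callable several times and return the median wall-clock time."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return median(timings)


def benchmark(reference, make_workload, repetitions=3):
    """Map each scale factor to the median run time of its workload.

    make_workload(sf) is a hypothetical factory: it must return a
    zero-argument callable that executes the benchmark at scale factor sf.
    """
    return {
        sf: time_workload(make_workload(sf), repetitions)
        for sf in scale_factors(reference)
    }


if __name__ == "__main__":
    # Dummy workload standing in for a real TPC-DS or TPC-DI run (assumption).
    def make_dummy_workload(sf):
        return lambda: sum(range(int(sf * 100_000)))

    for sf, seconds in benchmark(1.0, make_dummy_workload).items():
        print(f"scale factor {sf}: {seconds:.6f} s")
```

Taking the median of a few repetitions is one simple way to reduce the influence of caching effects and background load; a real benchmark report would also state the hardware and the DBMS configuration used.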
  
-The submission deadlines for parts I and II are strict.+The project is carried out in groups of 3-4 persons, which will be the same for the two parts. Before you can submit part I of the project, you will have to register in a group. For this, please send an email to the lecturer with the information about your group by 1/10/2022 at the latest. ​The submission deadlines for parts I and II are strict.
  
-The assignment evaluation will count for 30% of your total grade. This may seem undervalued,​ however, putting effort in the assignment will definitely help you in achieving a better understanding of the course material which will result in a better score in the paper exam which amounts for 70% of the grade.+The deliverables expected ​for each part of the project are the following:​ 
 +  * A report ​in pdf explaining ​the essential aspects ​of your implementation,​ and 
 +  * A zip file containing the code of your implementation, with all the instructions the lecturer needs to replicate your implementation on standard computing infrastructure.
  
 +The project evaluation will count for 30% of your total grade. This may seem undervalued; however, putting effort into the project will definitely help you achieve a better understanding of the course material, which will result in a better score in the written exam, which accounts for 70% of the grade.
 +
 +===== Tools of the previous year =====
 +
 +SQL Server, PostgreSQL, MySQL, Oracle, SQLite, MariaDB, Spark SQL, DB2/Airflow, Microsoft Azure SQL, Citus, AWS Aurora, Google BigQuery, Impala
 +
 +===== Groups of the current year =====
 +
 +TBD
 ===== Examinations from Previous Years ===== ===== Examinations from Previous Years =====
  
 
teaching/infoh419.txt · Last modified: 2023/11/20 16:18 by ezimanyi