===== Lecturer =====

  * [[http://cs.ulb.ac.be/members/esteban/|Esteban Zimányi]]
  * <ezimanyi@ulb.ac.be>
===== Volume =====
  
  * Master in Computer Sciences [INFO]
  * Erasmus Mundus Master in Big Data Management and Analytics (BDMA)

===== Schedule =====

The course is given during the first semester:
  * Lectures on Tuesdays from 2 pm to 4 pm in room S.UA4.218
  * Exercises on Fridays from 2 pm to 4 pm in room S.UB4.130
  
===== Grading =====
  * {{teaching:infoh419:dw00-refresher.pdf|Refresher Databases}}
  * {{teaching:infoh419:dw01-introduction.pdf|Introduction}}
    * {{teaching:infoh419:database_explosion_report.pdf|Database explosion report}}
    * {{teaching:infoh419:database_explosion.pdf|Database explosion}}
  * {{teaching:infoh419:dw02-dfm.pdf|Dimension Fact Model}}
  * {{teaching:infoh419:dw03-logicalmodel.pdf|Logical Model}}
  * {{teaching:infoh419:dw04-dimensionchanges.pdf|Dimension Changes}}
  * {{teaching:infoh419:dw05-etl.pdf|ETL}}
  * {{teaching:infoh419:dw06-viewmaterialization.pdf|View Materialization}}
  * {{teaching:infoh419:dw07-indexing.pdf|Indexing}}
  * {{teaching:infoh419:dw08-aggregatecomputation.pdf|Aggregate Computation}}
  * {{teaching:infoh419:dw09-conclusion.pdf|Conclusion}}
  
  
  * [[teaching:infoh419:TP|Exercises Web page]]
  
===== Group Project =====

[[http://www.tpc.org|TPC]] is a non-profit corporation that defines transaction processing and database benchmarks and disseminates objective, verifiable TPC performance data to the industry. Regarding data warehouses, two TPC benchmarks are relevant:
  * [[http://www.tpc.org/tpcds/|TPC-DS]], the Decision Support Benchmark, which models the decision support functions of a retail product supplier.
  * [[http://www.tpc.org/tpcdi/|TPC-DI]], the Data Integration Benchmark, which models a typical ETL process that loads a data warehouse.

The course project consists of two parts:
  * Part I: Implement the TPC-DS benchmark (deadline 1/11/2021)
  * Part II: Implement the TPC-DI benchmark (deadline 24/12/2021)
You are free to choose the tools with which the two benchmarks are implemented. For example, the TPC-DS benchmark could be implemented on SQL Server Analysis Services, Pentaho Analysis Services (aka Mondrian), etc. Similarly, the TPC-DI benchmark could be implemented on SQL Server Integration Services, Pentaho Data Integration, Talend Data Studio, SQL scripts, etc., which then load the data warehouse on a DBMS such as SQL Server, Oracle, PostgreSQL, etc.
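
As an illustration only, here is a minimal sketch of one possible setup. Everything in it is an assumption, not a requirement: it presumes PostgreSQL as the target DBMS, a ''store_sales'' table created beforehand from the TPC-DS schema, pipe-delimited flat files produced by the TPC-DS data generator, and the ''psycopg2'' Python driver.

<code python>
# Minimal sketch (assumptions, not requirements): bulk-load one TPC-DS flat
# file into PostgreSQL. The connection string, file name, and table name
# below are placeholders for whatever your own setup uses.
import psycopg2

DSN = "dbname=tpcds user=student password=student host=localhost"  # hypothetical
FLAT_FILE = "store_sales.dat"  # flat file produced by the TPC-DS data generator
TABLE = "store_sales"          # table created beforehand from the TPC-DS schema

with psycopg2.connect(DSN) as conn, conn.cursor() as cur, open(FLAT_FILE) as f:
    # COPY is far faster than row-by-row INSERTs for benchmark-sized loads.
    cur.copy_expert(f"COPY {TABLE} FROM STDIN WITH (DELIMITER '|', NULL '')", f)
conn.close()
</code>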
  
Furthermore, both benchmarks must be implemented with several scale factors, which determine the size of the resulting data warehouse. You DO NOT need to use the scale factors mentioned in the TPC requirements. The pedagogical objective is that you learn how to properly perform a benchmark. Therefore, you need to estimate the biggest scale factor that you can run on your own computer: this will be your reference scale factor (SF), say SF 1.0, and then you will need three smaller scale factors, e.g., at 0.1, 0.2, and 0.5 of the full size, in order to see the evolution of the performance.
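
To observe the evolution of the performance, the same queries can then simply be timed at each scale factor. Below is a minimal sketch, again assuming PostgreSQL and ''psycopg2'', with one database per scale factor; the database names and the query are placeholders for your own ones.

<code python>
# Minimal sketch: time one query at several scale factors to see how the
# performance evolves with data volume. All names below are hypothetical.
import time
import psycopg2

DATABASES = {0.1: "tpcds_sf_01", 0.2: "tpcds_sf_02", 0.5: "tpcds_sf_05", 1.0: "tpcds_sf_10"}
QUERY = "SELECT count(*) FROM store_sales"  # placeholder for a real benchmark query

for sf in sorted(DATABASES):
    conn = psycopg2.connect(dbname=DATABASES[sf], user="student", host="localhost")
    with conn.cursor() as cur:
        start = time.perf_counter()
        cur.execute(QUERY)
        cur.fetchall()                      # force the full result to be read
        elapsed = time.perf_counter() - start
    conn.close()
    print(f"SF {sf}: {elapsed:.2f} s")
</code>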
  
The project is carried out in groups of 3 to 4 persons, which will be the same for both parts. Before you can submit part I of the project, you will have to register in a group. For this, please send an email to the lecturer with the information about your group by 1/10/2020 at the latest. The submission deadlines for parts I and II are strict.
  
The deliverables expected for each part of the project are the following:
  * A report in pdf explaining the essential aspects of your implementation, and
  * A zip file containing the code of your implementation, with all necessary instructions for the lecturer to replicate your implementation on a standard computing infrastructure.
  
The project evaluation will count for 30% of your total grade. This may seem undervalued; however, putting effort into the project will definitely help you achieve a better understanding of the course material, which will result in a better score in the written exam, which amounts to 70% of the grade.
  
===== Groups of the current year =====
  
  * SQL Server: Nicole Zafalón, Diogo Rapas, Andrés Espinal, Adam Broniewski
  * PostgreSQL: Niccolò Morabito, Chun Han Li, Víctor Diví, Filip Sotiroski
  * MySQL: Valada Kylynnyk, Yanjian Zhang, Zhicheng Lou, Kainaat Amjid
  * Oracle: El Achouchi Iliass, Belgada Wassim, Ajouaou Soufiane
  * SQLite: Laamiri Achraf, Mareghni Nidhal, Kuete Kamta Frank Jordan
  * MariaDB: Tejaswini Dhupad, Himanshu Choudhary, Kamdem Tagne Thomas Borel, Sergio Postigo
  * Spark SQL: Yi Wu, Hang Yu, Zhiyang Guo, Mohammad Zain Abbas
  * DB2/Airflow: Md Jamiur Rahman Rifat, Khushnur Binte Jahangir, Asha Said Seif, Pietro Ferrazzi
  * Microsoft Azure SQL: Davide Rendina, Marita Hernandez, Luiz Fonseca, Zyrako Musaj
  * ScyllaDB: Nazgul K. Rakhimzhanova, Mohammad Ismail Tirmizi, Maël Touret, Wassim Kezai
  * AWS Aurora: Hind Bakkali, Gaëlle Frauenkron, Mahmut Asım Onat, Salma Salmani
  * Google BigQuery: Soufian El Bakkali Tamara, Maciej Piekarski, David Silberwasser, Sami Abdul Sater
  * Impala: Yahya Bakkali, Amirmohammad Fallahi, Maxime Hauwaert, Alexandre Libert

===== Examinations from Previous Years =====
  
 