Differences

This shows you the differences between two versions of the page.

--- teaching:infoh419 [2019/10/22 13:56]
ezimanyi [Group Project]
+++ teaching:infoh419 [2024/10/22 13:04] (current)
ezimanyi [Examinations from Previous Years]
@@ Line 20: / Line 20: @@
 The course is given during the first semester:
-  * Lectures on Tuesdays from 2 pm to 4 pm at the room S.UA4.218
+  * Lectures on Mondays from 10 am to 12 pm in room S.C.3.122
-  * Exercises on Fridays from 2 pm to 4 pm at the room S.UB4.130
+  * Exercises on Tuesdays from 2 pm to 4 pm in room S.UB4.136
 ===== Grading =====
   * Group project (30%)
   * Written exam (70%)
-    * the exam is open book; notes and books can be used. Laptops and other electronic devices are not allowed.
+    * The exam is open book; notes and books can be used. Laptops and other electronic devices are **not** allowed. Please prepare your paper material well in advance, not the day before the examination, to avoid printing problems.
 ===== Course Summary =====
 Relational and object-oriented databases are mainly suited for operational settings in which many small transactions query and write to the database. Consistency of the database (in the presence of potentially conflicting transactions) is of utmost importance. The situation is very different in analytical processing, where historical data is analyzed and aggregated in many different ways. Such queries differ significantly from the typical transactional queries in the relational model:
@@ Line 32: / Line 32: @@
   * Analytical queries involve aggregations (min, max, avg, ...) over large subgroups of the data;
   * When analyzing data, it is convenient to view it as multidimensional.
-\\
 For these reasons, data to be analyzed is typically collected into a data warehouse with Online Analytical Processing (OLAP) support. Here, online means that query answers should not take too long to compute. Collecting the data is often referred to as Extract-Transform-Load (ETL). The data in the data warehouse must be organized so that analytical queries can be executed efficiently. For the relational model, star and snowflake schemas are popular designs. Besides OLAP on top of a relational database (ROLAP), there are also native OLAP solutions based on multidimensional structures (MOLAP). To further improve query efficiency, some query results can be precomputed and materialized in the database, and new indexing techniques have been developed.
-In the course, the main concepts of multidimensional databases will be covered and illustrated using the SQL Server tools. Complimentary to the course, IBM and Teradata will give invited lectures.
+In the course, the main concepts of multidimensional databases will be covered and illustrated using the SQL Server tools.
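The star-schema design and result materialization described in the course summary can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for the SQL Server tools used in the course, and the tables (`DimProduct`, `DimDate`, `FactSales`) and their rows are hypothetical toy data. A fact table references two dimension tables, an analytical query joins them and aggregates over dimension attributes, and a summary table precomputes one aggregate.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create a toy star schema: one fact table referencing two dimension tables."""
    conn.executescript("""
        CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, Category TEXT);
        CREATE TABLE DimDate    (DateKey INTEGER PRIMARY KEY, Year INTEGER);
        CREATE TABLE FactSales  (
            ProductKey INTEGER REFERENCES DimProduct(ProductKey),
            DateKey    INTEGER REFERENCES DimDate(DateKey),
            Amount     REAL
        );
    """)
    conn.executemany("INSERT INTO DimProduct VALUES (?, ?)",
                     [(1, "Beverages"), (2, "Snacks")])
    conn.executemany("INSERT INTO DimDate VALUES (?, ?)",
                     [(20230101, 2023), (20240101, 2024)])
    conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?)",
                     [(1, 20230101, 10.0), (1, 20240101, 15.0),
                      (2, 20230101, 5.0), (2, 20240101, 7.5)])


def sales_by_category_and_year(conn: sqlite3.Connection) -> list:
    """Analytical query: join facts to dimensions and aggregate by dimension attributes."""
    return conn.execute("""
        SELECT p.Category, d.Year, SUM(f.Amount) AS Total
        FROM FactSales f
        JOIN DimProduct p ON f.ProductKey = p.ProductKey
        JOIN DimDate d    ON f.DateKey = d.DateKey
        GROUP BY p.Category, d.Year
        ORDER BY p.Category, d.Year
    """).fetchall()


def materialize_totals(conn: sqlite3.Connection) -> None:
    """Precompute an aggregate into a summary table (a static stand-in for a materialized view)."""
    conn.execute("""
        CREATE TABLE SalesByCategory AS
        SELECT p.Category, SUM(f.Amount) AS Total
        FROM FactSales f JOIN DimProduct p ON f.ProductKey = p.ProductKey
        GROUP BY p.Category
    """)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(sales_by_category_and_year(conn))
# [('Beverages', 2023, 10.0), ('Beverages', 2024, 15.0), ('Snacks', 2023, 5.0), ('Snacks', 2024, 7.5)]
materialize_totals(conn)
print(conn.execute("SELECT * FROM SalesByCategory ORDER BY Category").fetchall())
# [('Beverages', 25.0), ('Snacks', 12.5)]
```

The summary table here is only a snapshot and is not refreshed when `FactSales` changes; in SQL Server this role would more likely be played by an indexed view or by aggregations in Analysis Services.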
 ===== Books =====
-  * [[https://www.springer.com/9783642546549|Data Warehouse Systems: Design and Implementation]] by Alejandro A. Vaisman and Esteban Zimányi. Springer, 2014.
+  * [[https://link.springer.com/978-3-662-65167-4|Data Warehouse Systems: Design and Implementation]], second edition, by Alejandro A. Vaisman and Esteban Zimányi. Springer, 2022.
   * [[http://www.morganclaypool.com/doi/abs/10.2200/s00299ed1v01y201009dtm009|Multidimensional Databases and Data Warehousing]] by Christian S. Jensen, Torben Bach Pedersen, and Christian Thomsen. Morgan & Claypool Publishers, 2010.
   * [[http://www.mcgraw-hill.co.uk/html/0071610391.html|Data Warehouse Design: Modern Principles and Methodologies]] by Matteo Golfarelli and Stefano Rizzi. McGraw-Hill, 2009.
@@ Line 78: / Line 78: @@
 ===== Software =====
-All software used in the course is available in the computer labs. Students who wish a personal copy of the software on their own computers, can get free copies of the software. Succinct instructions to acquire the software have been included below; in case additional help is required you can contact the sysadmin of the department: Arthur Lesuisse <alesuiss@ulb.ac.be>
+All software used in the course is available in the computer labs. Students who wish to have a personal copy of the software on their own computers can get it for free. Brief instructions for obtaining the software are given below; if you need additional help, you can contact the department's system administrator: Robin Choquet <Robin.Choquet@ulb.be>
   * MS SQL Server Tools: can be downloaded for free from http://www.academicshop.be/msdnaa/ . Register on this page with your ULB email address and 'order' the free MSDNAA package. After verification, you will receive login credentials to download a number of software packages for free. Select the SQL Server 2014 Enterprise edition.
-  * Indyco Builder can be downloaded from http://www.indyco.com/ . License keys for all students will be added soon.
@@ Line 95: / Line 94: @@
 The course project consists of two parts:
-  * Part I: Implement the TPC-DS benchmark (deadline 1/11/2019)
+  * Part I: Implement the TPC-DS benchmark (deadline 1/11/2024)
-  * Part II: Implement the TPC-DI benchmark (deadline 20/12/2019)
+  * Part II: Implement the TPC-DI benchmark (deadline 24/12/2024)
 You are free to choose the tools with which the two benchmarks are implemented. For example, the TPC-DS benchmark could be implemented on SQL Server Analysis Services, Pentaho Analysis Services (aka Mondrian), etc. Similarly, the TPC-DI benchmark could be implemented with SQL Server Integration Services, Pentaho Data Integration, Talend Open Studio, SQL scripts, etc., which then load the data warehouse into a DBMS such as SQL Server, Oracle, PostgreSQL, etc.
-Furthermore, both benchmarks can be implemented with several scale factors, which determine the size of the resulting data warehouse. For the purposes of this project you can use the smallest scale factor.
+Furthermore, both benchmarks must be implemented with several scale factors, which determine the size of the resulting data warehouse. You DO NOT need to use the scale factors mentioned in the TPC requirements. The pedagogical objective is for you to learn how to perform a benchmark properly. Therefore, you need to estimate the largest scale factor that your own computer can handle: this will be your reference scale factor, say 1.0. You then need 3 smaller scale factors, e.g., 0.1, 0.2, and 0.5 of the full size, in order to observe how performance evolves with the data size.
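The scale-factor scheme above can be sketched in Python as follows. This is an illustrative sketch only: the reference size of 8 GB, the helper names, and the stand-in workload are hypothetical placeholders, not part of the TPC kits. It derives the data size implied by each relative scale factor and times a workload at each one, so that performance can be compared across sizes.

```python
import time
from typing import Callable, Dict, Iterable

# Hypothetical: the largest data size (in GB) that your own computer can handle.
# This size is treated as the reference scale factor 1.0.
REFERENCE_SIZE_GB = 8.0
SCALE_FACTORS = (0.1, 0.2, 0.5, 1.0)


def target_sizes(reference_gb: float, factors: Iterable[float]) -> Dict[float, float]:
    """Map each relative scale factor to the approximate data size it implies."""
    return {sf: round(reference_gb * sf, 3) for sf in factors}


def time_workload(run: Callable[[], None], repetitions: int = 3) -> float:
    """Run a workload several times and return the median wall-clock time in seconds
    (the upper median when the number of repetitions is even)."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]


def benchmark(make_workload: Callable[[float], Callable[[], None]],
              factors: Iterable[float]) -> Dict[float, float]:
    """Time the workload built for each scale factor by a hypothetical factory callback."""
    return {sf: time_workload(make_workload(sf)) for sf in factors}


if __name__ == "__main__":
    print(target_sizes(REFERENCE_SIZE_GB, SCALE_FACTORS))
    # {0.1: 0.8, 0.2: 1.6, 0.5: 4.0, 1.0: 8.0}

    # Stand-in workload whose cost grows with the scale factor; replace it with
    # the data load or query set of your own benchmark implementation.
    def make_workload(sf: float) -> Callable[[], None]:
        return lambda: sum(range(int(1_000_000 * sf)))

    for sf, seconds in benchmark(make_workload, SCALE_FACTORS).items():
        print(f"SF {sf}: {seconds:.4f} s")
```

Taking the median of a few repetitions rather than a single run reduces the influence of caching and background activity on the measured times.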
-The project is carried out in groups of 3 to 4 persons, which will be the same for the two parts. Before you can submit part I of the project, you will have to register in a group. For this, please send an email to the lecturer with the information about your group by 1/10/2018 at the latest. The submission deadlines for parts I and II are strict.
+The project is carried out in groups of 3-4 persons, which will be the same for the two parts. Before you can submit part I of the project, you will have to register in a group. For this, please send an email to the lecturer with the information about your group by 1/10/2024 at the latest. The submission deadlines for parts I and II are strict.
 The deliverables expected for each part of the project are the following:
@@ Line 108: / Line 107: @@
 The project counts for 30% of your total grade. This may seem low; however, putting effort into the project will help you achieve a better understanding of the course material, which will result in a better score in the written exam, which accounts for 70% of the grade.
+===== Tools of the previous year =====
+SQL Server, PostgreSQL, MySQL, Oracle, SQLite, MariaDB, Spark SQL, DB2/Airflow, Microsoft Azure SQL, Citus, AWS Aurora, Google BigQuery, Impala
 ===== Groups of the current year =====
-  * SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), SQL Server: Hung Nguyen, Valdemar Hernández Siles, Julio Candela Caceres, Ariston Harianto Lim
+  * Citus: Sara Saad, Marwah Sulaiman, Nishant Sushmakar, Olha Baliasina
-  * Pentaho Analysis Services, Pentaho Data Integration, PostgreSQL: Dimitrios Tsesmelis, Andrea Armani, Hridaya Subedi, Uchechukwu Fortune Njoku
+  * DuckDB: Viet Phuong Hoang, Nhu Ngoc Hoang, Ngoc Hoa Pham
-  * Apache Kylin, Talend Open Studio, SQL Server: Ricardo Holthausen, Alp Albay, Jesus Huete
+  * Firebird: Nima Kamali Lassem, Joel Anil Jose, Jule Grigat, Charlotte Garcia
-  * MySQL, Apache Airflow, cube.js: Ali Arous, Fabrício Ferreira, Ishaan Rachit Dwivedi, Ledia Isaj
+  * MariaDB: HanLing Hu, LianJie Li, Kaiwen Yuan
-  * Big Query, Cloud Data Fusion: Nithish Sankaranarayanan, Gayane Vardanyan, Yu-Hsuan Chen, Anant Gupta
+  * MySQL: Oluwanifemi Favour Olajuyigbe, Hadiqa Alamdar Bukhari, Mathilde Lourenco, Otto Wantland
-  * Microsoft Azure and <TBD>: Rodaina Mohamed, Karim Maatouk, Yi Chiau Li, Haftamu Hailu Tefera
+  * Oracle: Lucía Fernández, Stephanie Gomes, Filipe Russo, Elnara Yerbolatova
-  * Apache Hive and <TBD>: Yalei Li, Haonan Jin, Akash Malhotra
+  * PostgreSQL: Josu Bernal, Kristóf Balázs, Alfio Cardillo, Stefanos Kypritidis
-  * <TBD> and Jaspersoft: Haroon Rashid, Mahmudul Hasan, Emir Nurmatbekov
+  * SparkSQL: Jorge Ignacio del Río, Sebastian Alberto Neri Pérez, Nicola Mambelli, Adrian Patricio
-  * Spark SQL and <TBD>: Iva Mihajlovska, Tamara Bojanic, Đorđije Krivokapic, Iryna Nazarchuk
+  * SQLite: Youssef Talhaoui, Allan Noubissi Kamgang, Rodney Mangunza Muamba
-  * Oracle and <TBD>: Samia Azzouzi, Piotr Rochala, Paul Moua
+  * SQL Server: Yassin Talhaoui, Aristide Coquereau, Leila Bourouf, Mohamed Bouchkhachakh
+/*
+  * SQL Server: Enxhi Nushi, Gabriel Octavio Lozano Pinzón, Gian Carlo Tejada Gargate, José Carlos Lozano Dibildox
+  * PostgreSQL: Dionisius Mayr, Jakub Kwiatkowski, Gabriela Kaczmarek, Arijit
+  * Apache Hive: Yutao Chen, Qianyun Zhuang, Min Zhang, Ziyong Zhang
+  * Spark SQL: Valerio Rocca, Alexandre Dubois, Arnaud Cools, Maria Camila Salazar
+  * MySQL: Aryan Gupta, Dilbar Isakova, Hareem Raza, Muhammad Qasim Khan
+  * DuckDB: Jintao Ma, Linhan Wang, Iyoha Peace Osamuyi, Hieu Nguyen
+  * Oracle and Pentaho Data Integration: Sony Shrestha, Aayush Paudel, MD Kamrul Islam, Shofiyyah Nadhiroh
+  * Amazon Redshift: Rana İşlek, Simon Coessens, Berat Furkan Koçak, David García Morillo
+  * MariaDB: Izmar Soumaya, Ayadi Mustapha, Nils van Es Ostos, Narmina Mahmudova
+  * SQLite: Benjamin Gold, François Diximier, Noah Laravine, Louai Bouzaher
+  * DB2: Nicolas Lermusiaux, Gaetan Poupart-Lafarge, Ozan Basaran, Onur Bacaksiz
+*/
+/*
+  * Spark SQL: Luis Alfredo Leon, Satria Bagus Wicaksono, Jezuela Gega, Isabella Forero
+  * MySQL: Ali AbuSaleh, Liliia Aliakberova, Muhammad Rizwan Khalid, Mariana Mayorga Llano
+  * PostgreSQL: Mir Wise Khan, Rishika Gupta, Ahmad, Chidiebere Ogbuchi
+  * Oracle: Sayyor Yusupov, Nikola Ivanović, Bogdana Živković, Jose Antonio Lorencio Abril
+  * MariaDB: Prashant Gupta, Abd Alrhman Abu Sbeit, Maren, TBD.
+  * Citus: Manar El Amrani, Maxime Renversez, Alexandre Chapelle, Nicolas Dardenne
+  * Google BigQuery: Koumudi Ganepola, Adina Bondoc, Zyad Alazazi, Alaa Almutawa
+  * SQL Server: Arina Gepalova, Tianheng Zhou, You Xu, Marie Giot
+  * Microsoft Azure SQL: Evguéniy Starygin, Gauthier Roger France, Mathieu Pardon, Diego Rubas
+*/
 ===== Examinations from Previous Years =====
+  * Academic year 2023-2024
+    * {{:teaching:infoh419-2324-january.pdf|First session}}
   * Academic year 2016-2017
     * {{:teaching:infoh419:dw-exam-2017-january-solution.pdf|First session}}