teaching:infoh419-2013 [2014/09/22 09:15] (current)
tcalders created
 +====== INFO-H-419: Data Warehouses 2013 ======
 +[[DW Edition 2012|Edition 2012]]
  
 +
 +===== Lecturer =====
 +
 +  * [[http://​cs.ulb.ac.be/​members/​tcalders/​doku.php?​id=start|Toon Calders]]
 +  * <​toon.calders@ulb.ac.be>​
 +  * Room SU A 4.115
 +
 +===== Volume =====
 +
 +  * Theory 24h - Exercises 24h - Project 12h
 +  * 5 ECTS
 +
 +===== Study Programme =====
 +
 +  * Master in Computer Science and Engineering [MA-IRIF]
 +  * Master in Computer Sciences [INFO]
 +  * Erasmus Mundus Master in Information Technologies for Business Intelligence (IT4BI)
 +
 +===== Grading =====
 +<note important>​[[https://​dl.dropboxusercontent.com/​u/​5119252/​DW/​2013/​solution-exam-DW-january-2014.pdf|Exam solution]] available.</​note>​
 +  * [[https://​dl.dropboxusercontent.com/​u/​5119252/​DW/​2013/​description.pdf|Group project]] (30%)
 +  * Written exam (70%)
 +    * The exam is open book: notes and books may be used. Laptops and other electronic devices are not allowed.
 +===== Course Summary =====
 +Relational and object-oriented databases are mainly suited for operational settings in which many small transactions query and write to the database. Consistency of the database (in the presence of potentially conflicting transactions) is of utmost importance. The situation is quite different in analytical processing, where historical data is analyzed and aggregated in many different ways. Such queries differ significantly from the typical transactional queries in the relational model:
 +  * Typically analytical queries touch a larger part of the database and last longer than the transactional queries;
 +  * Analytical queries involve aggregations (min, max, avg, ...) over large subgroups of the data;
 +  * When analyzing data it is convenient to see it as multi-dimensional.
 +\\
 +For these reasons, data to be analyzed is typically collected into a data warehouse with Online Analytical Processing (OLAP) support. "Online" here refers to the fact that the answers to the queries should not take too long to compute. Collecting the data is often referred to as Extract-Transform-Load (ETL). The data in the data warehouse needs to be organized so that analytical queries can be executed efficiently. For the relational model, star and snowflake schemas are popular designs. Besides OLAP on top of a relational database (ROLAP), native OLAP solutions based on multidimensional structures (MOLAP) also exist. To further improve query answering efficiency, some query results can be materialized in the database in advance, and new indexing techniques have been developed.
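
As a minimal sketch of the ideas above (not part of the course material), the following Python snippet uses the standard `sqlite3` module to build a tiny star schema, run an aggregation over two dimensions, and materialize one query result as a summary table. All table names, column names, and sample rows (`fact_sales`, `dim_date`, `dim_product`, `agg_sales_by_category`) are hypothetical and chosen only for illustration; the course itself uses the SQL Server tools, not SQLite.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table referencing two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE fact_sales (
            date_id INTEGER REFERENCES dim_date(date_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            amount REAL
        );
    """)
    # Hypothetical sample data, for illustration only.
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2013, 9), (2, 2013, 10)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "pen", "office"), (2, "desk", "furniture"), (3, "paper", "office")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 10.0), (1, 2, 200.0), (2, 1, 5.0), (2, 3, 20.0), (2, 2, 100.0)])


def sales_by_category_and_month(conn):
    """Analytical query: aggregate the fact table along two dimensions."""
    return conn.execute("""
        SELECT p.category, d.month, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d ON d.date_id = f.date_id
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
    """).fetchall()


def materialize_category_totals(conn):
    """Store a precomputed aggregate so later queries can skip the join."""
    conn.execute("""
        CREATE TABLE agg_sales_by_category AS
        SELECT p.category AS category, SUM(f.amount) AS total
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY p.category
    """)


def category_totals(conn):
    """Answer a coarser query directly from the materialized table."""
    return conn.execute(
        "SELECT category, total FROM agg_sales_by_category ORDER BY category"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    print(sales_by_category_and_month(conn))
    materialize_category_totals(conn)
    print(category_totals(conn))
```

The summary table here is a plain copy that is not refreshed when `fact_sales` changes; keeping such materialized results up to date is a separate concern that this sketch leaves out.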
 +
 +In the course, the main concepts of multidimensional databases will be covered and illustrated using the SQL Server tools. To complement the course, IBM and Teradata will give invited lectures.
 +
 +===== Books and other lecture material =====
 +  * [[http://​www.morganclaypool.com/​doi/​abs/​10.2200/​s00299ed1v01y201009dtm009|Multidimensional Databases and Data Warehousing]] by Cristian S. Jensen, Torben Bach Pedersen, and Christian Thomsen. Morgan & Claypool Publishers.
 +  * [[http://​www.mcgraw-hill.co.uk/​html/​0071610391.html|Data Warehouse Design: Modern Principles and Methodologies]] by Golfarelli and Rizzi. McGraw-Hill,​ 2009
 +
 +==== Extra books ====
 +The following materials have been used to construct the course material, but are not required reading for the course:
 +  * Kimball, Ralph; Margy Ross, Warren Thornthwaite,​ Joy Mundy, Bob Becker (2008). The Data Warehouse Lifecycle Toolkit (2nd ed.). Wiley.
 +  * White papers and research papers
 +    * Two survey papers: [[https://​dl.dropbox.com/​u/​5119252/​DW/​papers/​chaudhuri.pdf|paper 1]],  [[https://​dl.dropbox.com/​u/​5119252/​DW/​papers/​olap.pdf|paper 2]]
 +  * Slides (via detailed schedule)
 +  * Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications,​ Elzbieta Malinowski, Esteban Zimányi, Springer, 2008
 +  * The Data Warehouse Toolkit, 2nd Ed., Kimball and Ross, Wiley, 2002
 +  * Building the Data Warehouse. 4th edition. Inmon, Wiley, 2005
 +  * Data Warehousing Fundamentals For IT Professionals. 2nd edition. Paulraj Ponniah, Wiley, 2010
 +
 +==== Prerequisites ====
 +  * Database System Concepts (Sixth Edition) by Silberschatz,​ Korth, and Sudarshan. McGraw-Hill (2011) [A copy of the book is available - ask the lecturer]
 +    * ER-modeling:​ Chapter 7
 +    * Keys and functional dependencies:​ Section 8.3.1
 +    * BCNF: 8.3.2
 +===== Software =====
 +  * For the exercises we use the SQL Server tools: MS SQL Server, SS Integration Services, SS Analysis Services, and SS Reporting Services
 +
 +=== Extra Resources ===
 +[[http://​www.teradatauniversitynetwork.com/​|Teradata University Network]] For the 2013-2014 academic year, the student password is '​UnifiedDataArchitecture'​
 + 
 +===== Detailed Schedule =====
 +**This schedule is as detailed as possible at this moment; it may be subject to change.**
 +
 +Lectures are on Wednesday:
 +  * Wed. 10-12 (Theory): ​
 +    * W1,​3,​4,​6-9,​14:​ SC.3.122
 +    * W2: **No lecture**
 +    * W5: SUB.5.132
 +    * W11,12: SK.3.201
 +  * Wed. 13-15 (Exercises):​
 +    * W1: SR42.4.110
 +    * W3-9, 12-14: <​del>​SUB4.329A</​del>​ **UB4.126** (computer lab)
 +    * W10: **No lecture (university closed)**
 +    * W11: J.1.104
 +
 +==Schedule:​==
 +<note important>The schedule is always a mess; always watch out for changes ...</note>
 +
 +  * W1: 18/9 T: Introduction to the course and refresher relational databases ([[DW2013-W1|Details]])
 +  * W2: **No lecture**
 +  * W3: 2/10 T+E: Data cubes, SQL extensions ([[DW2013-W3|Details]])
 +  * W4: 9/10 T+E: Dimensional Modeling ([[DW2013-W4|Details]])
 +  * W5: 16/10 T+E: From conceptual to logical model ([[DW2013-W5|Details]])
 +  * W6: 23/10 T+E: Slowly Changing Dimensions + View materialization ([[DW2013-W6|Details]])
 +  * W7: 30/10 T+E: View materialization (continued) + Indexing for DW ([[DW2013-W7|Details]])
 +  * W8: 6/11 T+E: Indexing continued + ETL ([[DW2013-W8|Details]])
 +  * W9: 13/11 T+E: ETL continued ([[DW2013-W9|Details]]) ​
 +  * W10: 20/11: **No lecture: festivities and procession for the anniversary of the founding of the University**
 +  * W11: 27/11: ([[DW2013-W11|Details]])
 +    * T: Invited speaker about DataStage (10am); **in room SK3.201** ​
 +    * E: exercises **in room J.1.104**
 +  * W12: 4/12: ([[DW2013-W12|Details]])
 +    * 10AM T: Reporting & Data Mining **in room SK3.201** ​
 +    * 1PM E:  Reporting & Data Mining **back in room UB4.126**
 +  * W13:
 +    * <​del>​11/​12 10AM:</​del>​ **No lecture** (Overlap with BPM site visit to IBM)
 +    * 11/12 **1:30PM**: **S.P1.3.206;​ Invited speaker:** experiences of a DW consultant (IBM)
 +    * **Friday 13/12, 10h, S.C4.223A** **Invited speaker** on Teradata & AsterData ([[https://​dl.dropboxusercontent.com/​u/​5119252/​DW/​2013/​Teradata%20and%20Big%20Data.pdf|Slides Teradata]] - See 2nd slide for links and contact information of the speakers)
 +  * W14: 18/​12: ​
 +    * 18/12 10AM **room S.C3.122**: **Invited speaker** on the topic IBM Netezza ([[https://​dl.dropboxusercontent.com/​u/​5119252/​DW/​2013/​ULB_PureData_for_analytics_presentation_131218.pdf|Slides]])
 +    * 18/12 1PM **S.P1.3.206**:​ Closing session and Q&​A ​ ([[https://​dl.dropboxusercontent.com/​u/​5119252/​DW/​2013/​DW12-Conclusion.pdf|Slides concluding session]])
 
teaching/infoh419-2013.txt · Last modified: 2014/09/22 09:15 by tcalders