INFO-H-415: Advanced Databases

Latest important announcement

21/09/2021: Hello everyone,

The first exercise session will take place Thursday from 14:00-16:00 in the UB4.130 computer room. It is however strongly advised to bring your own computer and to install the required applications as indicated on the webpage of the exercise sessions (https://cs.ulb.ac.be/public/teaching/infoh415/tp, Before Session 1).

See you on Thursday,

Gilles

Lecturer

Teaching Assistant

Volume

  • Theory 24 h - Exercises 24 h - Project 12 h
  • 5 ECTS credits

Study Programme

  • Master in Computer Science and Engineering [MA-IRIF]
  • Master in Computer Sciences [INFO]
  • Erasmus Mundus Master in Big Data Management and Analytics (BDMA)

Schedule

The course is given during the first semester.

  • Lectures on Mondays from 4 pm to 6 pm
  • Exercises on Thursdays from 2 pm to 4 pm

Objectives

Today, databases are moving away from typical management applications and address new application areas. To do so, databases must take into account (1) recent developments in computer technology, such as the object paradigm and distribution, and (2) the management of new data types such as spatial or temporal data. This course introduces the concepts and techniques of some innovative database applications.

Content

Active Databases

Taxonomy of concepts. Applications of active databases: integrity maintenance, derived data, replication. Design of active databases: termination, confluence, determinism, modularisation.

Temporal Databases

Temporal data and applications. Time ontology. Conceptual modeling of temporal aspects. Manipulation of temporal data with standard SQL.

Graph Databases

Spatial Databases

Spatial data and applications. Space ontology. Conceptual modeling of spatial aspects. Manipulation of spatial data with standard SQL.

Reference Books

  • C. Zaniolo et al., Advanced Database Systems, Morgan Kaufmann, 1997
  • R.T. Snodgrass, Developing Time-Oriented Database Applications in SQL, Morgan Kaufmann, 2000 (PDF version)
  • Tom Johnston, Bitemporal Data: Theory and Practice, Morgan Kaufmann, 2014
  • R.T. Snodgrass, The TSQL2 Temporal Query Language, Kluwer Academic Publishers, 1995
  • S.W. Dietrich, S.D. Urban, Fundamentals of Object Databases: Object-Oriented and Object-Relational Design, Morgan & Claypool, 2011
  • Jim Melton and Alan R. Simon, SQL: 1999 - Understanding Relational Language Components, Morgan Kaufmann, 2001
  • Jim Melton, Advanced SQL: 1999 - Understanding Object-Relational and Other Advanced Features, Morgan Kaufmann, 2002
  • Ian Robinson, Jim Webber, Emil Eifrem, Graph Databases, 2nd Edition, O'Reilly Media, 2015
  • Philippe Rigaux, Michel Scholl, Agnès Voisard, Spatial Databases: With Application to GIS, Morgan Kaufmann, 2001

Additional documentation

  • Norman W. Paton, Oscar Díaz, Active Database Systems, ACM Computing Surveys, 31(1): 63-103, 1999. (PDF version)
  • Jennifer Widom, The Starburst Active Database Rule System, IEEE Transactions on Knowledge and Data Engineering, 8(4): 583-595, 1996. (PDF version)
  • E. Zimányi, Temporal Aggregates and Temporal Universal Quantifiers in Standard SQL, SIGMOD Record, 35(2): 16-21, 2006. (PDF version)
  • Krishna Kulkarni, Jan-Eike Michels, Temporal features in SQL:2011, SIGMOD Record, 41(3): 34-43, 2012. (PDF version)
  • Michael H. Böhlen, Anton Dignös, Johann Gamper, Christian S. Jensen, Temporal Data Management: An Overview, Proc. of the 7th European Summer School on Business Intelligence and Big Data, eBISS 2017, Bruxelles, Belgium, LNBIP 324, Springer 2018. (PDF version)
  • Gregory Sannik, Fred Daniels, Enabling the Temporal Data Warehouse, Teradata White paper. (PDF version)
  • Richard T. Snodgrass, A Case Study of Temporal Data, Teradata White paper. (PDF version)
  • Teradata, Temporal Table Support. (PDF version)
  • Teradata, ANSI Temporal Table Support. (PDF version)
  • IBM, A Matter of Time: Temporal Data Management in DB2 for z/OS. (PDF version)

Course Slides

Exercises

Project

Students, in groups of two or four, will carry out a project on a topic relevant to advanced databases. Examples of topics are given in the next section of this document. Please note that the template for these topics is “<Technology> and <Tool>” for groups of 2 students and “<Technology> with <Tool1> and <Tool2>” for groups of 4 students.

Each group will study a database technology and illustrate it with an application developed in a database management system of its choice (e.g., SQL Server, PostgreSQL, MongoDB, etc.). The topic should be addressed in a technical way, explaining the foundations of the underlying technology. The application must use the chosen technology.

It is important to understand that the objective of the project is NOT to develop an application with a GUI. The objective is to benchmark the proposed tool against the database requirements of your application. It is therefore necessary to determine the set of queries and updates that your application requires and to run a benchmark with, e.g., 1K, 10K, 100K, and 1M “objects” (rows, documents, nodes, etc., depending on the technology used) to determine whether the tool shows linear or exponential behavior. As usual when performing benchmarks, each query and update is executed n times (e.g., 6 times, where the first execution is discarded because it differs from the others: the cache structures must first be filled) and the average of the remaining execution times is computed. A comparison with traditional relational technology must be provided to show that the chosen tool is THE technology of choice for your application, better than all other alternatives, and that it will perform correctly when the system is deployed at full scale.
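The measurement procedure described above could be sketched as follows. This is only an illustration under stated assumptions, not a required harness: the helper names (`benchmark`, `make_database`, `run_scaling_benchmark`), the `item` table, and the example query are all hypothetical, and an in-memory SQLite database merely stands in for whichever tool a group actually evaluates.

```python
import sqlite3
import time
from statistics import mean


def benchmark(operation, runs=6):
    """Call `operation` `runs` times; return the mean time of all runs but the first.

    The first run is treated as a warm-up (caches being filled) and discarded.
    """
    if runs < 2:
        raise ValueError("need at least 2 runs: one warm-up plus one measured run")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return mean(timings[1:])


def make_database(size):
    """Build an in-memory SQLite table with `size` rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany(
        "INSERT INTO item (id, value) VALUES (?, ?)",
        ((i, i % 100) for i in range(size)),
    )
    conn.commit()
    return conn


def run_scaling_benchmark(sizes=(1_000, 10_000, 100_000, 1_000_000), runs=6):
    """Return {size: average seconds} for one example query at each data size."""
    results = {}
    for size in sizes:
        conn = make_database(size)

        def query():
            # Hypothetical workload: one aggregate query over the whole table.
            conn.execute("SELECT value, COUNT(*) FROM item GROUP BY value").fetchall()

        results[size] = benchmark(query, runs)
        conn.close()
    return results


if __name__ == "__main__":
    for size, seconds in run_scaling_benchmark().items():
        print(f"{size:>9} rows: {seconds * 1000:.3f} ms on average")
```

Comparing the averages across the four sizes gives a first indication of how the execution time grows with the data volume. In a real project, `query` would be replaced by each query and update of the application, and the same loop would be run against both the chosen tool and a relational baseline.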

The choice of topic and application must be made in agreement with the lecturer. The topic must not already be covered in the program of the Master in Computer Science and Engineering. The project will be presented to the lecturer and the fellow students at the end of the semester. This presentation will be supported by a slideshow. A written report containing the contents of the presentation is also required. The presentation and the report will (1) explain the foundations of the chosen technology, (2) explain how these foundations are implemented by the chosen database management system, and (3) illustrate all these concepts with the implemented application.

For a 2-student group, the duration of the presentation is 30 minutes. It will be structured in two parts of similar length:

  • An introduction to the technology
  • An introduction to the tool illustrated with an example application assessing its advantages and disadvantages.

For a 4-student group, the duration of the presentation is 45 minutes. It will be structured in three parts of similar length:

  • An introduction to the technology, presented jointly by the two groups
  • An introduction to the two tools, each presented by one of the groups
  • A common assessment of the advantages and disadvantages of both tools tested in a common example application.

The evaluation of the project focuses on the following criteria:

  • Quality of the presentation,
  • Mastery of the topic presented, and
  • Quality of the written report.

The project will count for 25% of the final grade.

The project must be submitted by Monday, December 13, 2021.

Examples of topics

  • Analytical databases and Endeca
  • Cloud databases and Microsoft Azure
  • Column stores and Cassandra, HBase, …
  • Data warehouses and Apache Hive
  • Deductive Databases and XSB
  • Distributed databases and SQL Server, DynamoDB, …
  • Document stores and Cloudant, Couchbase, CouchDB, MongoDB, RavenDB, RethinkDB, …
  • Embedded databases and BerkeleyDB
  • In-memory databases and Kdb+, MemSQL, Oracle TimesTen, Memcached, …
  • Key-value stores and BerkeleyDB, DynamoDB, Redis, Voldemort, …
  • Multimedia databases and Oracle
  • Multi-model databases and MarkLogic
  • NewSQL databases and VoltDB
  • Object-oriented databases and ObjectBox, Perst
  • Real-time databases and Firebase
  • Search engines and Solr, Elasticsearch, Sphinx, …
  • Spatial raster databases and Rasdaman
  • Stream databases and Apache Kafka, Event Stores
  • Time series databases and InfluxDB, Kdb+, …
  • XML databases and BaseX

Topics for the current academic year

  • Cloud databases and Microsoft Azure SQL: Davide Rendina, Margarita Hernandez
  • Column stores and Cassandra: Md Jamiur Rahman Rifat, Khushnur Binte Jahangir
  • Data warehouses and Apache Hive: Nicole Zafalón, Andrés Espinal
  • Distributed databases and SQL Server: Asha Seif, Kainaat Amjid
  • Distributed Databases with DynamoDB: Loïc Caudron, Matteo Snellings
  • Document stores with Couchbase and CouchDB: Mohammadreza Amini, Ossoama Benaissa, Zheng Ren, Adriana Sirbu
  • Document stores and Firestore: Luca De Santos, Sacha Keserovic
  • Document stores and MongoDB: Hang Yu, Zhiyang Guo
  • In-memory databases and Memcached: Diogo Repas, Sandra Hillergren
  • Key-value databases with Cloud Bigtable and Redis: Luiz Fonseca, Zyrako Musaj, Yanjian Zhang, Zhicheng Luo
  • Multimedia databases and Oracle: Wassim Belgada, Imestir Ibrahim
  • NewSQL databases and VoltDB: Nabil Souissi
  • Object-oriented databases with ObjectBox and Perst: Filip Sotiroski, Niccolo Morabito, Andrea Gonzato, Pietro Ferrazi
  • Real-time databases and Firebase: Himanshu Choudhary, Sergio Postigo, Tejaswini Dhuppad
  • Spatial raster databases and Rasdaman: Adam Broniewski, Victor Divi
  • Stream databases and Apache Kafka: Vlada Kylynnyk, Mahmut Asım Onat
  • Time series databases with InfluxDB and Kdb+: Mohammad Zain Abbas, Muhammad Ismail, Yi Wu, Chonghan Li
  • Search engines with Apache Solr and Elasticsearch: Pap Sanou, Szymon Swirydowicz, Alexandre Chapelle, Nicolas Dardenne
  • XML databases and BaseX: Maxime Renversez, Mael Touret

Examinations from Previous Years

 
teaching/infoh415.txt · Last modified: 2021/09/26 11:10 by ezimanyi