INFO-H-415: Advanced Databases
Lecturer
Teaching Assistant
Volume
Study Programme
Master in Computer Science and Engineering [MA-IRIF]
Master in Computer Sciences [INFO]
Erasmus Mundus Master in Big Data Management and Analytics (BDMA)
Schedule
The course is given during the first semester
Grading
Group project (25%)
Written exam (75%)
Objectives
Today, databases are moving beyond traditional management applications and are addressing new application areas. To do so, databases must take into account (1) recent developments in computer technology, such as the object paradigm and distribution, and (2) the management of new data types, such as spatial or temporal data. This course introduces the concepts and techniques of some innovative database applications.
Content
Spatial Databases
Spatial data and applications. Space ontology. Conceptual modeling of spatial aspects. Manipulation of spatial data with standard SQL.
Mobility Databases
Temporal Databases
Temporal data and applications. Time ontology. Conceptual modeling of temporal aspects. Manipulation of temporal data with standard SQL.
Active Databases
Taxonomy of concepts. Applications of active databases: integrity maintenance, derived data, replication. Design of active databases: termination, confluence, determinism, modularisation.
Reference Books
C. Zaniolo et al., Advanced Database Systems, Morgan Kaufmann, 1997
R.T. Snodgrass, Developing Time-Oriented Database Applications in SQL, Morgan Kaufmann, 2000 (PDF version)
Tom Johnston, Bitemporal Data: Theory and Practice, Morgan Kaufmann, 2014
R.T. Snodgrass, The TSQL2 Temporal Query Language, Kluwer Academic Publishers, 1995
Jim Melton and Alan R. Simon, SQL:1999 - Understanding Relational Language Components, Morgan Kaufmann, 2001
Jim Melton, Advanced SQL:1999 - Understanding Object-Relational and Other Advanced Features, Morgan Kaufmann, 2002
Philippe Rigaux, Michel Scholl, Agnès Voisard, Spatial Databases: With Application to GIS, Morgan Kaufmann, 2001
Additional documentation
Norman W. Paton, Oscar Díaz, Active Database Systems, ACM Computing Surveys, 31(1):63-103, 1999. (PDF version)
Jennifer Widom, The Starburst Active Database Rule System, IEEE Transactions on Knowledge and Data Engineering, 8(4):583-595, 1996. (PDF version)
E. Zimányi, Temporal Aggregates and Temporal Universal Quantifiers in Standard SQL, SIGMOD Record, 35(2):16-21, 2006. (PDF version)
Krishna Kulkarni, Jan-Eike Michels, Temporal features in SQL:2011, SIGMOD Record, 41(3):34-43, 2012. (PDF version)
Michael H. Böhlen, Anton Dignös, Johann Gamper, Christian S. Jensen, Temporal Data Management: An Overview, Proc. of the 7th European Summer School on Business Intelligence and Big Data, eBISS 2017, Brussels, Belgium, LNBIP 324, Springer, 2018. (PDF version)
Gregory Sannik, Fred Daniels, Enabling the Temporal Data Warehouse, Teradata white paper. (PDF version)
Richard T. Snodgrass, A Case Study of Temporal Data, Teradata white paper. (PDF version)
IBM, A Matter of Time: Temporal Data Management in DB2 for z/OS. (PDF version)
Links
Course Slides
Exercises
Project
Students, in groups of four, will carry out a project on a topic relevant to advanced databases. Examples of topics are given in the next section of this document. Note that topics follow the template “<Technology> with <Tool1> and <Tool2>”.
Each group will study a database technology (e.g., document stores, time series databases, etc.) and illustrate it with an application developed in two database management systems of their choice (e.g., SQL Server, PostgreSQL, MongoDB, etc.). The topic should be addressed technically, explaining the foundations of the underlying technology. The application must use the chosen technology. Examples of technologies and tools can be found on the following website.
It is important to understand that the objective of the project is NOT to develop an application with a GUI. The objective is to benchmark the chosen tools against the database requirements of your application. Therefore, you must determine the set of queries and updates that your application requires and run a benchmark with, e.g., 1K, 10K, 100K, and 1M “objects” (rows, documents, nodes, etc., depending on the technology used) to determine whether the execution time of the tool grows linearly or exponentially with the data size. Note that you SHOULD NOT generate the benchmark data yourself, since (1) a huge number of datasets are available on the Internet and (2) many data generators are also available.
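To make the linear-versus-exponential question concrete, one common rule of thumb (offered here as a sketch, not as a required method) is to compute the slope of log(time) against log(size) between consecutive dataset sizes. A slope near 1 indicates linear growth, a slope near 2 quadratic growth, and slopes that keep increasing from one step to the next indicate even faster growth. The function names, the 0.2 tolerance, and the sample timings below are hypothetical placeholders, not measurements.

```python
import math


def scaling_slopes(sizes, times):
    """Log-log slope between consecutive (size, time) measurements.

    A slope near 1.0 means time grows in proportion to size (linear);
    a slope near 2.0 means quadratic growth; slopes that keep increasing
    from one step to the next suggest even faster (e.g., exponential) growth.
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two sizes, each with one timing")
    slopes = []
    for i in range(1, len(sizes)):
        size_ratio = sizes[i] / sizes[i - 1]
        time_ratio = times[i] / times[i - 1]
        slopes.append(math.log(time_ratio) / math.log(size_ratio))
    return slopes


def classify(slopes, tolerance=0.2):
    """Coarse label; the tolerance of 0.2 is an arbitrary illustrative choice."""
    if any(s > 1.0 + tolerance for s in slopes):
        return "worse than linear"
    if all(abs(s - 1.0) <= tolerance for s in slopes):
        return "roughly linear"
    return "better than linear"


if __name__ == "__main__":
    sizes = [1_000, 10_000, 100_000, 1_000_000]
    # Hypothetical mean execution times in milliseconds (placeholders, not results).
    times = [2.0, 21.0, 205.0, 2_100.0]
    slopes = scaling_slopes(sizes, times)
    print([round(s, 2) for s in slopes], classify(slopes))
```

Since each step multiplies the size by 10, a slope of 1 means that the time also grows tenfold; plotting the same points on log-log axes gives the same picture visually.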
As usual when performing benchmarks, each query and update is executed n times (e.g., 6 times, where the first execution is discarded because it must fill the cache structures and is therefore not representative of the others), and the average of the remaining execution times is computed. A comparison with traditional relational technology (e.g., using PostgreSQL) must be provided to show that the chosen tool is THE technology of choice for your application, better than all other alternatives, and that it will perform correctly when the system is deployed at full scale. Note that there are MANY standard benchmarks for various database technologies, so you should prefer a standard benchmark rather than reinventing the wheel by creating your own.
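A minimal sketch of this measurement protocol in Python follows, assuming each query or update can be wrapped in a zero-argument callable. The name `benchmark` and its parameters are hypothetical, not a required interface. SQLite from the standard library stands in for the database systems under test only so that the example runs as-is; in the project, the callable would issue the query through the client library of each chosen tool.

```python
import sqlite3
import statistics
import time
from typing import Callable


def benchmark(run_query: Callable[[], object], repetitions: int = 6, warmup: int = 1) -> dict:
    """Execute run_query `repetitions` times and average all but the first `warmup` runs.

    The warm-up runs are discarded because they populate caches (buffer pool,
    plan cache, OS page cache) and are therefore not representative.
    """
    if repetitions <= warmup:
        raise ValueError("repetitions must exceed the number of warm-up runs")
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    measured = timings[warmup:]
    return {
        "mean_s": statistics.mean(measured),
        "stdev_s": statistics.stdev(measured) if len(measured) > 1 else 0.0,
        "runs": len(measured),
    }


if __name__ == "__main__":
    # SQLite is only a stand-in so that the sketch runs without external servers.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category INTEGER)")
    query = "SELECT category, COUNT(*) FROM items GROUP BY category"
    for n in (1_000, 10_000, 100_000):
        conn.execute("DELETE FROM items")
        conn.executemany(
            "INSERT INTO items (category) VALUES (?)",
            ((i % 10,) for i in range(n)),
        )
        stats = benchmark(lambda: conn.execute(query).fetchall())
        print(f"{n:>7} rows: mean {stats['mean_s'] * 1000:.3f} ms over {stats['runs']} runs")
```

Reporting the standard deviation alongside the mean is an optional addition, but it helps show whether five measured runs are enough to give a stable average.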
The choice of topic and application must be agreed with the lecturer. The topic should not already be covered in the programme of the Master in Computer Science and Engineering. The project will be presented to the lecturer and fellow students at the end of the semester, supported by a slideshow. A written report covering the contents of the presentation is also required. The presentation and the report will (1) explain the foundations of the chosen technology, (2) explain how these foundations are implemented by the chosen database management systems, and (3) illustrate all these concepts with the implemented application.
The presentation lasts 45 minutes and is structured in three parts of SIMILAR length:
An introduction to the technology
An introduction to the two tools, each presented by a subgroup of two persons
A joint assessment of the advantages and disadvantages of both tools, tested on a common example application.
The evaluation of the project focuses on the following criteria:
Quality of the presentation,
Mastery of the topic presented, and
Quality of the written report.
The project will count for 25% of the final grade.
The project must be submitted immediately after the project presentation, which will take place during the week of Monday, December 16, 2024. Please send the report and the presentation in PDF format to the lecturer.
Examples of topics
Cloud databases and Microsoft Azure, AWS, …
Column stores and Cassandra, HBase, …
Data warehouses and Apache Hive
Distributed databases and SQL Server, Oracle, Citus, …
Document stores and Cloudant, Couchbase, CouchDB, MongoDB, RavenDB, RethinkDB, …
Embedded databases and BerkeleyDB
In-memory databases and Kdb+, MemSQL, Oracle TimesTen, Memcached, …
Key-value stores and BerkeleyDB, DynamoDB, Redis, Voldemort, …
Multi-model databases and MarkLogic, Cosmos DB
NewSQL databases and VoltDB, CockroachDB, …
Object-oriented databases and ObjectBox, Perst
Real-time databases and Firebase
Search engines and Solr, Elasticsearch, Sphinx, …
Spatial raster databases and Rasdaman
Stream databases and Apache Kafka, Event Stores
Time series databases and InfluxDB, Kdb+, …
XML databases and BaseX
Topics for the current academic year
Document stores with MongoDB and PostgreSQL: Oluwanifemi Favour Olajuyigbe, Hadiqa Alamdar Bukhari, Mathilde Lourenço, Hanling Hu
Document stores with Couchbase and CouchDB: Marwah Sulaiman, Sara Saad, Otto Wantland, Sebastian Neri
Document stores with Firebase Realtime Database and Google Cloud Firestore: Gloria Akli-Kodjo-Mensah, Joel Anil Jose, Amélie Liesenborghs and Salma Namouri
Graph databases with Neo4j and Apache AGE: Lucía Fernández, Stephanie Gomes, Filipe Russo, Elnara Yerbolatova
Graph databases with Aerospike and Virtuoso: Tim Ameryckx, Gilles Mevel, Antonios Sisiaridis, Sacha Delsaux
In-memory databases with Redis and Memcached: Hilal Rachik, Aya Iftissen, Antonio Baldari, Léopold Guyot
Key-value stores with Redis and Amazon DynamoDB: Kerim Esiev, George Vasile, Rayane Bazi, Mohammed Sewif
Key-value stores with Memcached and Aerospike: Anwar Boulahya, Antoine Frizot and Diogo Miguel Gonçalves Soares
Search engines with Sphinx and Elasticsearch: Alfio Cardillo, Charlotte García, Jule Grigat, Josu Bernal
Search engines with Microsoft Azure SQL Database and Amazon CloudSearch: Dylane Zouatom, Nicola Mambelli, Jorge Del Rio Sanchez and Souha Belhaj Rhouma
Search engines with Solr and PostgreSQL: Viet Phuong Hoang, Nhu Ngoc Hoang, Ngoc Hoa Pham, Maureen Barral
Stream databases with Apache Kafka and Amazon Kinesis: Moïra Vanderslagmolen, Arthur Inastallé, Jean-Nicolas Grégoire and Ze-Xuan Xu
Time series databases with TimescaleDB and InfluxDB: Kristóf Balázs, Stefanos Kypritidis, Olha Baliasina, Nishant Sushmakar
Time series databases with Prometheus and Graphite: Derar Alnakeb, Hugo Colicchia, Mohamad Sy, Aristide Coquereau
Vector databases with pgvector and ChromaDB: Nima Kamali Lassem, Adrian Patricio, Kaiwen Yuan, Lianjie Li
Examinations from Previous Years
Academic year 2023-2024
Academic year 2016-2017
Academic year 2015-2016
Academic year 2014-2015
Academic year 2013-2014
Academic year 2012-2013
Academic year 2008-2009
Academic year 2007-2008
Academic year 2006-2007
Academic year 2002-2003
Academic year 2001-2002
Academic year 2000-2001