This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
teaching:projh402 [2015/06/26 16:09] svsummer |
teaching:projh402 [2020/09/30 20:47] mahmsakr [Course objective] |
||
---|---|---|---|
Line 5: | Line 5: | ||
The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE. | The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE. | ||
- | ===== Project proposals ===== | + | ===== Projects in Mobility Databases ===== |
- | ===== Graph Indexing for Fast Subgraph Isomorphism Testing ===== | + | Moving object databases (MOD) are database systems that can store and manage moving object data. A moving object is a value that changes over time. It can be spatial (e.g., a car driving on the road network), or non-spatial (e.g., the temperature in Brussels). Using a variety of sensors, the changing values of moving objects can be recorded in digital formats. A MOD, then, helps storing and querying such data. A couple of prototypes have also been proposed, some of which are still active in terms of new releases. Yet, a mainstream system is by far still missing. Existing prototypes are merely research. By mainstream we mean that the development builds on widely accepted tools, that are actively being maintained and developed. A mainstream system would exploit the functionality of these tools, and would maximize the reuse of their ecosystems. As a result, it becomes more closer to end users, and easily adopted in the industry. |
- | There is an increasing amount of scientific data, mostly from the bio-medical sciences, that can be represented as collections of graphs (chemical molecules, gene interaction networks, ...). A crucial operation when searching in this data is that of subgraph isomorphism testing: given a pattern P that one is interested in (also a graph) in and a collection D of graphs (e.g., chemical molecules), find all graphs in G that have P as a subgraph. Unfortunately, the subgraph isomorphism problem is computationally intractable. In ongoing research, to enable tractable processing of this problem, we aim to reduce the number of candidate graphs in D to which a subgraph isomorphism test needs to be executed. Specifically, we index the graphs in the collection D by means of decomposing them into graphs for which subgraph isomorphism *is* tractable. An associated algorithm that filters graphs that certainly cannot match P can then formulated based on ideas from information retrieval. | + | In our group, we are building MobilityDB, a mainstream MOD. It builds on PostGIS, which is a spatial database extension of PostgreSQL. MobilityDB extends the type system of PostgreSQL and PostGIS with ADTs for representing moving object data. It defines, for instance, the tfloat for representing a time dependant float, and the tgeompoint for representing a time dependant geometry point. MobilityDB types are well integrated into the platform, to achieve maximal reusability, hence a mainstream development. For instance, the tfloat builds on the PostgreSQL double precision type, and the tgeompoint build on the PostGIS geometry(point) type. Similarly MobilityDB builds on existing operations, indexing, and optimization framework. |
- | In this project, the student will emperically validate on real-world datasets the extent to which graphs can be decomposed into graphs for which subgraph isomorphism is tractable, and run experiments to validate the effectiveness of the proposed method in terms of filtering power. | + | This is all made accessible via the SQL query interface. Currently MobilityDB is quite rich in terms of types and functions. It can answer sophisticated queries in SQL. The first beta version has been released as open source April 2019 (https://github.com/ULB-CoDE-WIT/MobilityDB). |
- | **Interested?** Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]] | + | The following thesis ideas contribute to different parts of MobilityDB. They all constitute innovative development, mixing both research and development. They hence will help developing the student skills in: |
- | **Status**: available | + | Understanding the theory and the implementation of moving object databases. |
- | + | Understanding the architecture of extensible databases, in this case PostgreSQL. | |
- | ==== Principles of Database Management Architectures in Managed Virtual Environments ==== | + | Writing open source software. |
- | + | ||
- | With the gaining popularity of Big Data, many data processing engines | + | |
- | are implemented in a managed virtual environment such as the Java | + | |
- | Virtual Machine (e.g., Apache Hadoop, Apache Giraph, Drill, | + | |
- | ...). While this improves the portability of the engine, the tradeoffs | + | |
- | and implementation principles w.r.t. traditional C++ implementations | + | |
- | are sometimes less understood. | + | |
- | + | ||
- | The objective in this project is to develop some basic functionalities | + | |
- | of a database storage engine (Linked files, BTree, Extensible Hash | + | |
- | table, basic external-memory sorting ) in a managed virtual machine | + | |
- | (i.e., the Java Virtual Machine or and the .NET Common Language | + | |
- | Runtime), and compare this with a C++-based implementation both on (1) | + | |
- | ease of implementation and (2) execution efficiency. In order to | + | |
- | develop the managed virtual machine implementation, the interested | + | |
- | student will need to research the best practices that are used in the | + | |
- | above-mentioned projects to gain maximum execution speed (e.g., use of | + | |
- | the java.lang.unsafe feature, memory-mapped files, ...). | + | |
- | + | ||
- | **Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) | + | |
- | + | ||
- | **Status**: available | + | |
- | + | ||
- | + | ||
- | ==== Development of a compiler and runtime engine for AQL ==== | + | |
- | + | ||
- | In 2005, researchers at the IBM Almaden Research Center developped a | + | |
- | new system specifically geared for practical information extraction in | + | |
- | the enterprise. This effort lead to [[https://www.google.be/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&ved=0CEYQFjAB&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.179.356%26rep%3Drep1%26type%3Dpdf&ei=gyhIUe-XPIexPJ-fgLAG&usg=AFQjCNHgkbcREbd6bCA26BVf0FuIZ9n7Sg&sig2=LVQkus_67uSVlwK34BXZ8w&bvm=bv.43828540,d.ZWU|SystemT]] , a rule-based IE system with an SQL-like declarative language named [[http://pic.dhe.ibm.com/infocenter/bigins/v2r0/topic/com.ibm.swg.im.infosphere.biginsights.analyze.doc/doc/aql_overview.html|AQL (Annotation Query Language)]]. | + | |
- | The declarative nature of AQL enables new kinds of tools for extractor | + | |
- | development, and a cost-based optimizer for | + | |
- | performance. | + | |
- | + | ||
- | The goal of this project is to develop an open-source compiler and | + | |
- | runtime environment of (a simplified version of) AQL. | + | |
- | + | ||
- | **Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) | + | |
- | + | ||
- | **Status**: available | + | |
- | + | ||
- | ==== Development of a distributed simulation algorithm ==== | + | |
- | + | ||
- | Simulation and Bisimulation are fundamental notions in computer | + | |
- | science. They underly many formal verification algorithms, and have | + | |
- | recently been applied to the construction of so-called structural | + | |
- | indexes,which are novel index data structures for relational databases | + | |
- | and the Semantic Web. Essentially, a (bi)simulation is a relation on | + | |
- | the nodes of a graph. Unfortunately, however, while efficient | + | |
- | main-memory algorithms for computing whether two nodes are similar | + | |
- | exist, these algorithms fail when no the input graphs are too large to | + | |
- | fit in main memory. | + | |
- | + | ||
- | The objective of this project is to implement a recently proposed | + | |
- | algorithm for computing simulation in a distributed setting, and | + | |
- | provide a preliminary performance evaluation of this implementation. | + | |
- | + | ||
- | **Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) | + | |
- | + | ||
- | **Status**: available | + | |
+ | ===== Project proposals ===== | ||