The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE.
With the growing popularity of Big Data, many data processing engines are implemented in a managed virtual environment such as the Java Virtual Machine (e.g., Apache Hadoop, Apache Giraph, Drill, …). While this improves the portability of the engine, the tradeoffs and implementation principles compared to traditional C++ implementations are less well understood.
The objective of this project is to develop some basic functionalities of a database storage engine (linked files, B-tree, extensible hash table, basic external-memory sorting) in a managed virtual machine (i.e., the Java Virtual Machine or the .NET Common Language Runtime), and to compare this with a C++-based implementation on both (1) ease of implementation and (2) execution efficiency. To develop the managed virtual machine implementation, the interested student will need to research the best practices used in the above-mentioned projects to maximize execution speed (e.g., use of the sun.misc.Unsafe class, memory-mapped files, …).
Contact : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
Status: available
In 2005, researchers at the IBM Almaden Research Center developed a new system specifically geared towards practical information extraction in the enterprise. This effort led to SystemT, a rule-based IE system with an SQL-like declarative language named AQL (Annotation Query Language). The declarative nature of AQL enables new kinds of tools for extractor development, as well as a cost-based optimizer for performance.
The goal of this project is to develop an open-source compiler and runtime environment for (a simplified version of) AQL.
Contact : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
Status: available
Simulation and bisimulation are fundamental notions in computer science. They underlie many formal verification algorithms, and have recently been applied to the construction of so-called structural indexes, which are novel index data structures for relational databases and the Semantic Web. Essentially, a (bi)simulation is a relation on the nodes of a graph. Unfortunately, while efficient main-memory algorithms exist for computing whether two nodes are similar, these algorithms fail when the input graphs are too large to fit in main memory.
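To make the notion of simulation concrete, here is a minimal main-memory sketch in Python. It assumes one common definition: node v simulates node u if both carry the same label and every edge from u to some u' can be matched by an edge from v to some v' that in turn simulates u'. The function name maximal_simulation and the labels/edges input format are illustrative assumptions. The naive fixpoint loop below is not one of the efficient algorithms mentioned above, nor the distributed algorithm targeted by this project.

```python
def maximal_simulation(labels, edges):
    """Return the largest simulation relation on a node-labeled directed graph.

    labels: dict mapping each node to its label (hypothetical input format)
    edges:  dict mapping a node to an iterable of its successor nodes
    The result is a set of pairs (u, v) meaning "v simulates u".
    """
    succ = {n: set(edges.get(n, ())) for n in labels}

    # Start from every pair of nodes that agree on their label.
    sim = {(u, v) for u in labels for v in labels if labels[u] == labels[v]}

    # Repeatedly drop pairs that violate the simulation condition
    # until nothing changes (naive fixpoint iteration).
    changed = True
    while changed:
        changed = False
        for (u, v) in list(sim):
            # Every step u -> u2 must be matched by some step v -> v2
            # such that v2 still simulates u2.
            matched = all(
                any((u2, v2) in sim for v2 in succ[v])
                for u2 in succ[u]
            )
            if not matched:
                sim.discard((u, v))
                changed = True
    return sim


if __name__ == "__main__":
    # Toy graph: a1 -> b1, and a2 -> b2, a2 -> c2.
    labels = {"a1": "a", "a2": "a", "b1": "b", "b2": "b", "c2": "c"}
    edges = {"a1": ["b1"], "a2": ["b2", "c2"]}
    sim = maximal_simulation(labels, edges)
    print(("a1", "a2") in sim)  # True: a2 can match the only step of a1
    print(("a2", "a1") in sim)  # False: a1 has no step matching a2 -> c2
```

The sketch keeps the whole candidate relation in memory, which is quadratic in the number of nodes in the worst case. That is the kind of cost that becomes prohibitive on graphs too large for main memory.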
The objective of this project is to implement a recently proposed algorithm for computing simulation in a distributed setting, and provide a preliminary performance evaluation of this implementation.
Contact : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
Status: available
In this project, the student is asked to construct a software system to help manage large collections of scientific papers in digital form. Specifically, the system must be able to:
Use of semantic web technologies (RDF, SPARQL, …) to store and search the metadata is encouraged.
Contact : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
Status: taken