Differences

This shows you the differences between two versions of the page.

teaching:projh402 [2018/10/02 08:13]
svsummer
teaching:projh402 [2020/09/30 20:47]
mahmsakr [Course objective]
Line 5: Line 5:
 The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE.
  
-===== Project proposals =====
+===== Projects in Mobility Databases =====
  
-=== Engineering of Rule-Based Information Extraction Engine ===
+Moving object databases (MOD) are database systems that can store and manage moving object data. A moving object is a value that changes over time. It can be spatial (e.g., a car driving on the road network) or non-spatial (e.g., the temperature in Brussels). Using a variety of sensors, the changing values of moving objects can be recorded in digital formats. A MOD, then, helps store and query such data. A couple of prototypes have been proposed, some of which are still active in terms of new releases. Yet a mainstream system is still missing: existing prototypes remain research systems. By mainstream we mean that the development builds on widely accepted tools that are actively maintained and developed. A mainstream system would exploit the functionality of these tools and maximize the reuse of their ecosystems. As a result, it would be closer to end users and more easily adopted in industry.
  
-Information extraction, the activity of extracting structured
-information from unstructured text, is a core data preparation
-step. Systems for information extraction fall into two main
-categories. The first category contains machine-learning based
-systems, where a significant amount of training is required to train
-good models for specific extraction tasks. The second category
-consists of rule-based systems in which the data to be extracted from
-the text is specified by (human-written) rules in some (often
-declarative) extraction language. Despite advances in machine
-learning, rule-based systems are widely used in practice.
+In our group, we are building MobilityDB, a mainstream MOD. It builds on PostGIS, which is a spatial database extension of PostgreSQL. MobilityDB extends the type system of PostgreSQL and PostGIS with ADTs for representing moving object data. It defines, for instance, the tfloat type for representing a time-dependent float, and the tgeompoint type for representing a time-dependent geometry point. MobilityDB types are well integrated into the platform to achieve maximal reusability, hence a mainstream development. For instance, tfloat builds on the PostgreSQL double precision type, and tgeompoint builds on the PostGIS geometry(point) type. Similarly, MobilityDB builds on the existing operations, indexing, and optimization framework.
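To make the idea of a time-dependent float concrete, here is a minimal Python sketch of a value that changes over time and can be queried at any instant. It is only an illustration of the concept: the class name `TemporalFloat`, the method `value_at`, and the choice of linear interpolation between recorded instants are assumptions made for this example, not MobilityDB's actual types, API, or implementation.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class TemporalFloat:
    """Hypothetical stand-in for a time-dependent float (not MobilityDB's tfloat)."""

    # Recorded (timestamp, value) pairs, sorted by strictly increasing timestamp.
    instants: List[Tuple[datetime, float]]

    def value_at(self, t: datetime) -> Optional[float]:
        """Return the value at time t, or None outside the recorded period.

        Assumption: the value evolves linearly between consecutive instants.
        """
        if not self.instants:
            return None
        if t < self.instants[0][0] or t > self.instants[-1][0]:
            return None
        times = [ts for ts, _ in self.instants]
        i = bisect_right(times, t)
        if i == len(times):
            # t coincides with the last recorded timestamp.
            return self.instants[-1][1]
        t0, v0 = self.instants[i - 1]
        t1, v1 = self.instants[i]
        fraction = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
        return v0 + fraction * (v1 - v0)


# Example: a non-spatial moving object, such as a temperature reading.
temperature = TemporalFloat([
    (datetime(2020, 9, 30, 8, 0), 10.0),
    (datetime(2020, 9, 30, 10, 0), 14.0),
])
midpoint_value = temperature.value_at(datetime(2020, 9, 30, 9, 0))
```

A time-dependent point could be sketched the same way by interpolating each coordinate separately. A database system would additionally need storage, indexing, and query operators for such types, which is where building on an existing platform pays off.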
  
-In recent years, novel theoretical algorithms have been proposed to
-more efficiently execute rule-based information extraction
-workloads. The objective in this project is to implement one such
-algorithm, by Florenzano et al. (2018), experimentally analyze its
-performance, and propose extensions of the algorithm to overcome
-performance bottlenecks.
+This is all made accessible via the SQL query interface. MobilityDB currently offers a rich set of types and functions and can answer sophisticated queries in SQL. The first beta version was released as open source in April 2019 (https://github.com/ULB-CoDE-WIT/MobilityDB).
  
+The following thesis ideas contribute to different parts of MobilityDB. They all involve innovative work that mixes research and development, and will therefore help students develop skills in:
  
-References:
+  * Understanding the theory and the implementation of moving object databases.
+  * Understanding the architecture of extensible databases, in this case PostgreSQL.
+  * Writing open source software.
  
-- Fernando Florenzano, Cristian Riveros, Martín Ugarte, 
-Stijn Vansummeren,​ Domagoj Vrgoc: Constant Delay Algorithms for 
-Regular Document Spanners. PODS 2018: 165-177 
  
-
+===== Project proposals =====
-**Interested?** Contact Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
-
-**Status**: available
-
-
-=== Query processing for mixed database-machine learning based workloads ===
-
-Because of the growing importance and wide deployment of large-scale
-Machine Learning (ML), there is wide interest in the design and
-implementation of processing engines that can efficiently evaluate ML
-workloads. One class of systems, embodied by systems such as TensorFlow
-and SystemML, takes linear algebra as the key primitive for expressing
-ML workflows, and obtains efficient processing engines by porting known
-database-style optimization techniques to the linear algebra
-setting. Another class of systems, embodied by FAQ queries, takes
-relational algebra as the key primitive, but modifies it to allow
-expression of certain ML workloads. To some extent, the classical
-optimization techniques as well as recent results for exploiting
-modern hardware transfer to this extended relational algebra. As an
-added bonus, traditional database workloads (OLTP/OLAP style) can be
-trivially supported.
-
-The focus in this project is on the latter style of systems. The
-overall goal is to experimentally identify classes of FAQ queries for
-which it would be beneficial to exploit techniques developed in the
-former class of systems. Concretely, this can be approached by
-experimentally studying queries in the FAQ framework (featuring joins)
-for which known results in evaluating linear algebra operations
-(specifically, matrix multiplication algorithms that run in less than
-O(n^3) time) can be exploited.
-
-**Contact**: Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
-
-**Status**: available
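The link between join queries and matrix multiplication mentioned in the removed project description can be illustrated with a small Python sketch. It evaluates one simple query, a join of two binary relations followed by a count per output pair, in two ways: directly as a hash join, and as a product of 0/1 matrices. The function names, the integer domain `{0, ..., n-1}`, and the assumption that relations contain no duplicate tuples are choices made for this example only. The product below uses the naive cubic algorithm; a sub-cubic algorithm would be a drop-in replacement for that one step.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Relation = List[Tuple[int, int]]


def join_count(r: Relation, s: Relation) -> Dict[Tuple[int, int], int]:
    """For each (a, c), count the values b with (a, b) in r and (b, c) in s."""
    s_by_b = defaultdict(list)
    for b, c in s:
        s_by_b[b].append(c)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for a, b in r:
        for c in s_by_b.get(b, []):
            counts[(a, c)] += 1
    return dict(counts)


def matrix_count(r: Relation, s: Relation, n: int) -> Dict[Tuple[int, int], int]:
    """Same query, computed as a product of n x n 0/1 matrices.

    Assumes all attribute values are integers in range(n) and that
    neither relation contains duplicate tuples.
    """
    m_r = [[0] * n for _ in range(n)]
    m_s = [[0] * n for _ in range(n)]
    for a, b in r:
        m_r[a][b] = 1
    for b, c in s:
        m_s[b][c] = 1
    # Naive O(n^3) product; entry (i, j) counts the joining b values.
    product = [
        [sum(m_r[i][k] * m_s[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    return {
        (i, j): product[i][j]
        for i in range(n)
        for j in range(n)
        if product[i][j] > 0
    }


R = [(0, 1), (0, 2), (1, 2)]
S = [(1, 3), (2, 3), (2, 0)]
by_join = join_count(R, S)
by_matrix = matrix_count(R, S, 4)
```

Both evaluations return the same counts, which is why results on fast matrix multiplication can carry over to this kind of query when the relations are dense enough for the matrix representation to pay off.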
  
  
 
teaching/projh402.txt · Last modified: 2022/09/06 10:39 by ezimanyi