This shows you the differences between two versions of the page.
teaching:projh402 [2018/10/02 08:13] svsummer |
teaching:projh402 [2020/09/30 21:04] mahmsakr [Projects in Mobility Databases] |
||
---|---|---|---|
Line 5: | Line 5: | ||
The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE. | The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE. | ||
- | ===== Project proposals ===== | + | ===== Projects in Mobility Databases ===== |
- | === Engineering of a Rule-Based Information Extraction Engine === | + | Mobility databases (MOD) are database systems that store and manage the geospatial trajectory data of moving objects. A moving object is an object that changes its location over time (e.g., a car driving on the road network). Using a variety of sensors, the location tracks of moving objects can be recorded in digital formats. A MOD then helps to store and query such data. A couple of prototype systems have been proposed by research groups, yet a mainstream system is still missing. By mainstream we mean that the development builds on widely accepted tools that are actively maintained and developed. A mainstream system would exploit the functionality of these tools and maximize the reuse of their ecosystems. As a result, it would be closer to end users and more easily adopted in industry. |
- | Information extraction, the activity of extracting structured | + | Towards filling this gap, our group is building the MobilityDB system [[https://github.com/MobilityDB/MobilityDB|https://github.com/MobilityDB/MobilityDB]]. It builds on PostGIS, which is a spatial database extension of PostgreSQL. MobilityDB extends the type system of PostgreSQL and PostGIS with abstract data types (ADTs) for representing moving object data. It defines, for instance, the tgeompoint type for representing a time-dependent geometry point. MobilityDB types are well integrated into the platform to achieve maximal reusability, in line with the goal of mainstream development. For instance, the tgeompoint type builds on the PostGIS geometry(point) type. Similarly, MobilityDB builds on the existing operations, indexing, and optimization framework. |
- | information from unstructured text, is a core data preparation | + | |
- | step. Systems for information extraction fall into two main | + | |
- | categories. The first category contains machine-learning based | + | |
- | systems, where a significant amount of training is required to train | + | |
- | good models for specific extraction tasks. The second category | + | |
- | consists of rule-based systems in which the data to be extracted from | + | |
- | the text is specified by (human-written) rules in some (often | + | |
- | declarative) extraction language. Despite advances in machine | + | |
- | learning, rule-based systems are widely used in practice. | + | |
- | In recent years, novel theoretical algorithms have been proposed to | + | MobilityDB supports SQL as its query interface and already offers a rich set of types and functions. It is incubated as a community project in OSGeo [[https://www.osgeo.org/projects/mobilitydb/|https://www.osgeo.org/projects/mobilitydb/]], which attests to its technical quality. |
- | more efficiently execute rule-based information extraction | + | |
- | workloads. The objective in this project is to implement one such | + | |
- | algorithm, by Florenzano et al. (2018), experimentally analyze its | + | |
- | performance, and propose extensions of the algorithm to overcome | + | |
- | performance bottlenecks. | + | |
+ | The following project ideas contribute to different parts of MobilityDB. They all involve innovative work that mixes research and development. They will hence help students develop skills in: | ||
- | References: | + | * Understanding the theory and the implementation of moving object databases. |
+ | * Understanding the architecture of extensible databases, in this case PostgreSQL. | ||
+ | * Writing open source software. | ||
- | - Fernando Florenzano, Cristian Riveros, Martín Ugarte, | + | ===== Project proposals ===== |
- | Stijn Vansummeren, Domagoj Vrgoc: Constant Delay Algorithms for | + | |
- | Regular Document Spanners. PODS 2018: 165-177 | + | |
- | + | ||
- | + | ||
- | **Interested?** Contact Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) | + | |
- | + | ||
- | **Status**: available | + | |
- | + | ||
- | + | ||
- | === Query processing for mixed database-machine learning based workloads === | + | |
- | + | ||
- | Because of the growing importance and wide deployment of large-scale | + | |
- | Machine Learning (ML), there is wide interest in the design and | + | |
- | implementation of processing engines that can efficiently evaluate ML | + | |
- | workloads. One class of systems, embodied by systems such as TensorFlow | + | |
- | and SystemML, takes linear algebra as the key primitive for expressing | + | |
- | ML workflows, and obtains efficient processing engines by porting known | + | |
- | database-style optimization techniques to the linear algebra | + | |
- | setting. Another class of systems, embodied by FAQ queries, takes | + | |
- | relational algebra as the key primitive, but modify it to allow | + | |
- | expression of certain ML workloads. To some extent, the classical | + | |
- | optimization techniques as well as recent results for exploiting | + | |
- | modern hardware transfer to this extended relational algebra. As an | + | |
- | added bonus, traditional database workloads (OLTP/OLAP style) can be | + | |
- | trivially supported. | + | |
- | + | |
- | The focus in this project is on the latter style of systems. The | + | |
- | overall goal is to experimentally identify classes of FAQ queries for | + | |
- | which it would be beneficial to exploit techniques developed in the | + | |
- | former class of systems. Concretely, this can be approached by | + | |
- | experimentally studying queries in the FAQ framework (featuring joins) | + | |
- | for which known results in evaluating linear algebra operations (in | + | |
- | concretum: matrix multiplication algorithms that run in less than | + | |
- | O(n^3) time) can be exploited. | + | |
- | + | ||
- | **Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) | + | |
- | + | ||
- | **Status**: available | + | |
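
As a rough illustration of the "time-dependent geometry point" idea behind the tgeompoint type described in the Mobility Databases section, the sketch below models a moving point as a sorted list of timestamped positions and linearly interpolates between them. The names `Instant`, `TemporalPoint`, and `value_at` are hypothetical and chosen for this example only. They are not MobilityDB's API, and linear interpolation is an assumption made here for simplicity.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instant:
    """One timestamped observation of a moving object (hypothetical type)."""
    t: datetime
    x: float
    y: float


class TemporalPoint:
    """Toy time-dependent point (hypothetical, not MobilityDB's tgeompoint)."""

    def __init__(self, instants: List[Instant]) -> None:
        if not instants:
            raise ValueError("at least one instant is required")
        # Keep observations ordered by time so we can binary-search them.
        self.instants = sorted(instants, key=lambda i: i.t)
        self._times = [i.t for i in self.instants]

    def value_at(self, t: datetime) -> Optional[Tuple[float, float]]:
        """Return (x, y) at time t, or None outside the recorded period."""
        if t < self._times[0] or t > self._times[-1]:
            return None
        idx = bisect_right(self._times, t)
        if idx == len(self._times):
            # t equals the last recorded timestamp.
            last = self.instants[-1]
            return (last.x, last.y)
        a, b = self.instants[idx - 1], self.instants[idx]
        # Assumed linear movement between two consecutive observations.
        frac = (t - a.t) / (b.t - a.t)
        return (a.x + frac * (b.x - a.x), a.y + frac * (b.y - a.y))


car = TemporalPoint([
    Instant(datetime(2020, 9, 30, 8, 0), 0.0, 0.0),
    Instant(datetime(2020, 9, 30, 8, 10), 10.0, 5.0),
])
midpoint = car.value_at(datetime(2020, 9, 30, 8, 5))  # (5.0, 2.5)
```

A database type such as tgeompoint exposes this kind of "position at a given time" query through SQL instead, which lets it reuse the indexing and optimization machinery of PostgreSQL and PostGIS rather than reimplementing it in application code.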