Differences

This shows you the differences between two versions of the page.

--- teaching:projh402 [2018/10/02 08:13]
svsummer
+++ teaching:projh402 [2020/09/30 21:04]
mahmsakr [Projects in Mobility Databases]
@@ Line 5: / Line 5: @@
 The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project.  What follows is a list of project proposals supervised by academic members of CoDE.
-===== Project proposals =====
+===== Projects in Mobility Databases =====
-=== Engineering of a Rule-Based Information Extraction Engine ===
+Mobility databases (MOD) are database systems that store and manage the geospatial trajectory data of moving objects. A moving object is an object that changes its location over time (e.g., a car driving on the road network). Using a variety of sensors, the location tracks of moving objects can be recorded in digital formats. A MOD then helps to store and query such data. A few prototype systems have been proposed by research groups, yet a mainstream system is still missing. By mainstream we mean that the development builds on widely accepted tools that are actively maintained and developed. A mainstream system would exploit the functionality of these tools and maximize the reuse of their ecosystems. As a result, it would be closer to end users and more easily adopted in industry.
-Information extraction, the activity of extracting structured
+Towards filling this gap, our group is building the MobilityDB system [[https://github.com/MobilityDB/MobilityDB|https://github.com/MobilityDB/MobilityDB]]. It builds on PostGIS, which is a spatial database extension of PostgreSQL. MobilityDB extends the type system of PostgreSQL and PostGIS with abstract data types (ADTs) for representing moving object data. It defines, for instance, the tgeompoint type for representing a time-dependent geometry point. MobilityDB types are well integrated into the platform to achieve maximal reusability, hence a mainstream development. For instance, the tgeompoint type builds on the PostGIS geometry(point) type. Similarly, MobilityDB builds on the existing operations, indexing, and optimization framework.
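To make the idea of a time-dependent point more concrete, here is a minimal Python sketch of the concept. It is not MobilityDB code and does not reflect how tgeompoint is implemented: the class name `TemporalPoint`, the method `value_at`, and the choice of linear interpolation between observations are all assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

Instant = Tuple[datetime, float, float]  # (timestamp, x, y)


class TemporalPoint:
    """Hypothetical stand-in for a time-dependent point; not MobilityDB code."""

    def __init__(self, instants: List[Instant]) -> None:
        if not instants:
            raise ValueError("at least one instant is required")
        # Keep observations ordered by time so lookups can use binary search.
        self.instants = sorted(instants, key=lambda inst: inst[0])
        self._times = [inst[0] for inst in self.instants]

    def value_at(self, t: datetime) -> Optional[Tuple[float, float]]:
        """Return the (x, y) position at time t, or None outside the observed period."""
        if t < self._times[0] or t > self._times[-1]:
            return None
        i = bisect_right(self._times, t)
        if i == len(self._times):
            # t coincides with the last observation.
            _, x, y = self.instants[-1]
            return (x, y)
        # Assumption: the object moves in a straight line at constant speed
        # between two consecutive observations.
        t0, x0, y0 = self.instants[i - 1]
        t1, x1, y1 = self.instants[i]
        ratio = (t - t0) / (t1 - t0)
        return (x0 + ratio * (x1 - x0), y0 + ratio * (y1 - y0))


trip = TemporalPoint([
    (datetime(2020, 1, 1, 8, 0), 0.0, 0.0),
    (datetime(2020, 1, 1, 8, 10), 10.0, 5.0),
])
print(trip.value_at(datetime(2020, 1, 1, 8, 5)))  # (5.0, 2.5)
```

The point of the sketch is only that a single value carries both the geometry and the time dimension, so a query can ask where the object was at an arbitrary instant. In a database setting such a type would additionally need operators and index support, which is what the paragraph above refers to.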
-information from unstructured text, is a core data preparation
-step. Systems for information extraction fall into two main
-categories. The first category contains machine-learning based
-systems, where a significant amount of training is required to train
-good models for specific extraction tasks. The second category
-consists of rule-based systems in which the data to be extracted from
-the text is specified by (human-written) rules in some (often
-declarative) extraction language. Despite advances in machine
-learning, rule-based systems are widely used in practice.
-In recent years, novel theoretical algorithms have been proposed to
+MobilityDB supports SQL as its query interface. It is currently quite rich in terms of types and functions. It is incubated as a community project in OSGeo [[https://www.osgeo.org/projects/mobilitydb/|https://www.osgeo.org/projects/mobilitydb/]], which certifies high technical quality.
-more efficiently execute rule-based information extraction
-workloads. The objective in this project is to implement one such
-Algorithm, by Florenzano et al (2018), experimentally analyze its
-performance, and propose extensions of the algorithm to overcome
-performance bottlenecks.
+The following project ideas contribute to different parts of MobilityDB. They all constitute innovative development, mixing research and development. They will hence help students develop skills in:
-References:
+  * Understanding the theory and the implementation of moving object databases.
+  * Understanding the architecture of extensible databases, in this case PostgreSQL.
+  * Writing open source software.
-- Fernando Florenzano, Cristian Riveros, Martín Ugarte,
+===== Project proposals =====
-Stijn Vansummeren, Domagoj Vrgoc: Constant Delay Algorithms for
-Regular Document Spanners. PODS 2018: 165-177
-**Interested?** Contact Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
-**Status**: available
-=== Query processing for mixed database-machine learning based workloads ===
-Because of the growing importance and wide deployment of large-scale
-Machine Learning (ML), there is wide interest in the design and
-implementation of processing engines that can efficiently evaluate ML
-workloads. One class of systems, embodied by systems such as Tensorflow
-and SystemML takes linear algebra as the key primitive for expressing
-ML workflows, and obtain efficient processing engines by porting known
-database-style optimization techniques to the linear algebra
-setting. Another class of systems, embodied by FAQ queries take
-relational algebra as the key primitive, but modify it to allow
-expression of certain ML workloads. To some extent, the classical
-optimization techniques as well as recent results for exploiting
-modern hardware transfer to this extended relational algebra. As an
-added bonus, traditional database workloads (OLTP/OLAP style) can be
-trivially supported
-The focus in this project is in the latter style of systems. The
-overall goal is to experimentally identify classes of FAQ queries for
-which it would be beneficial to exploit techniques developed in the
-former class of systems. Concretely, this can be approached by
-experimentally studying queries in the FAQ framework (featuring joins)
-for which known results in evaluating linear algebra operations (in
-concretum: matrix multiplication algorithms that run in less than
-O(n^3) time) can be exploited.
-**Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
-**Status**: available