Differences

This shows you the differences between two versions of the page.

teaching:projh402 [2018/10/02 08:13]
svsummer
teaching:projh402 [2020/10/01 11:56]
mahmsakr [Map-matching as a Service]
Line 5: Line 5:
 The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE.
  
-===== Project proposals =====
+===== Projects in Mobility Databases =====
  
-=== Engineering of a Rule-Based Information Extraction Engine ===
+Mobility databases (MOD) are database systems that can store and manage the geospatial trajectory data of moving objects. A moving object is an object that changes its location over time (e.g., a car driving on the road network). Using a variety of sensors, the location tracks of moving objects can be recorded in digital formats. A MOD then helps store and query such data. A couple of prototype systems have been proposed by research groups, yet a mainstream system is still missing. By mainstream we mean that the development builds on widely accepted tools that are actively maintained and developed. A mainstream system would exploit the functionality of these tools and maximize the reuse of their ecosystems. As a result, it would be closer to end users and more easily adopted in industry.
  
-Information extraction, the activity of extracting structured
-information from unstructured text, is a core data preparation
-step. Systems for information extraction fall into two main
-categories. The first category contains machine-learning based
-systems, where a significant amount of training is required to train
-good models for specific extraction tasks. The second category
-consists of rule-based systems in which the data to be extracted from
-the text is specified by (human-written) rules in some (often
-declarative) extraction language. Despite advances in machine
-learning, rule-based systems are widely used in practice.
+Towards filling this gap, our group is building the [[https://github.com/MobilityDB/MobilityDB|MobilityDB]] system. It builds on [[https://postgis.net/|PostGIS]], which is a spatial database extension of [[https://www.postgresql.org/|PostgreSQL]]. MobilityDB extends the type system of PostgreSQL and PostGIS with ADTs for representing moving object data. It defines, for instance, the tgeompoint type for representing a time-dependent geometry point. MobilityDB types are well integrated into the platform to achieve maximal reusability, hence a mainstream development. For instance, the tgeompoint type builds on the PostGIS geometry(point) type. Similarly, MobilityDB builds on the existing operations, indexing, and optimization framework.
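
To make the idea of a time-dependent point concrete, here is a minimal Python sketch. It is not MobilityDB code: the names `Instant`, `TemporalPoint` and `value_at` are hypothetical, and linear interpolation between consecutive observations is an assumption made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    """One timestamped observation: position (x, y) at time t (seconds)."""
    t: float
    x: float
    y: float


class TemporalPoint:
    """Hypothetical stand-in for a time-dependent point (illustration only)."""

    def __init__(self, instants):
        # Keep observations ordered by time so lookups can use bisection.
        self.instants = sorted(instants, key=lambda i: i.t)
        self._times = [i.t for i in self.instants]

    def value_at(self, t):
        """Return the (x, y) position at time t, or None outside the time span."""
        if not self.instants or t < self._times[0] or t > self._times[-1]:
            return None
        k = bisect_right(self._times, t)
        if k == len(self.instants):
            # t equals the last timestamp: no later observation to interpolate with.
            last = self.instants[-1]
            return (last.x, last.y)
        a, b = self.instants[k - 1], self.instants[k]
        # Assumed linear movement between the two surrounding observations.
        ratio = (t - a.t) / (b.t - a.t)
        return (a.x + ratio * (b.x - a.x), a.y + ratio * (b.y - a.y))


trip = TemporalPoint([Instant(0, 0.0, 0.0), Instant(10, 10.0, 0.0), Instant(20, 10.0, 20.0)])
print(trip.value_at(5))
print(trip.value_at(15))
```

The point of the sketch is that a moving object is stored as a function of time rather than as a bag of unrelated rows, so a query can ask for the position at any instant inside the observed time span.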
  
-In recent years, novel theoretical algorithms have been proposed to
-more efficiently execute rule-based information extraction
-workloads. The objective in this project is to implement one such
-Algorithm, by Florenzano et al (2018), experimentally analyze its
-performance, and propose extensions of the algorithm to overcome
-performance bottlenecks.
+MobilityDB supports SQL as its query interface. It is currently quite rich in terms of types and functions. It is incubated as a community project in [[https://www.osgeo.org/projects/mobilitydb/|OSGeo]], which certifies high technical quality.
  
 +The following project ideas contribute to different parts of MobilityDB. They all constitute innovative development, mixing research and development. They will hence help develop the student's skills in:
  
-References:
+  * Understanding the theory and the implementation of moving object databases.
 +  * Understanding the architecture of extensible databases, in this case PostgreSQL. 
 +  * Writing open source software.
  
-- Fernando Florenzano, Cristian Riveros, Martín Ugarte, 
-Stijn Vansummeren,​ Domagoj Vrgoc: Constant Delay Algorithms for 
-Regular Document Spanners. PODS 2018: 165-177 
  
 +===== Visualizing Moving Objects on the Web =====
  
-**Interested?** Contact Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
+<TBD>
  
-**Status**: available 
  
 +===== Implementing TSBS on MobilityDB =====
  
-=== Query processing for mixed database-machine learning based workloads ===
+The Time Series Benchmark Suite ([[https://github.com/timescale/tsbs|TSBS]]) is a collection of Go programs that are used to generate datasets and then benchmark the read and write performance of various time series databases. This benchmark has been developed by [[https://www.timescale.com/|TimescaleDB]], which is a time series extension of PostgreSQL.
  
-Because of the growing importance and wide deployment of large-scale
-Machine Learning (ML), there is wide interest in the design and
-implementation of processing engines that can efficiently evaluate ML
-workloads. One class of sytems, embodied by systems such as Tensorflow
-and SystemML takes linear algebra as the key primitive for expressing
-ML workflows, and obtain efficient processing engines by porting known
-database-style optimization techniques to the linear algebra
-setting. Another class of systems, embodied by FAQ queries take
-relational algebra as the key primitive, but modify it to allow
-expression of certain ML workloads. To some extent, the classical
-optimization techniques as well as recent results for exploiting
-modern hardware transfer to this extended relational algebra. As an
-added bonus, traditional database workloads (OLTP/OLAP style) can be
-trivially supported.
+A significant addition of TimescaleDB to PostgreSQL is the [[https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/|time_bucket]] function. This function partitions the time line into user-defined interval units that are used for aggregating data.
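
The bucketing idea behind time_bucket can be sketched in a few lines of Python. This is an illustration of the concept, not TimescaleDB's implementation: the function below only borrows the name, the Unix-epoch `ORIGIN` is an assumption, and `average_per_bucket` and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumed alignment origin for the buckets (an illustrative choice).
ORIGIN = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_bucket(width, ts, origin=ORIGIN):
    """Return the start of the width-sized interval that contains ts."""
    steps = (ts - origin) // width  # floor division of two timedeltas gives an int
    return origin + steps * width


def average_per_bucket(rows, width):
    """Group (timestamp, value) rows by bucket start and average each group."""
    groups = defaultdict(list)
    for ts, value in rows:
        groups[time_bucket(width, ts)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(groups.items())}


# Hypothetical sample readings.
readings = [
    (datetime(2020, 10, 1, 11, 56, tzinfo=timezone.utc), 4.0),
    (datetime(2020, 10, 1, 11, 58, tzinfo=timezone.utc), 6.0),
    (datetime(2020, 10, 1, 12, 3, tzinfo=timezone.utc), 10.0),
]
for start, avg in average_per_bucket(readings, timedelta(minutes=15)).items():
    print(start.isoformat(), avg)
```

With a 15-minute width, the first two readings fall into the bucket starting at 11:45 and the third into the bucket starting at 12:00, so the aggregate is computed per interval rather than per row.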
  
-The focus in this project is in the latter style of systems. The
-overall goal is to experimentally identify classes of FAQ queries for
-which it would be beneficial to exploit techniques developped in the
-former class of systems. Concretely, this can be approached by
-experimentally studying queries in the FAQ framework (featuring joins)
-for which known results in evaluating linear algebra operations (in
-concretum: matrix multiplication algorithms that run in less than
-O(n^3) time) can be exploited.
+The project consists of implementing a multidimensional generalization of the time_bucket function that allows the user to partition the spatial and/or temporal domain of a table into units (or tiles) that can be used for aggregating data. The project then consists of performing a benchmark comparison of TimescaleDB and MobilityDB.
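
One possible reading of such a multidimensional generalization is sketched below in Python. It assumes planar coordinates, square spatial cells and a fixed duration per tile. The function `space_time_tile`, its parameters and the sample points are hypothetical and only illustrate the partitioning, not a proposed API.

```python
import math
from collections import Counter


def space_time_tile(x, y, t, cell_size, duration, origin=(0.0, 0.0, 0.0)):
    """Return the integer (column, row, period) index of the tile containing (x, y, t)."""
    ox, oy, ot = origin
    return (
        math.floor((x - ox) / cell_size),  # spatial column
        math.floor((y - oy) / cell_size),  # spatial row
        math.floor((t - ot) / duration),   # temporal period
    )


def count_per_tile(points, cell_size, duration):
    """Aggregate (x, y, t) observations by tile; here the aggregate is a count."""
    return Counter(space_time_tile(x, y, t, cell_size, duration) for x, y, t in points)


# Hypothetical observations: metres for x and y, seconds for t.
points = [(12.0, 7.5, 30.0), (18.0, 3.0, 59.0), (18.0, 3.0, 61.0), (-0.5, 2.0, 0.0)]
for tile, n in sorted(count_per_tile(points, cell_size=10.0, duration=60.0).items()):
    print(tile, n)
```

Using `math.floor` rather than integer truncation keeps tiles the same size on both sides of the origin, which matters for negative coordinates.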
  
-**Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) 
  
-**Status**: available
+
 +===== Distributed Moving Object Database on Amazon AWS ===== 
 +A distributed database is an architecture in which multiple database instances on different machines are integrated to form a single database server. Both the data and the queries are then distributed over these database instances. This architecture is effective for deploying big databases on a cloud platform.
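
The distribution of both data and queries can be illustrated with a small Python sketch. It assumes hash partitioning on a key and a coordinator that sums partial results. The names `shard_for`, `distribute` and `distributed_count` are hypothetical, and real systems handle placement, replication and query planning in far more detail.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(rows, key_fn, shard_count):
    """Split rows over shard_count in-memory lists standing in for database instances."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_for(key_fn(row), shard_count)].append(row)
    return shards


def distributed_count(shards, predicate):
    """Each shard counts its own matching rows; the coordinator adds the partial counts."""
    return sum(sum(1 for row in shard if predicate(row)) for shard in shards)


# Hypothetical rows: (vehicle id, measurement number).
rows = [(f"vehicle-{i % 5}", i) for i in range(20)]
shards = distribute(rows, key_fn=lambda row: row[0], shard_count=3)
total = distributed_count(shards, lambda row: row[1] >= 10)
print(total)
```

A count decomposes cleanly into per-shard partial results. Operations that need rows from several shards at once, such as joins on a non-distribution key, do not, which is why an assessment of what can be distributed is needed.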
 + 
 +MobilityDB is engineered as an extension of PostgreSQL. AWS supports PostgreSQL databases in Amazon RDS for PostgreSQL and in Amazon Aurora. The goal of this project is to integrate MobilityDB with these products. The key outcomes are a comprehensive assessment of which MOD API can/cannot be distributed,​ and an assessment of the performance gain. These outcomes should serve as a base for a thesis project to achieve effective integration. 
 + 
 + 
 +===== Distributed Moving Object Database on MS Azure ===== 
 +A distributed database is an architecture in which multiple database instances on different machines are integrated to form a single database server. Both the data and the queries are then distributed over these database instances. This architecture is effective for deploying big databases on a cloud platform.
 + 
 +MobilityDB is engineered as an extension of PostgreSQL. MS Azure supports distributed PostgreSQL databases using [[https://www.citusdata.com/|Citus]]. We have made successful tests for integrating MobilityDB and Citus on a local cluster. The goal of this project is to repeat this work on MS Azure and integrate MobilityDB with these products. The key outcomes are a comprehensive assessment of which MOD API can/cannot be distributed, and an assessment of the performance gain. These outcomes should serve as a base for a thesis project to achieve effective integration.
 + 
 +===== Map-matching as a Service ===== 
 +GPS location tracks typically contain errors, as the GPS points will normally be some meters away from the true position. If we know that the movement happened on a street network, e.g., a bus or a car, then we can correct this by putting the points back on the street. Fortunately, there are algorithms for this, called map-matching. There are also a handful of open source systems that do map-matching. It remains difficult, however, for end users to use them, because they involve non-trivial installation and configuration effort. Preparing the base map used in the matching is also an issue for users.
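
The simplest form of "putting the points on the street" is to snap each point to the nearest street segment, as in the Python sketch below. This is only the geometric core and not a full map-matching algorithm, which would also take the sequence of points and the network topology into account. The sketch assumes planar coordinates, and the names `project_on_segment` and `snap_to_network` are hypothetical.

```python
import math


def project_on_segment(p, a, b):
    """Return the point on segment a-b that is closest to p (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a  # degenerate segment: both endpoints coincide
    # Position of the projection along the segment, clamped to its endpoints.
    u = ((px - ax) * dx + (py - ay) * dy) / length_sq
    u = max(0.0, min(1.0, u))
    return (ax + u * dx, ay + u * dy)


def snap_to_network(p, segments):
    """Return (snapped point, distance) for the segment nearest to p."""
    best, best_dist = None, math.inf
    for a, b in segments:
        q = project_on_segment(p, a, b)
        d = math.dist(p, q)
        if d < best_dist:
            best, best_dist = q, d
    return best, best_dist


# Hypothetical street network: two segments forming an L-shaped street.
streets = [((0.0, 0.0), (10.0, 0.0)), ((10.0, 0.0), (10.0, 10.0))]
for gps_point in [(4.0, 1.5), (11.0, 5.0)]:
    print(gps_point, "->", snap_to_network(gps_point, streets))
```

Snapping each point independently can jump between parallel streets, which is one reason the open source systems listed in this section use more elaborate algorithms.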
 + 
 +The goal of this project is to build an architecture for a map-matching service. The challenges are that the GPS data arrives in different formats, and that map-matching is a time-consuming algorithm. This architecture should thus allow different input formats, and should be able to scale automatically according to the request rate. Another key outcome of this project is to compare the existing map-matching implementations, and to discuss their suitability for real-world problems.
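
Supporting different input formats usually means normalizing them into one internal representation before matching. The Python sketch below shows one way this could look, under stated assumptions: a CSV input with `lat` and `lon` columns, a GeoJSON LineString input, and a hypothetical `load_track` dispatcher. None of these names come from an existing service.

```python
import csv
import io
import json


def parse_csv(text):
    """Parse CSV text with assumed 'lat' and 'lon' columns into (lat, lon) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["lat"]), float(row["lon"])) for row in reader]


def parse_geojson(text):
    """Parse a GeoJSON LineString; GeoJSON positions are ordered [lon, lat]."""
    geometry = json.loads(text)
    if geometry.get("type") != "LineString":
        raise ValueError("only LineString geometries are handled in this sketch")
    return [(lat, lon) for lon, lat, *_ in geometry["coordinates"]]


# Registry of supported formats; a new format is added by registering a parser.
PARSERS = {"csv": parse_csv, "geojson": parse_geojson}


def load_track(text, fmt):
    """Normalize an input track to a list of (lat, lon) tuples."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return parser(text)


csv_track = load_track("lat,lon\n50.8,4.3\n50.9,4.4\n", "csv")
geojson_track = load_track(
    '{"type": "LineString", "coordinates": [[4.3, 50.8], [4.4, 50.9]]}', "geojson"
)
print(csv_track == geojson_track)
```

Keeping the parsers behind a single entry point means the matching step only ever sees one representation, regardless of how the data arrived.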
 + 
 +Links: 
 +[[https://​github.com/​bmwcarit/​barefoot|Barefoot]] 
 +[[https://​valhalla.readthedocs.io/​en/​latest/​api/​map-matching/​api-reference/​|Valhalla Map Matching API]]  
 +[[https://​github.com/​graphhopper/​map-matching|GraphHopper]] 
 +[[https://​github.com/​cyang-kth/​fmm|Fast Map Matching]] 
 + 
 + 
 +===== Geospatial Trajectory Data Cleaning ===== 
 + 
 + 
 +===== Geospatial Trajectory Similarity Measure ===== 
 + 
 + 
 +===== Spatiotemporal k-Nearest Neighbour (kNN) Queries ===== 
  
  
 
teaching/projh402.txt · Last modified: 2022/09/06 10:39 by ezimanyi