This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
teaching:projh402 [2020/10/01 13:55] mahmsakr [Spatiotemporal k-Nearest Neighbour (kNN) Queries] |
teaching:projh402 [2020/10/03 18:20] ezimanyi [VODKA Indexes for MobilityDB] |
||
---|---|---|---|
Line 3: | Line 3: | ||
===== Course objective ===== | ===== Course objective ===== | ||
- | The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE. | + | The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of the WIT laboratory. |
===== Projects in Mobility Databases ===== | ===== Projects in Mobility Databases ===== | ||
Line 9: | Line 9: | ||
Mobility databases (MOD) are database systems that can store and manage moving object geospatial trajectory data. A moving object is an object that changes its location over time (e.g., a car driving on the road network). Using a variety of sensors, the location tracks of moving objects can be recorded in digital formats. A MOD, then, helps storing and querying such data. A couple of prototype systems have been proposed by research groups. Yet, a mainstream system is by far still missing. By mainstream we mean that the development builds on widely accepted tools, that are actively being maintained and developed. A mainstream system would exploit the functionality of these tools, and would maximize the reuse of their ecosystems. As a result, it becomes more closer to end users, and easily adopted in the industry. | Mobility databases (MOD) are database systems that can store and manage moving object geospatial trajectory data. A moving object is an object that changes its location over time (e.g., a car driving on the road network). Using a variety of sensors, the location tracks of moving objects can be recorded in digital formats. A MOD, then, helps storing and querying such data. A couple of prototype systems have been proposed by research groups. Yet, a mainstream system is by far still missing. By mainstream we mean that the development builds on widely accepted tools, that are actively being maintained and developed. A mainstream system would exploit the functionality of these tools, and would maximize the reuse of their ecosystems. As a result, it becomes more closer to end users, and easily adopted in the industry. | ||
- | Towards filling this gap, our group is building the [[https://github.com/MobilityDB/MobilityDB|MobilityDB]] system. It builds on [[https://postgis.net/|PostGIS]], which is a spatial database extension of [[https://www.postgresql.org/|PostgreSQL]]. MobilityDB extends the type system of PostgreSQL and PostGIS with ADTs for representing moving object data. It defines, for instance, the tgeompoint type for representing a time dependant geometry point. MobilityDB types are well integrated into the platform, to achieve maximal reusability, hence a mainstream development. For instance, the tgeompoint type builds on the PostGIS geometry(point) type. Similarly MobilityDB builds on existing operations, indexing, and optimization framework. | + | Towards filling this gap, our group is building the [[https://github.com/MobilityDB/MobilityDB|MobilityDB]] system. It builds on [[https://postgis.net/|PostGIS]], which is a spatial database extension of [[https://www.postgresql.org/|PostgreSQL]]. MobilityDB extends the type system of PostgreSQL and PostGIS with abstract data types (ADTs) for representing moving object data. It defines, for instance, the tgeompoint type for representing a time dependant geometry point. MobilityDB types are well integrated into the platform, to achieve maximal reusability, hence a mainstream development. For instance, the tgeompoint type builds on the PostGIS geometry(point) type. Similarly MobilityDB builds on existing operations, indexing, and optimization framework. |
MobilityDB supports SQL as query interface. Currently it is quite rich in terms of types and functions. It is incubated as community project in [[https://www.osgeo.org/projects/mobilitydb/|OSGeo]], which certifies high technical quality. | MobilityDB supports SQL as query interface. Currently it is quite rich in terms of types and functions. It is incubated as community project in [[https://www.osgeo.org/projects/mobilitydb/|OSGeo]], which certifies high technical quality. | ||
Line 20: | Line 20: | ||
- | ===== Visualization Moving Objects on the Web ===== | + | ===== Visualization of Moving Objects on the Web ===== |
- | <TBD> | + | There are several open source platforms for publishing spatial data and interactive mapping applications to the web. Two populars ones are [[https://mapserver.org/|MapServer]] and [[http://geoserver.org/|GeoServer]], which are written, respectively, in C and in Java. |
+ | Newer platforms exists, such as [[https://kepler.gl/|kepler.gl]], which were designed for handling large-scale data sets. | ||
+ | However, these platforms are used for static spatial data and are unable to cope with moving objects. The goal of the project is to extend one of these platforms with spatio-temporal data types in order to be able to display animated maps. | ||
===== Implementing TSBS on MobilityDB ===== | ===== Implementing TSBS on MobilityDB ===== | ||
Line 72: | Line 74: | ||
The goal of the project is to survey the state of art in continuous kNN queries, and to prototype selected methods in MobilityDB. Since it is a complex problem, these outcomes should serve as a base for a more elaborate thesis project. | The goal of the project is to survey the state of art in continuous kNN queries, and to prototype selected methods in MobilityDB. Since it is a complex problem, these outcomes should serve as a base for a more elaborate thesis project. | ||
+ | |||
+ | ===== K-D-Tree Indexes for MobilityDB ===== | ||
+ | |||
+ | Indexes are essential in databases for quickly locating data without having to search every row in a table every time a database table is accessed. Thus, an index is an auxiliary data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index. PostgreSQL provides [[https://habr.com/ru/company/postgrespro/blog/441962/|multiple types of indexes]] for various data types. | ||
+ | |||
+ | In MobilityDB two types of indexes has been implemented, namely, [[https://habr.com/en/company/postgrespro/blog/444742/|GiST]] and [[https://habr.com/ru/company/postgrespro/blog/446624/|SP-GiST]]. More precisely, in PostgreSQL, these types of indexes are frameworks for developing multiple types of indexes. Concerning SP-GiST indexes, in MobilityDB we have developed 4-dimensional quad-trees where the dimensions are X, Y, and possibly Z for the spatial dimension and T for the time dimension. An alternative approach would be to use [[https://en.wikipedia.org/wiki/K-d_tree|K-D Trees]]. K-D trees can be implemented in PostgreSQL using the SP-GiST framework and an example [[https://github.com/postgres/postgres/blob/master/src/backend/access/spgist/spgkdtreeproc.c|implementation]] for simple [[https://www.postgresql.org/docs/current/datatype-geometric.html|geometric types]] exist. The goal of the project is to implement K-D indexes for MobilityDB and perform a benchmark comparison between K-D trees and the existing 4-dimensional quad-trees. | ||
+ | |||
+ | ===== VODKA Indexes for MobilityDB ===== | ||
+ | |||
+ | MobilityDB provides [[https://habr.com/en/company/postgrespro/blog/444742/|GiST]] and [[https://habr.com/ru/company/postgrespro/blog/446624/|SP-GiST]] indexes for temporal types. These indexes are based on bounding boxes, that is, the nodes of the index tree store a bounding box that keeps the mininum and maximum values of each of the dimensions where X, Y, Z (if available) are for the spatial dimension and T for the temporal dimension. The reason for this is that a temporal type (for example, a moving point representing the movement of a vehicle) can have thousands of timestamped points and keeping all these points for each vehicle indexed in a table is very inefficient. By keeping the bounding box only it is possible to quickly filter the rows in a table and then a more detailed analysis can be made for those rows selected by the index. | ||
+ | |||
+ | However, the drawback of keeping a single bounding box for the whole trajectory makes that the index is not very selective as shown in the following figure (extracted from a presentation by Oleg Bartunov from PostgresPro) | ||
+ | |||
+ | {{:teaching:gist.png?400|}} | ||
+ | |||
+ | The goal of the project is to define [[https://www.pgcon.org/2014/schedule/events/696.en.html|VODKA indexes]] for MobilityDB, which enable us to store in the index multiple bounding boxes (one per segment) associated to each row in the table as shown in the following figure | ||
+ | |||
+ | {{:teaching:vodka.png?400|}} | ||
+ | |||
+ | |||