teaching:projh402 [2014/09/22 14:07]
svsummer [Project proposals]
teaching:projh402 [2020/10/01 11:59]
mahmsakr [Distributed Moving Object Database on Amazon AWS]
The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE.
  
===== Projects in Mobility Databases =====
  
Mobility databases (MOD) are database systems that can store and manage moving object geospatial trajectory data. A moving object is an object that changes its location over time (e.g., a car driving on the road network). Using a variety of sensors, the location tracks of moving objects can be recorded in digital formats. A MOD then helps store and query such data. A couple of prototype systems have been proposed by research groups, yet a mainstream system is still missing. By mainstream we mean that the development builds on widely accepted tools that are actively maintained and developed. A mainstream system would exploit the functionality of these tools and maximize the reuse of their ecosystems. As a result, it would be closer to end users and more easily adopted in industry.
  
Towards filling this gap, our group is building the [[https://github.com/MobilityDB/MobilityDB|MobilityDB]] system. It builds on [[https://postgis.net/|PostGIS]], which is a spatial database extension of [[https://www.postgresql.org/|PostgreSQL]]. MobilityDB extends the type system of PostgreSQL and PostGIS with abstract data types (ADTs) for representing moving object data. It defines, for instance, the tgeompoint type for representing a time-dependent geometry point. MobilityDB types are well integrated into the platform to achieve maximal reusability, and hence a mainstream development. For instance, the tgeompoint type builds on the PostGIS geometry(point) type. Similarly, MobilityDB builds on the existing operations, indexing, and optimization frameworks of these systems.
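As a rough illustration of what a time-dependent point represents, the following minimal Python sketch stores a moving point as timestamped (x, y) samples and computes its position between samples by linear interpolation. The class TemporalPoint and its method value_at are hypothetical names chosen for this sketch; it is not MobilityDB's implementation of tgeompoint, which lives inside PostgreSQL and PostGIS.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


# Hypothetical stand-in for a temporal point: NOT MobilityDB's tgeompoint implementation.
@dataclass
class TemporalPoint:
    # (timestamp, x, y) samples, assumed sorted by strictly increasing timestamp
    samples: List[Tuple[datetime, float, float]]

    def value_at(self, t: datetime) -> Optional[Tuple[float, float]]:
        """Position at time t, linearly interpolated between samples; None outside the time span."""
        if not self.samples:
            return None
        times = [s[0] for s in self.samples]
        if t < times[0] or t > times[-1]:
            return None
        i = bisect_right(times, t)  # index of the first sample strictly after t
        if i == len(times):  # t is exactly the last timestamp
            return (self.samples[-1][1], self.samples[-1][2])
        t0, x0, y0 = self.samples[i - 1]
        t1, x1, y1 = self.samples[i]
        ratio = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float in [0, 1)
        return (x0 + ratio * (x1 - x0), y0 + ratio * (y1 - y0))


car = TemporalPoint([
    (datetime(2020, 10, 1, 8, 0), 0.0, 0.0),
    (datetime(2020, 10, 1, 8, 10), 1000.0, 500.0),
])
print(car.value_at(datetime(2020, 10, 1, 8, 5)))  # (500.0, 250.0)
```

Asking for the position at 08:05, halfway between the two samples, returns the midpoint. Linear interpolation is an assumption of this sketch; it is a common model for continuously moving objects, but other interpolation schemes are possible.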
  
MobilityDB supports SQL as its query interface. It currently offers a rich set of types and functions. It is incubated as a community project in [[https://www.osgeo.org/projects/mobilitydb/|OSGeo]], which certifies its high technical quality.
  
The following project ideas contribute to different parts of MobilityDB. They all involve innovative work that mixes research and development. They will hence help the student develop skills in:

  * Understanding the theory and the implementation of moving object databases.
  * Understanding the architecture of extensible databases, in this case PostgreSQL.
  * Writing open source software.
  
  
===== Visualization of Moving Objects on the Web =====
  
<TBD>


===== Implementing TSBS on MobilityDB =====
  
The Time Series Benchmark Suite ([[https://github.com/timescale/tsbs|TSBS]]) is a collection of Go programs that are used to generate datasets and then benchmark the read and write performance of various time series databases. This benchmark has been developed by the team behind [[https://www.timescale.com/|TimescaleDB]], which is a time series extension of PostgreSQL.
  
A significant addition of TimescaleDB to PostgreSQL is the [[https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/|time_bucket]] function. This function partitions the timeline into user-defined intervals that are used for aggregating data.
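The time_bucket idea can be sketched in a few lines of Python. Each timestamp is mapped to the start of the fixed-width interval containing it, and rows are then aggregated per interval. This roughly mirrors a TimescaleDB query that groups by time_bucket('5 minutes', time) and averages a value column. The function names, the (timestamp, value) row layout and the default origin of 2000-01-01 are assumptions of this sketch. TimescaleDB's own default bucket alignment may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def bucket_start(ts: datetime, width: timedelta, origin: datetime) -> datetime:
    """Start of the fixed-width bucket containing ts, with buckets aligned on origin."""
    n = (ts - origin) // width  # timedelta // timedelta -> int, floored (also before origin)
    return origin + n * width


def average_per_bucket(rows: Iterable[Tuple[datetime, float]],
                       width: timedelta,
                       origin: datetime = datetime(2000, 1, 1)) -> Dict[datetime, float]:
    """Group (timestamp, value) rows by bucket and average the values in each bucket."""
    sums: Dict[datetime, float] = defaultdict(float)
    counts: Dict[datetime, int] = defaultdict(int)
    for ts, value in rows:
        start = bucket_start(ts, width, origin)
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


readings = [
    (datetime(2020, 10, 1, 8, 1), 10.0),
    (datetime(2020, 10, 1, 8, 4), 20.0),
    (datetime(2020, 10, 1, 8, 7), 30.0),
]
# Buckets starting at 08:00 (average 15.0) and 08:05 (average 30.0)
print(average_per_bucket(readings, timedelta(minutes=5)))
```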
  
The project consists of implementing a multidimensional generalization of the time_bucket function that allows the user to partition the spatial and/or temporal domain of a table into units (or tiles) that can be used for aggregating data. The project then consists of a benchmark comparison of TimescaleDB and MobilityDB.
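A minimal sketch of such a generalization follows, assuming planar coordinates. Each point is mapped to a (column, row, time slot) tile index. A dimension whose size is left unset is simply not partitioned, which yields spatial-only, temporal-only or spatiotemporal tiles. Counting is used as the aggregate for brevity. The grid origins, the square cells and all names (tile_key, count_per_tile) are hypothetical choices, not a design prescribed by the project.

```python
import math
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

ORIGIN_XY = (0.0, 0.0)           # assumed spatial origin of the grid
ORIGIN_T = datetime(2000, 1, 1)  # assumed temporal origin of the time slots

TileKey = Tuple[Optional[int], Optional[int], Optional[int]]  # (column, row, time slot)


def tile_key(x: float, y: float, t: datetime,
             cell_size: Optional[float] = None,
             width: Optional[timedelta] = None) -> TileKey:
    """Tile containing (x, y, t); a dimension whose size is None is not partitioned."""
    col = math.floor((x - ORIGIN_XY[0]) / cell_size) if cell_size else None
    row = math.floor((y - ORIGIN_XY[1]) / cell_size) if cell_size else None
    slot = (t - ORIGIN_T) // width if width else None
    return (col, row, slot)


def count_per_tile(points: Iterable[Tuple[float, float, datetime]],
                   cell_size: Optional[float] = None,
                   width: Optional[timedelta] = None) -> Counter:
    """Count the points falling in each spatial, temporal or spatiotemporal tile."""
    return Counter(tile_key(x, y, t, cell_size, width) for x, y, t in points)


positions = [
    (120.0, 40.0, datetime(2020, 10, 1, 8, 1)),
    (180.0, 90.0, datetime(2020, 10, 1, 8, 3)),
    (260.0, 40.0, datetime(2020, 10, 1, 8, 2)),
]
print(count_per_tile(positions, cell_size=100.0, width=timedelta(minutes=5)))
print(count_per_tile(positions, cell_size=100.0))  # spatial tiles only
```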
  

===== Distributed Moving Object Database on Amazon AWS =====
A distributed database is an architecture in which multiple database instances on different machines are integrated to form a single database server. Both the data and the queries are then distributed over these database instances. This architecture is effective for deploying large databases on a cloud platform.

MobilityDB is engineered as an extension of PostgreSQL. AWS supports PostgreSQL databases in [[https://aws.amazon.com/rds/postgresql/|Amazon RDS]] for PostgreSQL and in [[https://aws.amazon.com/rds/aurora/postgresql-features/|Amazon Aurora]]. The goal of this project is to integrate MobilityDB with these products. The key outcomes are a comprehensive assessment of which parts of the MOD API can or cannot be distributed, and an assessment of the performance gain. These outcomes should serve as the basis for a thesis project that achieves an effective integration.
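To illustrate how both the data and the queries can be distributed, the Python sketch below hash-partitions trajectory rows on an object identifier across a number of hypothetical shards. It computes a partial aggregate on each shard and combines the partials on a coordinator. The sketch also shows why some operations distribute more easily than others. An average must be shipped as a (sum, count) pair, because averaging per-shard averages gives a wrong result when shards hold different numbers of rows. The row schema, the choice of crc32 for hashing and all function names are assumptions for illustration. Neither Amazon RDS nor Aurora is modelled here.

```python
import zlib
from typing import List, Tuple

Row = Tuple[str, float]  # hypothetical schema: (object_id, speed)


def shard_of(object_id: str, n_shards: int) -> int:
    """Choose a shard by hashing the distribution column (object_id)."""
    # crc32 is deterministic across runs, unlike Python's salted built-in hash() for str
    return zlib.crc32(object_id.encode("utf-8")) % n_shards


def distribute(rows: List[Row], n_shards: int) -> List[List[Row]]:
    """Hash-partition rows so that all rows of one moving object land on the same shard."""
    shards: List[List[Row]] = [[] for _ in range(n_shards)]
    for row in rows:
        shards[shard_of(row[0], n_shards)].append(row)
    return shards


def partial_sum_count(shard: List[Row]) -> Tuple[float, int]:
    """Work done locally on each instance: a (sum, count) pair."""
    return (sum(speed for _, speed in shard), len(shard))


def distributed_avg(shards: List[List[Row]]) -> float:
    """Combine per-shard partials into the global average on the coordinator."""
    partials = [partial_sum_count(s) for s in shards]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count


rows = [("bus1", 10.0), ("bus2", 20.0), ("car1", 30.0), ("car2", 40.0), ("bus1", 50.0)]
shards = distribute(rows, n_shards=3)
print(distributed_avg(shards))  # 30.0
```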
  
  
===== Distributed Moving Object Database on MS Azure =====
A distributed database is an architecture in which multiple database instances on different machines are integrated to form a single database server. Both the data and the queries are then distributed over these database instances. This architecture is effective for deploying large databases on a cloud platform.

MobilityDB is engineered as an extension of PostgreSQL. MS Azure supports distributed PostgreSQL databases using [[https://www.citusdata.com/|Citus]]. We have successfully tested the integration of MobilityDB and Citus on a local cluster. The goal of this project is to repeat this work on MS Azure, integrating MobilityDB with these products. The key outcomes are a comprehensive assessment of which parts of the MOD API can or cannot be distributed, and an assessment of the performance gain. These outcomes should serve as the basis for a thesis project that achieves an effective integration.
  
===== Map-matching as a Service =====
GPS location tracks typically contain errors, as GPS points are normally several meters away from the true position. If we know that the movement happened on a street network (e.g., a bus or a car), we can correct these errors by moving the points onto the streets. Algorithms for this exist and are called map-matching algorithms. There are also a handful of open source systems that perform map-matching. However, they remain difficult for end users, because they involve a non-trivial installation and configuration effort. Preparing the base map used for the matching is also an issue for users.

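The basic correction step can be illustrated with the simplest possible matcher, sketched below in Python. Each GPS point is moved to the nearest point on any street segment, using planar coordinates. This is a deliberately naive stand-in, not the algorithm of any of the systems listed in this section. It ignores network connectivity and the order of the points, so it can jump between nearby parallel streets. Probabilistic matchers (e.g., those based on hidden Markov models) are designed to avoid exactly that. The function names and the segment-list representation of the street network are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def project_on_segment(p: Point, seg: Segment) -> Point:
    """Closest point to p on the segment seg (planar coordinates)."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: both endpoints coincide
        return (ax, ay)
    # Parameter of the orthogonal projection, clamped so the result stays on the segment
    u = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    u = max(0.0, min(1.0, u))
    return (ax + u * dx, ay + u * dy)


def snap_to_network(track: List[Point], network: List[Segment]) -> List[Point]:
    """Naive map-matching: move each GPS point to the nearest point on any street segment."""
    snapped = []
    for p in track:
        candidates = (project_on_segment(p, seg) for seg in network)
        snapped.append(min(candidates, key=lambda q: math.dist(p, q)))
    return snapped


streets = [((0.0, 0.0), (100.0, 0.0)), ((100.0, 0.0), (100.0, 100.0))]
gps_track = [(25.0, 3.0), (50.0, -2.0), (97.0, 50.0)]
print(snap_to_network(gps_track, streets))  # [(25.0, 0.0), (50.0, 0.0), (100.0, 50.0)]
```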
The goal of this project is to build an architecture for a map-matching service. The challenges are that GPS data arrives in different formats, and that map-matching is a time-consuming algorithm. The architecture should thus accept different input formats and scale automatically according to the request rate. Another key outcome of this project is a comparison of the existing map-matching implementations and a discussion of their suitability for real-world problems.

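The two architectural requirements, several input formats and automatic scaling, can be sketched as follows. Incoming payloads are normalized by a parser chosen from the declared content type, so the matcher only ever receives one representation. A simple rule then estimates the number of workers from the request rate and the average matching time. The media types, the CSV column names lon and lat, the 0.7 target utilization and all function names are assumptions of this sketch, not requirements of the project.

```python
import csv
import io
import json
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def parse_csv(text: str) -> List[Point]:
    """CSV with a header containing 'lon' and 'lat' columns (assumed layout)."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(r["lon"]), float(r["lat"])) for r in reader]


def parse_geojson(text: str) -> List[Point]:
    """GeoJSON Feature or bare geometry whose geometry is a LineString ([lon, lat] order)."""
    obj = json.loads(text)
    geom = obj["geometry"] if obj.get("type") == "Feature" else obj
    if geom.get("type") != "LineString":
        raise ValueError("expected a LineString geometry")
    return [(float(c[0]), float(c[1])) for c in geom["coordinates"]]


# Hypothetical registry: supporting a new format only means adding a parser here
PARSERS: Dict[str, Callable[[str], List[Point]]] = {
    "text/csv": parse_csv,
    "application/geo+json": parse_geojson,
}


def normalize(text: str, content_type: str) -> List[Point]:
    """Dispatch on the declared format so the matcher sees a single representation."""
    try:
        parser = PARSERS[content_type]
    except KeyError:
        raise ValueError(f"unsupported format: {content_type}") from None
    return parser(text)


def workers_needed(requests_per_s: float, seconds_per_request: float,
                   target_utilization: float = 0.7) -> int:
    """Rough scaling rule: enough workers to keep each one below the target utilization."""
    return max(1, math.ceil(requests_per_s * seconds_per_request / target_utilization))


csv_body = "lon,lat\n4.35,50.85\n4.36,50.86\n"
geojson_body = '{"type": "LineString", "coordinates": [[4.35, 50.85], [4.36, 50.86]]}'
print(normalize(csv_body, "text/csv"))
print(workers_needed(requests_per_s=30, seconds_per_request=0.5, target_utilization=0.75))  # 20
```

The scaling rule relies on the fact that, on average, about requests_per_s × seconds_per_request workers are busy. Dividing by a target utilization below 1 leaves headroom for bursts. A deployed service could instead delegate this decision to the autoscaler of its hosting platform.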
Links:
  * [[https://github.com/bmwcarit/barefoot|Barefoot]]
  * [[https://valhalla.readthedocs.io/en/latest/api/map-matching/api-reference/|Valhalla Map Matching API]]
  * [[https://github.com/graphhopper/map-matching|GraphHopper]]
  * [[https://github.com/cyang-kth/fmm|Fast Map Matching]]


===== Geospatial Trajectory Data Cleaning =====


===== Geospatial Trajectory Similarity Measure =====


===== Spatiotemporal k-Nearest Neighbour (kNN) Queries =====
  
  
 
teaching/projh402.txt · Last modified: 2022/09/06 10:39 by ezimanyi