Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
teaching:projh402 [2021/09/08 09:09]
ezimanyi [MobilityDB on Google Cloud Platform]
teaching:projh402 [2021/10/04 19:19]
msakr
Line 1: Line 1:
-====== MA Computer Science Projects (PROJ-H-402) ​ ======+====== MA Computer Science Projects (PROJ-H-402) ​===== 
 +====== Master'​s Thesis Projects (MEMO-H-504) ​======
  
  
 ===== Course objective ===== ===== Course objective =====
-The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the [[http://​iridia.ulb.ac.be/​wiki/​PROJ-H-402_-_Computing_Project:​_Rules|course description page]] ​  for the rules concerning the project.  ​What follows ​is a list of project proposals supervised ​by academic members of the WIT laboratory.+The course PROJ-H-402 is managed by Pr. Mauro Birattari. Please refer to the [[http://​iridia.ulb.ac.be/​wiki/​PROJ-H-402_-_Computing_Project:​_Rules|course description page]] ​  for the rules concerning the project.  ​ 
 +The course MEMO-H504 ​is managed ​by Pr. Nicolas Cerf.
  
 +What follows is a list of project proposals supervised by academic members of the WIT laboratory. The proposals below concern both PROJ-H-402 or MEMO-H-504. ​
 ===== Projects in Mobility Databases ===== ===== Projects in Mobility Databases =====
  
Line 79: Line 82:
  
 Links: Links:
-  * {{:​teaching:​symbolic_trajectories.pdf|}}+  * R.H. Guting, F Valdés, M.L. Damiani, ​{{:​teaching:​symbolic_trajectories.pdf|Symbolic Trajectories}}, ACM Transactions on Spatial Algorithms Systems, (1)2, Article 7, 2015 
  
 ===== Trajectory Data Warehouses ===== ===== Trajectory Data Warehouses =====
Line 107: Line 111:
  
 The goal of this project is to survey the state of the art in geospatial trajectory data cleaning, both model-based and machine learning. The work also includes prototyping and empirically evaluating a selection of these methods in the MobilityDB system, and on different real datasets. These outcomes should serve as a base for a thesis project to enhance geospatial trajectory data cleaning. The goal of this project is to survey the state of the art in geospatial trajectory data cleaning, both model-based and machine learning. The work also includes prototyping and empirically evaluating a selection of these methods in the MobilityDB system, and on different real datasets. These outcomes should serve as a base for a thesis project to enhance geospatial trajectory data cleaning.
 +
 +===== Dynamic Time Warping for Trajectories =====
 +
 +The dynamic time warping (DTW) algorithm is able to find the optimal alignment between two time series. It is often used to determine time series similarity, classification,​ and to find corresponding regions between two time series. Several dynamic time warping implementations are available. However, DTW has a quadratic time and space complexity that limits its use to small time series data sets. Therefore, a fast approximation of DTW that has linear time and space complexity has been proposed.
 +
 +The goal of this project is to survey and to prototype in MobilityDB the state of art methods in dynamic time warping. ​
 +
 +  * T. Giorgino, [[https://​www.jstatsoft.org/​article/​view/​v031i07|Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package]], Journal of Statistical Software, (31)7, 2009.
 +  * S. Salvador, P. Chan, [[https://​cs.fit.edu/​~pkc/​papers/​tdm04.pdf|FastDTW:​ Toward Accurate Dynamic Time Warping in Linear Time and Space]], Intelligent Data Analysis, 11(5):​561-580,​ 2007.
 +  * D.F. Silva, G.E.A.P.A. Batista, [[http://​sites.labic.icmc.usp.br/​dfs/​pdf/​SDM_PrunedDTW.pdf|Speeding Up All-Pairwise Dynamic Time Warping Matrix Calculation]],​ Proceedings of the 2016 SIAM International Conference on Data Mining, pp. 837-845, 2016.
 +  * G. Al-Naymat, S. Chawla, J. Taheri (2012). [[https://​arxiv.org/​abs/​1201.2969|SparseDTW:​ A Novel Approach to Speed up Dynamic Time Warping]]. CoRR abs/​1201.2969,​ 2012.
 +  *  M. Müller, H. Mattes, F. Kurth, ​ [[https://​www.audiolabs-erlangen.de/​content/​05-fau/​professor/​00-mueller/​03-publications/​2006_MuellerMattesKurth_MultiscaleAudioSynchronization_ISMIR.pdf|An Efficient Multiscale Approach to Audio Synchronization]]. Proceedings of the International Conference on Music Information Retrieval (ISMIR), pp. 192-197, 2006.
 +  * Thomas Prätzlich, Jonathan Driedger, and Meinard Müller, [[https://​www.researchgate.net/​publication/​303667579_Memory-Restricted_Multiscale_Dynamic_Time_Warping|Memory-Restricted Multiscale Dynamic Time Warping]], Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 569-573, 2016.
  
 ===== Geospatial Trajectory Similarity Measure ===== ===== Geospatial Trajectory Similarity Measure =====
Line 137: Line 154:
  
  
 +===== Finding Logic Bugs in Database Systems =====
 +Database management systems are complex in their implementations. Implementation errors (Bugs) are therefore inevitable, no matter how mature the implementation is. Logic bugs are one kind of bugs, that cause a query to return incorrect results (e.g., less or more data in the results). Since they don't result in a system crash, logic bugs are difficult to discover.
 +
 +{{:​teaching:​mfe:​logicBugs.png?​400|}}
 +
 +source: Finding Logic Bugs in Database Management Systems (Manuel Rigger, ETH SQLancer)
 +
 +[[https://​github.com/​sqlancer/​sqlancer|SQLLancer]] is an interesting tool that finds logic bugs by implementing a set of automated logic techniques for generated test queries. One techniques, for instance is the Ternary Logic Partitioning (TLP). TLP partitions a query into three partitioning queries, each of which computes its result, where respectively a predicate p, NOT p, and p IS NULL holds. Clearly the union of the three results must be the whole relation, otherwise a bug is spotted !
 +
 +
 +
 +The aim of this project is to extend SQLLancer (and the theory behind it) for testing moving object databases. In more detail, the goals are:
 +  - Identify the relevant state of the art
 +  - Test SQLLancer, and repeat all its experiments ​
 +    * ----- Milestone PROJ-H-402 ​
 +  - Extend the logic bug finding methods for spatiotemporal trajectories
 +  - Find logic bug in MobilityDB ​
 +    * ----- Milestone MEMO-H-504
 +
 +Links for further readings:
 +  * [[https://​www.usenix.org/​system/​files/​osdi20-rigger.pdf|Testing Database Engines via Pivoted Query Synthesis (OSDI '​20)]] ​
 +  * [[https://​arxiv.org/​abs/​2007.08292|Detecting Optimization Bugs in Database Engines via Non-Optimizing Reference Engine Construction (ESEC/FSE '20) ]]
 +  * [[https://​dl.acm.org/​doi/​pdf/​10.1145/​3428279|Finding Bugs in Database Systems via Query Partitioning (OOPSLA '20) ]]
 +
 +===== Improving Database Query Performance Using Learned Indexes =====
 +Secondary indexes are essential for fast query evaluation in databases. An index is a data structure that stores attribute values in a way that enables fast search, and links these values to their tuples in the storage medium. Recent works revealed that the
 +learned index can improve query performance while reducing the index storage overhead.
 +
 +The goal of this thesis is to design and build learned indexes for geospatial trajectory data. In more detail, the goals are:
 +  - Identify the relevant state of the art
 +  - Implement and benchmark the SPRIG learned index [1] 
 +    * ----- Milestone PROJ-H-402 ​
 +  - Design a spatiotemporal learned index
 +  - Implement in MobilityDB ​
 +    * ----- Milestone MEMO-H-504
  
 +Links for further readings:
 +  * [[https://​arxiv.org/​pdf/​2102.06789.pdf|Spatial Interpolation-based Learned Index for Range and kNN Queries]] ​
 +  * [[https://​www.cs.purdue.edu/​homes/​aref/​LMDI2020/​LMDI_Tutorial_SIGSpatial2020.pdf|A Tutorial on Learned Multi-dimensional Indexes (SigSpatial20)]]
  
 
teaching/projh402.txt · Last modified: 2022/09/06 10:39 by ezimanyi