Differences

This shows you the differences between two versions of the page.

teaching:mfe:is [2019/06/07 14:56]
svsummer [Dynamic Query Processing on GPU Accelerators]
teaching:mfe:is [2020/09/29 17:03]
mahmsakr [Data modeling of spatiotemporal regions]
Line 28: Line 28:
  
  
-===== Multi-query Optimization in Spark =====
+===== Dynamic Query Processing in Modern Big Data Architectures =====
  
-Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously running applications. However, it is up to the developers to decide on what to share.
+Dynamic Query Processing refers to the activity of processing queries under constant data updates. (This is also known as continuous querying.) It is a core problem in modern analytic workloads.
  
-The objective of this master thesis is to optimize various applications running on a Spark platform, optimize their execution plans by autonomously finding sharing opportunities, namely finding the RDDs that can be shared among these applications, and computing these shared plans once instead of multiple times for each query.
+Modern big data compute architectures such as Apache Spark, Apache Flink, and Apache Storm support certain forms of Dynamic Query Processing.
 + 
 +In addition, our lab has recently proposed DYN, a new Dynamic Query Processing algorithm that has strong optimality guarantees, but works in a centralised setting. 
 + 
 +The objective of this master thesis is to propose extensions to our algorithm that make it suitable for distributed implementation on one of the above-mentioned platforms, and compare its execution efficiency against the state-of-the-art solutions provided by Spark, Flink, and Storm. In order to make this comparison meaningful, the student is expected to research, survey, and summarize the principles underlying the current state-of-the-art approaches.
  
 **Deliverables** of the master thesis project
-  * An overview of the Apache Spark architecture.
-  * Develop a performance model for queries executed by Spark.
-  * An implementation that optimizes queries executed by Spark and identifies sharing opportunities.
-  * An experimental validation of the developed system.
+     * An overview of the continuous query processing models of Flink, Spark and Storm
+     * A qualitative comparison of the algorithms used
+     * A proposal for generalizing DYN to the distributed setting
+     * An implementation of this generalization by means of a compiler that outputs a continuous query processing plan
+     * A benchmark set of continuous queries and associated data sets for the experimental validation
+     * An experimental validation of the extension and the state of the art
  
-**Interested?​** Contact :  [[svsummer@ulb.ac.be|Stijn Vansummeren]] 
  
-**Status**: available+**Interested?** Contact ​ ​[[svsummer@ulb.ac.be|Stijn Vansummeren]]
  
 +**Status**: taken
 ===== Graph Indexing for Fast Subgraph Isomorphism Testing =====
  
Line 52: Line 58:
 **Interested?** Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]]
  
-**Status**: available
+**Status**: taken
  
  
Line 90: Line 96:
   * Contact : [[ezimanyi@ulb.ac.be|Esteban Zimanyi]]
  
-**Status**: available
+**Status**: taken
  
 =====Mobility data exchange standards=====
Line 120: Line 127:
 **Status**: available
  
-=====Data modeling of spatiotemporal regions===== 
-In moving object databases, a lot of attention has been given to moving point objects. Many data models have been proposed for this. Less attention has been given to moving region objects. Imagine a herd of animals that moves together in the wild. At any time instant, this herd can be represented using a spatial region, e.g., their convex hull. Over time, this region changes place and extent. A spatiotemporal region is an abstract data type that can represent this temporal evolution of the region.
  
-This thesis is about proposing a data model for spatiotemporal regions, and implementing it in MobilityDB. This includes surveying the literature on moving object databases, and specifically on spatiotemporal regions, proposing a discrete data model, implementing it, and implementing the basic database functions and operations to make use of it.
+=====Scalable Map-Matching=====
 +GPS trajectories originate in the form of a series of absolute lat/lon coordinates. Map-matching is the method of locating the GPS observations onto a road network. It transforms the lat/lon pairs into pairs of a road identifier and a fraction representing the relative position on the road. This preprocessing is essential to trajectory data analysis. It contributes to cleaning the data, as well as preparing it for network-related analysis. There are two modes of map-matching: (1) offline, where all the observations of the trajectory exist before starting the map-matching, and (2) online, where the observations arrive at the map-matcher one by one in a streaming fashion. Map-matching is known to be an expensive preprocessing step in terms of processing time. The growing amount of trajectory data (e.g., from autonomous cars) calls for map-matching methods that can scale out. This thesis is about proposing such a solution. It shall survey the existing algorithms, benchmark them, and propose a scale-out architecture.
  
 +MobilityDB has types for lat/lon trajectories, as well as map-matched trajectories. The implementation of this thesis shall be integrated with MobilityDB.
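
As an illustration only (not part of the revision text), the transformation described above, from lat/lon pairs to pairs of a road identifier and a fraction, can be sketched in a few lines of Python. The sketch assumes each road is a single straight segment and simply picks the geometrically nearest road for every observation. Real map-matchers also take the sequence of observations and the network topology into account. The names `Road`, `project` and `map_match` are hypothetical and are not MobilityDB functions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6371000.0


@dataclass(frozen=True)
class Road:
    """Hypothetical road: one straight segment between two (lat, lon) points."""
    road_id: str
    start: tuple[float, float]
    end: tuple[float, float]


def _to_xy(lat, lon, ref_lat):
    # Equirectangular approximation: adequate only for short distances.
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def project(point, road):
    """Return (fraction along the road in [0, 1], distance to the road in meters)."""
    ref_lat = point[0]
    px, py = _to_xy(point[0], point[1], ref_lat)
    ax, ay = _to_xy(road.start[0], road.start[1], ref_lat)
    bx, by = _to_xy(road.end[0], road.end[1], ref_lat)
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate road of zero length
    else:
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    cx, cy = ax + t * dx, ay + t * dy
    return t, math.hypot(px - cx, py - cy)


def map_match(trajectory, roads):
    """Offline mode: match every (lat, lon) observation to its nearest road."""
    matched = []
    for point in trajectory:
        best = min(roads, key=lambda road: project(point, road)[1])
        fraction, _ = project(point, best)
        matched.append((best.road_id, fraction))
    return matched
```

Because each observation is matched independently here, the same `project` call could also serve the online mode, one observation at a time. This independence is a simplification of the sketch, not a property of map-matching in general.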
  
 **Interested?**
 
teaching/mfe/is.txt · Last modified: 2020/09/29 17:03 by mahmsakr