Differences

This shows you the differences between two versions of the page.

teaching:mfe:is [2019/06/07 14:56] svsummer [Graph Indexing for Fast Subgraph Isomorphism Testing]
teaching:mfe:is [2019/06/07 15:12] svsummer [Multi-query Optimization in Spark]
Line 28:
  
  
-===== Multi-query Optimization in Spark =====
+===== Dynamic Query Processing in Modern Big Data Architectures =====
  
-Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources to each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for sharing computed results among simultaneously running applications. However, it is up to the developers to decide what to share.
+Dynamic Query Processing, also known as continuous querying, refers to processing queries while the underlying data is constantly updated. It is a core problem in modern analytic workloads.
  
-The objective of this master thesis is to optimize various applications running on a Spark platform by autonomously finding sharing opportunities in their execution plans, namely the RDDs that can be shared among these applications, and computing these shared plans once instead of once per query.
+Modern big data compute architectures such as Apache Spark, Apache Flink, and Apache Storm support certain forms of Dynamic Query Processing.
+
+In addition, our lab has recently proposed DYN, a new Dynamic Query Processing algorithm that has strong optimality guarantees but works in a centralised setting.
+
+The objective of this master thesis is to propose extensions to our algorithm that make it suitable for distributed implementation on one of the above-mentioned platforms, and to compare its execution efficiency against the state-of-the-art solutions provided by Spark, Flink, and Storm. To make this comparison meaningful, the student is expected to research, survey, and summarize the principles underlying the current state-of-the-art approaches.
  
 **Deliverables** of the master thesis project
-  * An overview of the Apache Spark architecture.
-  * A performance model for queries executed by Spark.
-  * An implementation that optimizes queries executed by Spark and identifies sharing opportunities.
-  * An experimental validation of the developed system.
+  * An overview of the continuous query processing models of Flink, Spark, and Storm.
+  * A qualitative comparison of the algorithms they use.
+  * A proposal for generalizing DYN to the distributed setting.
+  * An implementation of this generalization by means of a compiler that outputs a continuous query processing plan.
+  * A benchmark set of continuous queries and associated data sets for the experimental validation.
+  * An experimental validation of the extension against the state of the art.
  
 **Interested?** Contact: [[svsummer@ulb.ac.be|Stijn Vansummeren]]
  
 **Status**: available
- 
 ===== Graph Indexing for Fast Subgraph Isomorphism Testing =====
  
 
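The definition of Dynamic Query Processing given above can be made concrete with a small sketch. The Python code below is a hypothetical illustration only; it is not DYN and does not use the API of Spark, Flink, or Storm. It maintains the answer to one fixed query, the number of matching pairs in the equi-join of two relations R and S, under a stream of tuple insertions and deletions. Each update adjusts the stored answer by a delta instead of re-running the query, which is a textbook form of incremental view maintenance. The names `IncrementalJoinCount` and `recompute` are made up for this example, and deletions are assumed to remove only tuples that are present.

```python
from collections import Counter


class IncrementalJoinCount:
    """Hypothetical example: maintains SELECT COUNT(*) FROM R JOIN S ON R.key = S.key
    under insertions (delta=+1) and deletions (delta=-1).
    Assumes deletions only remove tuples that are present."""

    def __init__(self):
        self.r = Counter()  # join key -> number of R tuples with that key
        self.s = Counter()  # join key -> number of S tuples with that key
        self.count = 0      # current query result

    def update_r(self, key, delta=1):
        # A change to R joins with every S tuple that has the same key.
        self.count += delta * self.s[key]
        self.r[key] += delta

    def update_s(self, key, delta=1):
        # Symmetric case: a change to S joins with every matching R tuple.
        self.count += delta * self.r[key]
        self.s[key] += delta


def recompute(r, s):
    """Non-incremental baseline: re-evaluates the whole query from scratch."""
    return sum(n * s[k] for k, n in r.items())


q = IncrementalJoinCount()
updates = [("R", "a", 1), ("S", "a", 1), ("S", "a", 1),
           ("R", "b", 1), ("S", "b", 1), ("R", "a", -1)]
for rel, key, delta in updates:
    if rel == "R":
        q.update_r(key, delta)
    else:
        q.update_s(key, delta)
    # After every update, the maintained answer matches full recomputation.
    assert q.count == recompute(q.r, q.s)

print(q.count)  # 1
```

The design choice here is to store per-key multiplicities of both relations, so each update costs a constant number of dictionary operations rather than a full join.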
teaching/mfe/is.txt · Last modified: 2020/09/29 17:03 by mahmsakr