This shows you the differences between two versions of the page.
teaching:mfe:is [2017/10/25 11:46] msakr [Assessing Existing Communication Protocols In The Context Of DaaS] |
teaching:mfe:is [2018/04/23 09:54] svsummer [Master Thesis in Collaboration with Euranova] |
||
---|---|---|---|
Line 24: | Line 24: | ||
* Contact : [[ezimanyi@ulb.ac.be|Esteban Zimanyi]] | * Contact : [[ezimanyi@ulb.ac.be|Esteban Zimanyi]] | ||
+ | |||
+ | ** Dynamic Query Processing on GPU Accelerators | ||
+ | |||
+ | This master thesis is put forward in the context of the DFAQ | ||
+ | Research Project: "Dynamic Processing of Frequently Asked | ||
+ | Queries", funded by the Wiener-Anspach foundation. | ||
+ | |||
+ | Within this project, our lab is developing novel ways of | ||
+ | processing "fast Big Data", i.e., processing analytical queries | ||
+ | where the underlying data is constantly being updated. The | ||
+ | analytics problems envisioned cover wide areas of computer science | ||
+ | and include database aggregate queries, probabilistic inference, | ||
+ | matrix chain computation, and building statistical models. | ||
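As an illustration of what processing a query over constantly updated data can mean, the following minimal sketch maintains a grouped SUM aggregate under inserts and deletes without recomputing it from scratch. The class name `IncrementalGroupSum` and its methods are hypothetical and chosen only for this example; they are not the algorithms developed in the lab.

```python
class IncrementalGroupSum:
    """Maintains SUM(value) GROUP BY key under single-tuple updates.

    Hypothetical illustration: each update costs O(1), whereas
    recomputing the aggregate from scratch costs O(size of the data).
    """

    def __init__(self):
        self._sums = {}    # key -> current sum of values
        self._counts = {}  # key -> number of tuples currently in the group

    def insert(self, key, value):
        self._sums[key] = self._sums.get(key, 0) + value
        self._counts[key] = self._counts.get(key, 0) + 1

    def delete(self, key, value):
        if self._counts.get(key, 0) == 0:
            raise KeyError(f"no tuple with key {key!r} to delete")
        self._sums[key] -= value
        self._counts[key] -= 1
        if self._counts[key] == 0:
            # Drop empty groups so they do not appear in the result.
            del self._sums[key]
            del self._counts[key]

    def result(self):
        """Returns the current query answer as a dict key -> sum."""
        return dict(self._sums)
```

For example, after inserting `("a", 3)`, `("a", 4)` and `("b", 10)` and then deleting `("a", 3)`, `result()` returns the sums for the tuples that remain. A GPU variant would presumably apply many such updates in parallel as a batch, which is outside the scope of this sketch.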
+ | |||
+ | The objective of this master thesis is to build upon the novel | ||
+ | dynamic processing algorithms being developed in the lab, and | ||
+ | complement these algorithms by proposing dynamic evaluation | ||
+ | algorithms that execute on modern GPU architectures, thereby | ||
+ | exploiting their massive parallel processing capabilities. | ||
+ | |||
+ | Since our current development is done in the Scala programming | ||
+ | language, prospective students should either know Scala or be | ||
+ | willing to learn it within the context of the master thesis. | ||
+ | |||
+ | *Validation of the approach* Validation of the master thesis work | ||
+ | should be done on two levels: | ||
+ | - a theoretical level; by proposing and discussing alternative ways | ||
+ | to do incremental computation on GPU architectures, and comparing | ||
+ | these from a theoretical complexity viewpoint | ||
+ | - an experimental level; by proposing a benchmark collection of | ||
+ | analytical queries that can be used to test the obtained versions | ||
+ | of the interpreter/compiler, and reporting on the experimentally | ||
+ | observed performance on this benchmark. | ||
+ | |||
+ | *Deliverables* of the master thesis project | ||
+ | - An overview of query processing on GPUs | ||
+ | - A definition of the analytics queries under consideration | ||
+ | - A description of different possible dynamic evaluation algorithms | ||
+ | for the analytical queries on GPU architectures. | ||
+ | - A theoretical comparison of these possibilities | ||
+ | - The implementation of the evaluation algorithm(s) (as an interpreter/compiler) | ||
+ | - A benchmark set of queries and associated data sets for | ||
+ | the experimental validation | ||
+ | - An experimental validation of the compiler, and analysis of the results. | ||
+ | |||
+ | *Interested?* | ||
+ | - Contact : [[svsummer@ulb.ac.be][Stijn Vansummeren]] | ||
+ | |||
+ | *Status*: available | ||
+ | |||
+ | |||
+ | ** Complex Event Processing in Apache Spark and Apache Storm | ||
+ | |||
+ | The master thesis is put forward in the context of the SPICES | ||
+ | "Scalable Processing and mIning of Complex Events for | ||
+ | Security-analytics" research project, funded by Innoviris. | ||
+ | |||
+ | Within this project, our lab is developing a declarative language | ||
+ | for Complex Event Processing (CEP for short). The goal in Complex | ||
+ | Event Processing is to detect predefined patterns in a stream of | ||
+ | raw events. Raw events are typically sensor readings (such as | ||
+ | "password incorrect for user X trying to log in on machine Y" or | ||
+ | "file transfer from machine X to machine Y"). The goal of CEP is | ||
+ | then to correlate these events into complex events. For example, | ||
+ | repeated failed login attempts by X to Y should trigger a complex | ||
+ | event "password cracking warning" that refers to all failed login | ||
+ | attempts. | ||
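The password-cracking example above can be sketched as a simple sliding-window detector. This is only an illustration under stated assumptions: the event format (dictionaries with `type`, `user`, `machine` and `time` fields), the threshold of three attempts and the 60-second window are all hypothetical, and the code does not represent the declarative CEP language developed in the project.

```python
from collections import defaultdict, deque


def detect_password_cracking(events, threshold=3, window=60):
    """Yields a complex event once `threshold` failed logins by the same
    user on the same machine fall within `window` time units.

    Assumes `events` is an iterable of dicts ordered by their "time" field.
    """
    recent = defaultdict(deque)  # (user, machine) -> recent failed logins
    for ev in events:
        if ev["type"] != "login_failed":
            continue  # other raw events are ignored by this pattern
        attempts = recent[(ev["user"], ev["machine"])]
        attempts.append(ev)
        # Evict attempts that have fallen out of the time window.
        while ev["time"] - attempts[0]["time"] > window:
            attempts.popleft()
        if len(attempts) >= threshold:
            yield {
                "type": "password_cracking_warning",
                "user": ev["user"],
                "machine": ev["machine"],
                "attempts": list(attempts),  # refers to all correlated raw events
            }
```

The emitted complex event carries the list of failed attempts it was derived from, matching the requirement that the warning refers to all failed login attempts.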
+ | |||
+ | The objective of this master thesis is to build an | ||
+ | interpreter/compiler for this declarative CEP language that targets | ||
+ | the distributed computing frameworks Apache Spark and/or Apache | ||
+ | Storm as backends. Getting acquainted with these technologies is | ||
+ | part of the master thesis objective. | ||
+ | |||
+ | *Validation of the approach* Validation of the proposed | ||
+ | interpreter/compiler should be done on two levels: | ||
+ | - a theoretical level; by comparing the generated Spark/Storm | ||
+ | processors to a processor based on "Incremental computation" that | ||
+ | is being developed at the lab | ||
+ | - an experimental level; by proposing a benchmark | ||
+ | collection of CEP queries that can be used to test the obtained | ||
+ | interpreter/compiler, and reporting on the experimentally observed | ||
+ | performance on this benchmark. | ||
+ | |||
+ | *Deliverables* of the master thesis project | ||
+ | - An overview of the processing models of Spark and Storm | ||
+ | - A definition of the declarative CEP language under consideration | ||
+ | - A description of the interpretation/compilation algorithm | ||
+ | - A theoretical comparison of this algorithm with respect to an incremental | ||
+ | evaluation algorithm. | ||
+ | - The interpreter/compiler itself (software artifact) | ||
+ | - A benchmark set of CEP queries and associated data sets for | ||
+ | the experimental validation | ||
+ | - An experimental validation of the compiler, and analysis of the results. | ||
+ | |||
+ | *Interested?* | ||
+ | - Contact : [[svsummer@ulb.ac.be][Stijn Vansummeren]] | ||
+ | |||
+ | *Status*: available | ||
+ | |||
+ | ** Graph Indexing for Fast Subgraph Isomorphism Testing | ||
+ | |||
+ | There is an increasing amount of scientific data, mostly from the | ||
+ | bio-medical sciences, that can be represented as collections of | ||
+ | graphs (chemical molecules, gene interaction networks, ...). A | ||
+ | crucial operation when searching in this data is that of subgraph | ||
+ | isomorphism testing: given a pattern P that one is interested in | ||
+ | (also a graph) and a collection D of graphs (e.g., chemical | ||
+ | molecules), find all graphs in D that have P as a | ||
+ | subgraph. Unfortunately, the subgraph isomorphism problem is | ||
+ | computationally intractable. In ongoing research, to enable | ||
+ | tractable processing of this problem, we aim to reduce the number | ||
+ | of candidate graphs in D on which a subgraph isomorphism test needs | ||
+ | to be executed. Specifically, we index the graphs in the collection | ||
+ | D by means of decomposing them into graphs for which subgraph | ||
+ | isomorphism *is* tractable. An associated algorithm that filters | ||
+ | graphs that certainly cannot match P can then be formulated based on | ||
+ | ideas from information retrieval. | ||
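The general filter-then-verify idea can be sketched as follows. Note that this sketch swaps in a much simpler index than the decomposition described above: it indexes each graph by the multiset of its label pairs on edges, purely as an assumption for illustration. The graph representation (a dict of node labels plus an edge list) and all function names are hypothetical. The filter is sound in the sense that it never discards a graph containing P, but it may keep graphs that do not; the surviving candidates would still need a full subgraph isomorphism test.

```python
from collections import Counter


def edge_features(labels, edges):
    """Multiset of unordered (label, label) pairs, one per edge."""
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)


def build_index(collection):
    """Precomputes the features of every graph in the collection D.

    `collection` maps a graph name to a (labels, edges) pair.
    """
    return {name: edge_features(labels, edges)
            for name, (labels, edges) in collection.items()}


def candidates(index, pattern):
    """Names of graphs that may contain the pattern P.

    If P is a subgraph of G, every labelled edge of P maps to a distinct
    labelled edge of G, so G must have at least as many edges of each
    label pair as P does. Graphs failing this check are filtered out.
    """
    needed = edge_features(*pattern)
    return sorted(name for name, have in index.items()
                  if all(have[f] >= n for f, n in needed.items()))
```

With a collection containing, say, a C-C-O chain, a C-C graph and an H-O-H graph, a pattern consisting of a single C-O edge leaves only the first graph as a candidate.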
+ | |||
+ | In this master thesis project, the student will empirically | ||
+ | validate on real-world datasets the extent to which graphs can be | ||
+ | decomposed into graphs for which subgraph isomorphism is | ||
+ | tractable, and run experiments to validate the effectiveness of | ||
+ | the proposed method in terms of filtering power. | ||
+ | |||
+ | *Interested?* | ||
+ | - Contact : [[svsummer@ulb.ac.be][Stijn Vansummeren]] | ||
+ | |||
+ | *Status*: available | ||
+ | |||
===== Complex Event Processing in Apache Spark and Apache Storm ===== | ===== Complex Event Processing in Apache Spark and Apache Storm ===== |