This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
teaching:projh402 [2018/10/02 08:13] svsummer |
teaching:projh402 [2022/09/06 10:39] (current) ezimanyi |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== MA Computer Science Projects (PROJ-H-402) ====== | + | ====== PROJ-H-402 : Computing Projects ====== |
- | + | This is the list of Computing Projects topics proposed for the current academic year by the CoDE department, École polytechnique de Bruxelles, ULB. | |
- | ===== Course objective ===== | + | |
- | The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE. | + | |
- | + | ||
- | ===== Project proposals ===== | + | |
- | + | ||
- | === Engineering of a Rule-Based Information Extraction Engine === | + | |
- | + | ||
- | Information extraction, the activity of extracting structured | + | |
- | information from unstructured text, is a core data preparation | + | |
- | step. Systems for information extraction fall into two main | + | |
- | categories. The first category contains machine-learning based | + | |
- | systems, where a significant amount of training data is required to train | + |
- | good models for specific extraction tasks. The second category | + | |
- | consists of rule-based systems in which the data to be extracted from | + | |
- | the text is specified by (human-written) rules in some (often | + | |
- | declarative) extraction language. Despite advances in machine | + | |
- | learning, rule-based systems are widely used in practice. | + | |
- | + | ||
- | In recent years, novel theoretical algorithms have been proposed to | + | |
- | more efficiently execute rule-based information extraction | + | |
- | workloads. The objective of this project is to implement one such | + |
- | algorithm, by Florenzano et al. (2018), experimentally analyze its | + |
- | performance, and propose extensions of the algorithm to overcome | + | |
- | performance bottlenecks. | + | |
- | + | ||
- | + | ||
- | References: | + | |
- | + | ||
- | - Fernando Florenzano, Cristian Riveros, Martín Ugarte, | + | |
- | Stijn Vansummeren, Domagoj Vrgoc: Constant Delay Algorithms for | + | |
- | Regular Document Spanners. PODS 2018: 165-177 | + | |
- | + | ||
- | + | ||
- | **Interested?** Contact Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) | + | |
- | + | ||
- | **Status**: available | + | |
- | + | ||
- | + | ||
- | === Query processing for mixed database-machine learning based workloads === | + | |
- | + | ||
- | Because of the growing importance and wide deployment of large-scale | + | |
- | Machine Learning (ML), there is wide interest in the design and | + | |
- | implementation of processing engines that can efficiently evaluate ML | + | |
- | workloads. One class of systems, embodied by systems such as TensorFlow | + |
- | and SystemML, takes linear algebra as the key primitive for expressing | + |
- | ML workflows, and obtains efficient processing engines by porting known | + |
- | database-style optimization techniques to the linear algebra | + |
- | setting. Another class of systems, embodied by FAQ queries, takes | + |
- | relational algebra as the key primitive, but modifies it to allow | + |
- | expression of certain ML workloads. To some extent, the classical | + | |
- | optimization techniques as well as recent results for exploiting | + | |
- | modern hardware transfer to this extended relational algebra. As an | + | |
- | added bonus, traditional database workloads (OLTP/OLAP style) can be | + | |
- | trivially supported. | + |
- | + | ||
- | The focus of this project is on the latter style of systems. The | + |
- | overall goal is to experimentally identify classes of FAQ queries for | + |
- | which it would be beneficial to exploit techniques developed in the | + |
- | former class of systems. Concretely, this can be approached by | + | |
- | experimentally studying queries in the FAQ framework (featuring joins) | + | |
- | for which known results in evaluating linear algebra operations | + |
- | (specifically, matrix multiplication algorithms that run in less than | + |
- | O(n^3) time) can be exploited. | + | |
- | + | ||
- | **Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) | + | |
- | + | ||
- | **Status**: available | + | |
+ | * [[teaching:projh402:wis|Data Science and Engineering]] | ||
+ | * [[teaching:projh402:ia|Artificial Intelligence]] | ||
+ | * [[teaching:projh402:or|Operational Research and Decision Aid]] |