This shows you the differences between two versions of the page.
| | teaching:projh402 [2018/10/02 08:13] svsummer | | teaching:projh402 [2022/09/06 10:39] (current) ezimanyi |
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== MA Computer Science Projects (PROJ-H-402) ====== | + | ====== PROJ-H-402 : Computing Projects ====== |
| - | + | This is the list of Computing Projects topics proposed for the current academic year by the CoDE department, École polytechnique de Bruxelles, ULB. | |
| - | ===== Course objective ===== | + | |
| - | The course PROJ-H-402 is managed by Dr. Mauro Birattari. Please refer to the course description page http://iridia.ulb.ac.be/proj-h-402/index.php/Main_Page for the rules concerning the project. What follows is a list of project proposals supervised by academic members of CoDE. | + | |
| - | + | ||
| - | ===== Project proposals ===== | + | |
| - | + | ||
| - | === Engineering of a Rule-Based Information Extraction Engine === | + | |
| - | + | ||
| - | Information extraction, the activity of extracting structured | + | |
| - | information from unstructured text, is a core data preparation | + | |
| - | step. Systems for information extraction fall into two main | + | |
| - | categories. The first category contains machine-learning based | + | |
| - | systems, where a significant amount of training data is required to train | + | |
| - | good models for specific extraction tasks. The second category | + | |
| - | consists of rule-based systems in which the data to be extracted from | + | |
| - | the text is specified by (human-written) rules in some (often | + | |
| - | declarative) extraction language. Despite advances in machine | + | |
| - | learning, rule-based systems are widely used in practice. | + | |
| - | + | ||
| - | In recent years, novel theoretical algorithms have been proposed to | + | |
| - | more efficiently execute rule-based information extraction | + | |
| - | workloads. The objective in this project is to implement one such | + | |
| - | algorithm, by Florenzano et al. (2018), experimentally analyze its | + | |
| - | performance, and propose extensions of the algorithm to overcome | + | |
| - | performance bottlenecks. | + | |
| - | + | ||
| - | + | ||
| - | References: | + | |
| - | + | ||
| - | - Fernando Florenzano, Cristian Riveros, Martín Ugarte, | + | |
| - | Stijn Vansummeren, Domagoj Vrgoc: Constant Delay Algorithms for | + | |
| - | Regular Document Spanners. PODS 2018: 165-177 | + | |
| - | + | ||
| - | + | ||
| - | **Interested?** Contact Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) | + | |
| - | + | ||
| - | **Status**: available | + | |
| - | + | ||
| - | + | ||
| - | === Query processing for mixed database-machine learning based workloads === | + | |
| - | + | ||
| - | Because of the growing importance and wide deployment of large-scale | + | |
| - | Machine Learning (ML), there is wide interest in the design and | + | |
| - | implementation of processing engines that can efficiently evaluate ML | + | |
| - | workloads. One class of systems, embodied by TensorFlow | + | |
| - | and SystemML, takes linear algebra as the key primitive for expressing | + | |
| - | ML workflows, and obtains efficient processing engines by porting known | + | |
| - | database-style optimization techniques to the linear algebra | + | |
| - | setting. Another class of systems, embodied by FAQ queries, takes | + | |
| - | relational algebra as the key primitive, but modify it to allow | + | |
| - | expression of certain ML workloads. To some extent, the classical | + | |
| - | optimization techniques as well as recent results for exploiting | + | |
| - | modern hardware transfer to this extended relational algebra. As an | + | |
| - | added bonus, traditional database workloads (OLTP/OLAP style) can be | + | |
| - | trivially supported. | + | |
| - | + | ||
| - | The focus in this project is on the latter style of systems. The | + | |
| - | overall goal is to experimentally identify classes of FAQ queries for | + | |
| - | which it would be beneficial to exploit techniques developed in the | + | |
| - | former class of systems. Concretely, this can be approached by | + | |
| - | experimentally studying queries in the FAQ framework (featuring joins) | + | |
| - | for which known results in evaluating linear algebra operations (in | + | |
| - | particular, matrix multiplication algorithms that run in less than | + | |
| - | O(n^3) time) can be exploited. | + | |
| - | + | ||
| - | **Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) | + | |
| - | + | ||
| - | **Status**: available | + | |
| + | * [[teaching:projh402:wis|Data Science and Engineering]] | ||
| + | * [[teaching:projh402:ia|Artificial Intelligence]] | ||
| + | * [[teaching:projh402:or|Operational Research and Decision Aid]] | ||