Differences

teaching:projh402 [2018/10/02 08:13]
svsummer
teaching:projh402 [2020/09/14 20:08]
svsummer
Line 6:

 ===== Project proposals =====
- 
-=== Engineering of a Rule-Based Information Extraction Engine === 
- 
-Information extraction, the activity of extracting structured 
-information from unstructured text, is a core data preparation 
-step. Systems for information extraction fall into two main 
-categories. The first category contains machine-learning based 
-systems, where a significant amount of training data is required to train
-good models for specific extraction tasks. The second category 
-consists of rule-based systems in which the data to be extracted from 
-the text is specified by (human-written) rules in some (often 
-declarative) extraction language. Despite advances in machine 
-learning, rule-based systems are widely used in practice. 
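As a minimal sketch of what a rule in such a system can look like, the Python snippet below treats a regular expression with named capture groups as an extraction rule and returns, for each match, the span (start and end offset) assigned to each variable. The rule itself, the variable names `name` and `email`, and the helper `extract_spans` are illustrative assumptions and are not taken from any system mentioned here.

```python
import re

# Hypothetical extraction rule: a capitalized name followed by an
# e-mail address in angle brackets. Each named group is a variable.
RULE = re.compile(r"(?P<name>[A-Z][a-z]+) <(?P<email>[\w.]+@[\w.]+)>")

def extract_spans(rule, text):
    """Return one dict per match, mapping each variable to its (start, end) span."""
    results = []
    for m in rule.finditer(text):
        results.append({var: m.span(var) for var in rule.groupindex})
    return results

text = "Contact Alice <alice@example.org> or Bob <bob@example.org>."
for row in extract_spans(RULE, text):
    # Resolve each span back to the substring it covers.
    print({var: text[s:e] for var, (s, e) in row.items()})
```

The output is a relation over spans rather than over strings, which is the "structured information" the rule extracts from the unstructured text.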
- 
-In recent years, novel theoretical algorithms have been proposed to 
-more efficiently execute rule-based information extraction 
-workloads. The objective of this project is to implement one such
-algorithm, by Florenzano et al. (2018), experimentally analyze its
-performance, and propose extensions of the algorithm to overcome
-performance bottlenecks.
- 
- 
-References:
-
-- Fernando Florenzano, Cristian Riveros, Martín Ugarte,
-Stijn Vansummeren, Domagoj Vrgoč: Constant Delay Algorithms for
-Regular Document Spanners. PODS 2018: 165-177
- 
- 
-**Interested?** Contact Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
- 
-**Status**: available 
- 
- 
-=== Query processing for mixed database-machine learning based workloads === 
- 
-Because of the growing importance and wide deployment of large-scale 
-Machine Learning (ML), there is wide interest in the design and 
-implementation of processing engines that can efficiently evaluate ML 
-workloads. One class of systems, embodied by systems such as TensorFlow
-and SystemML, takes linear algebra as the key primitive for expressing
-ML workflows, and obtains efficient processing engines by porting known
-database-style optimization techniques to the linear algebra
-setting. Another class of systems, embodied by the FAQ (functional
-aggregate query) framework, takes relational algebra as the key
-primitive, but modifies it to allow the expression of certain ML
-workloads. To some extent, the classical optimization techniques as
-well as recent results for exploiting modern hardware transfer to this
-extended relational algebra. As an added bonus, traditional database
-workloads (OLTP/OLAP style) can be trivially supported.
- 
-The focus of this project is on the latter style of systems. The
-overall goal is to experimentally identify classes of FAQ queries for
-which it would be beneficial to exploit techniques developed in the
-former class of systems. Concretely, this can be approached by
-experimentally studying queries in the FAQ framework (featuring joins)
-for which known results on evaluating linear algebra operations
-(specifically, matrix multiplication algorithms that run in less than
-O(n^3) time) can be exploited.
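To illustrate the connection between joins and matrix multiplication, here is a small sketch in plain Python. It evaluates the sum-product query Q(a, c) = sum over b of R(a, b) * S(b, c) with a hash join and checks that the result coincides with the product of the two matrices that R and S encode. The function names and the dict-based encoding of annotated relations are assumptions made for this example only.

```python
from collections import defaultdict

def join_aggregate(R, S):
    """Sum-product query Q(a, c) = sum over b of R(a, b) * S(b, c).

    R and S are annotated relations, stored as dicts that map
    (a, b) and (b, c) tuples to numeric values.
    """
    # Index S on the join attribute b, as a hash join would.
    s_by_b = defaultdict(list)
    for (b, c), v in S.items():
        s_by_b[b].append((c, v))
    out = defaultdict(int)
    for (a, b), u in R.items():
        for c, v in s_by_b.get(b, []):
            out[(a, c)] += u * v
    return dict(out)

def matmul(A, B):
    """Textbook cubic-time matrix product on nested lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][x] * B[x][j] for x in range(k)) for j in range(m)]
            for i in range(n)]

def to_relation(M):
    """Encode the non-zero entries of a matrix as an annotated relation."""
    return {(i, j): v for i, row in enumerate(M) for j, v in enumerate(row) if v != 0}

A = [[1, 2], [0, 3]]
B = [[4, 0], [5, 6]]
Q = join_aggregate(to_relation(A), to_relation(B))
C = matmul(A, B)
print(Q == to_relation(C))
```

Because the query and the matrix product compute the same values, replacing `matmul` with a sub-cubic algorithm (Strassen's, for instance) is one way a linear algebra result could in principle be exploited for this class of queries. Whether that pays off on sparse relational data is exactly the kind of question left to experiments.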
- 
-**Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be) 
- 
-**Status**: available 
  
  
 
teaching/projh402.txt · Last modified: 2022/09/06 10:39 by ezimanyi