Differences

This shows you the differences between two versions of the page.

--- teaching:projh402 [2018/10/02 08:13]
svsummer
+++ teaching:projh402 [2020/09/14 20:08]
svsummer
@@ Line 6: / Line 6: @@
 ===== Project proposals =====
-=== Engineering of a Rule-Based Information Extraction Engine ===
-Information extraction, the activity of extracting structured
-information from unstructured text, is a core data preparation
-step. Systems for information extraction fall into two main
-categories. The first category contains machine-learning-based
-systems, where a significant amount of training data is required to
-train good models for specific extraction tasks. The second category
-consists of rule-based systems in which the data to be extracted from
-the text is specified by (human-written) rules in some (often
-declarative) extraction language. Despite advances in machine
-learning, rule-based systems are widely used in practice.
-In recent years, novel theoretical algorithms have been proposed to
-more efficiently execute rule-based information extraction
-workloads. The objective of this project is to implement one such
-algorithm, by Florenzano et al. (2018), experimentally analyze its
-performance, and propose extensions of the algorithm to overcome
-performance bottlenecks.
-References:
-- Fernando Florenzano, Cristian Riveros, Martín Ugarte,
-Stijn Vansummeren, Domagoj Vrgoc: Constant Delay Algorithms for
-Regular Document Spanners. PODS 2018: 165-177
-**Interested?** Contact Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
-**Status**: available
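
To illustrate what "rule-based" extraction means in the proposal above, here is a minimal sketch in which a rule is a regular expression with named capture variables and the output is a set of spans (start and end offsets) per variable. The rule `RULE`, the function `extract`, and the sample text are hypothetical. This is not the constant-delay algorithm of Florenzano et al.: Python's `re` module reports only non-overlapping, leftmost matches, whereas document spanners are defined over all matches.

```python
import re

# Hypothetical extraction rule: a two-word capitalized name followed by
# an e-mail address in parentheses. The named groups act as variables.
RULE = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+) \((?P<email>[\w.]+@[\w.]+)\)"
)

def extract(document):
    """Return one mapping per match, from variable name to (start, end) span."""
    return [
        {var: m.span(var) for var in RULE.groupindex}
        for m in RULE.finditer(document)
    ]

doc = "Contact Ada Lovelace (ada@example.org) or Alan Turing (alan@example.org)."
spans = extract(doc)

# Turn the spans back into text to inspect what was extracted.
records = [{var: doc[s:e] for var, (s, e) in row.items()} for row in spans]
```

Returning spans rather than substrings keeps the output tied to positions in the document, which is the representation the spanner framework works with.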
-=== Query processing for mixed database-machine learning based workloads ===
-Because of the growing importance and wide deployment of large-scale
-Machine Learning (ML), there is wide interest in the design and
-implementation of processing engines that can efficiently evaluate ML
-workloads. One class of systems, embodied by systems such as TensorFlow
-and SystemML, takes linear algebra as the key primitive for expressing
-ML workflows, and obtains efficient processing engines by porting known
-database-style optimization techniques to the linear algebra
-setting. Another class of systems, embodied by FAQ queries, takes
-relational algebra as the key primitive, but modifies it to allow
-expression of certain ML workloads. To some extent, the classical
-optimization techniques as well as recent results for exploiting
-modern hardware transfer to this extended relational algebra. As an
-added bonus, traditional database workloads (OLTP/OLAP style) can be
-trivially supported.
-The focus in this project is on the latter style of systems. The
-overall goal is to experimentally identify classes of FAQ queries for
-which it would be beneficial to exploit techniques developed in the
-former class of systems. Concretely, this can be approached by
-experimentally studying queries in the FAQ framework (featuring joins)
-for which known results in evaluating linear algebra operations
-(specifically, matrix multiplication algorithms that run in less than
-O(n^3) time) can be exploited.
-**Contact**: Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
-**Status**: available
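
The connection between joins and matrix multiplication mentioned in the proposal above can be sketched as follows: if matrices are stored as relations A(i, k, value) and B(k, j, value), their product is a join on k followed by a sum aggregate grouped by (i, j). The function name `matmul_as_join` and the dictionary encoding are assumptions made for illustration, not part of any FAQ system.

```python
from collections import defaultdict

def matmul_as_join(A, B):
    """Multiply two matrices stored as relations {(row, col): value}.

    Evaluates the join-aggregate query
        C(i, j) = SUM over k of A(i, k) * B(k, j)
    using a hash join on the shared attribute k.
    """
    # Build a hash index on B keyed by the join attribute k.
    b_by_k = defaultdict(list)
    for (k, j), b in B.items():
        b_by_k[k].append((j, b))

    # Probe the index with each tuple of A and aggregate per (i, j).
    C = defaultdict(int)
    for (i, k), a in A.items():
        for j, b in b_by_k.get(k, ()):
            C[(i, j)] += a * b
    return dict(C)

A = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
B = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
C = matmul_as_join(A, B)
```

On dense n x n inputs this join-based evaluation performs on the order of n^3 multiplications, which is the baseline that sub-cubic matrix multiplication algorithms would improve on.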