Differences

This shows you the differences between two versions of the page.

teaching:projh402 [2013/09/23 11:38]
svsummer created
teaching:projh402 [2015/06/26 16:09]
svsummer
Line 7: Line 7:
 ===== Project proposals =====
  
-==== Development of a Personal Scientific Digital Library Management System ====
+===== Graph Indexing for Fast Subgraph Isomorphism Testing =====
  
-In this project, the student is asked to construct a software system to help manage large collections of scientific papers in digital form. Specifically, the system must be able to:
-  - Scan a given filesystem location for given filetypes (PDFs, EPUB, ...) containing scientific articles.
-  - Extract the metadata from each identified file. Here, the metadata includes the title of the article, its authors, the publishing venue, the publisher, the year of publication, the article's abstract, ... The development of an intelligent way to retrieve this metadata is required. This could be done, for example, by a combination of parsing the file, contacting the internet repositories of known publishers (ACM, Springer, Elsevier), etc. to retrieve the data.
-  - Offer search capabilities, in order to allow the user to find all indexed articles matching certain criteria (title, author, ...)
-  - Offer archiving capabilities
+There is an increasing amount of scientific data, mostly from the bio-medical sciences, that can be represented as collections of graphs (chemical molecules, gene interaction networks, ...). A crucial operation when searching in this data is that of subgraph isomorphism testing: given a pattern P that one is interested in (also a graph) and a collection D of graphs (e.g., chemical molecules), find all graphs in D that have P as a subgraph. Unfortunately, the subgraph isomorphism problem is computationally intractable. In ongoing research, to enable tractable processing of this problem, we aim to reduce the number of candidate graphs in D on which a subgraph isomorphism test needs to be executed. Specifically, we index the graphs in the collection D by means of decomposing them into graphs for which subgraph isomorphism *is* tractable. An associated algorithm that filters out graphs that certainly cannot match P can then be formulated based on ideas from information retrieval.
  
-Use of semantic web technologies (RDF, SPARQL, ...) to store and search the metadata is encouraged.
+In this project, the student will empirically validate on real-world datasets the extent to which graphs can be decomposed into graphs for which subgraph isomorphism is tractable, and run experiments to validate the effectiveness of the proposed method in terms of filtering power.
+
+**Interested?** Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]]
+
+**Status**: available
+
+==== Principles of Database Management Architectures in Managed Virtual Environments ====
+
+With the growing popularity of Big Data, many data processing engines
+are implemented in a managed virtual environment such as the Java
+Virtual Machine (e.g., Apache Hadoop, Apache Giraph, Drill,
+...). While this improves the portability of the engine, the tradeoffs
+and implementation principles w.r.t. traditional C++ implementations
+are sometimes less well understood.
+
+The objective of this project is to develop some basic functionalities
+of a database storage engine (linked files, B-tree, extensible hash
+table, basic external-memory sorting) in a managed virtual machine
+(i.e., the Java Virtual Machine or the .NET Common Language
+Runtime), and compare this with a C++-based implementation both on (1)
+ease of implementation and (2) execution efficiency. In order to
+develop the managed virtual machine implementation, the interested
+student will need to research the best practices that are used in the
+above-mentioned projects to gain maximum execution speed (e.g., use of
+the sun.misc.Unsafe class, memory-mapped files, ...).
  
 **Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
Line 21: Line 41:
 **Status**: available
  
-==== Curriculum Revision Assistant ==== 
  
-In this project, the student is asked to construct a software system that can assist in the revision of teaching curricula (also known as teaching programs). The system should have the following functionalities:
-  - It should be able to load existing curricula from the ULB central administration. This could be done, for example, by parsing the webpages available at banner (the Civil Engineering in CS program is available at http://banssbfr.ulb.ac.be/PROD_frFR/bzscrse.p_disp_prog_detail?term_in=201314&prog_in=MA-IRIF&lang=FRENCH, for example).
-  - It should allow to make different versions of the teaching programs, much in the same way as version control systems like GIT and subversion offer the possibility to make different "development branches" of a program's source code.
-  - It should allow to analyze the modifications proposed in the teaching programs, and summarize the impact that these changes could have on other programs. (For example, if a course is removed from the computer science curriculum, it should also be removed from all curricula that included the course.)
+==== Development of a compiler and runtime engine for AQL ====
+
+In 2005, researchers at the IBM Almaden Research Center developed a
+new system specifically geared for practical information extraction in
+the enterprise. This effort led to [[https://www.google.be/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&ved=0CEYQFjAB&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.179.356%26rep%3Drep1%26type%3Dpdf&ei=gyhIUe-XPIexPJ-fgLAG&usg=AFQjCNHgkbcREbd6bCA26BVf0FuIZ9n7Sg&sig2=LVQkus_67uSVlwK34BXZ8w&bvm=bv.43828540,d.ZWU|SystemT]], a rule-based IE system with an SQL-like declarative language named [[http://pic.dhe.ibm.com/infocenter/bigins/v2r0/topic/com.ibm.swg.im.infosphere.biginsights.analyze.doc/doc/aql_overview.html|AQL (Annotation Query Language)]].
+The declarative nature of AQL enables new kinds of tools for extractor
+development, and a cost-based optimizer for
+performance.
+
+The goal of this project is to develop an open-source compiler and
+runtime environment for (a simplified version of) AQL.
  
 **Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
+
+**Status**: available
+
+==== Development of a distributed simulation algorithm ====
+
+Simulation and Bisimulation are fundamental notions in computer
+science. They underlie many formal verification algorithms, and have
+recently been applied to the construction of so-called structural
+indexes, which are novel index data structures for relational databases
+and the Semantic Web.  Essentially, a (bi)simulation is a relation on
+the nodes of a graph. Unfortunately, however, while efficient
+main-memory algorithms for computing whether two nodes are similar
+exist, these algorithms fail when the input graphs are too large to
+fit in main memory.
+
+The objective of this project is to implement a recently proposed
+algorithm for computing simulation in a distributed setting, and
+provide a preliminary performance evaluation of this implementation.
+
+**Contact** : Stijn Vansummeren (stijn.vansummeren@ulb.ac.be)
+
+**Status**: available
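
As an illustration of the simulation relation mentioned in the last proposal above, here is a minimal main-memory sketch. It is not part of either revision and it is not the distributed algorithm the project refers to. It assumes the usual textbook definition on node-labeled directed graphs: v simulates u if both carry the same label and every successor of u is simulated by some successor of v. The function name `largest_simulation` and the example graph are hypothetical.

```python
from itertools import product


def largest_simulation(labels, edges):
    """Return the set of pairs (u, v) such that node v simulates node u.

    labels: dict mapping each node to its label (assumed input format).
    edges:  iterable of (source, target) pairs of a directed graph.
    """
    succ = {node: set() for node in labels}
    for src, dst in edges:
        succ[src].add(dst)

    # Start from all pairs with equal labels, then repeatedly remove
    # pairs that violate the simulation condition (naive fixpoint).
    rel = {(u, v) for u, v in product(labels, repeat=2)
           if labels[u] == labels[v]}
    changed = True
    while changed:
        changed = False
        for u, v in list(rel):
            # v fails to simulate u if some successor of u has no
            # matching successor of v that is still in the relation.
            if any(all((u2, v2) not in rel for v2 in succ[v])
                   for u2 in succ[u]):
                rel.discard((u, v))
                changed = True
    return rel


# Hypothetical example: nodes 1 and 3 each have a 'b' successor, node 5 has none.
labels = {1: "a", 2: "b", 3: "a", 4: "b", 5: "a"}
edges = [(1, 2), (3, 4)]
sim = largest_simulation(labels, edges)
```

This naive refinement holds the whole relation in memory and rescans it until nothing changes, which is exactly the kind of approach that stops working once the graph no longer fits in main memory.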
  
  
  
  
 
teaching/projh402.txt · Last modified: 2022/09/06 10:39 by ezimanyi