Differences

This shows you the differences between two versions of the page.

teaching:mfe:is [2014/02/19 15:16]
ezimanyi [Extending SPARQL for Spatio-temporal Data Support]
teaching:mfe:is [2014/06/03 11:22]
svsummer
Line 15: Line 15:
  
 <note>Please note that this list of subjects is **not exhaustive. Interested students are invited to propose original subjects.**</note>
 +
 +===== Master Thesis in Collaboration with Euranova =====
 +
 +Our laboratory performs collaborative research with Euranova R&D (http://​euranova.eu/​). The list of subjects proposed for this year by Euranova can be found 
 +{{:​teaching:​mfe:​mt2014_euranova.pdf|here}}
 +
 +These subjects include topics on distributed graph processing, processing big data using Map/Reduce, cloud computing, and social networks.
 +
 +  * Contact : [[ezimanyi@ulb.ac.be|Esteban Zimanyi]]
 +
 +===== Master Thesis in Collaboration with DPI 24/7 Media Publishing =====
 +
 +The goal of the thesis is to set up a SaaS/PaaS solution for the deployment of the DPI 24/7 media publishing distribution in a Heroku-like style.
 +
 +During this master thesis you will not only carry out a theoretical and technological analysis of such a deployment but also implement a concrete solution for the DPI 24/7 distribution.
 +
 +From a technical point of view you will :
 +  * Develop a service using Docker and Dokku for the on-demand deployment of instances of the DPI 24/7 distribution (full stack architecture)
 +  * Run performance tests of the developed service
 +  * Study the different options of the PaaS mode (full stack or elastic deployment)
 +
 +Second, you will analyze the different existing solutions for the orchestration of an elastic virtualization architecture.
 +
 +Technology used by the DPI 24/7 distribution : Linux, Varnish, Nginx, PHP-FPM, MySQL (in background Tomcat, Solr).
 +
 +Virtualization technology : Container virtualization and deployment with Dokku
 +
 +Virtualization architecture : Full-stack versus Elastic
 +
 +Performance test of the architecture
 +
 +Evaluation of container orchestration tools : Salt, Serf, Flynn, Fig...
 +
 +Interested? DPI 24/7 contact : [[ddu@audaxis.com|Dimitri Dujardin]]. Academic supervisor : [[svsummer@ulb.ac.be|Stijn Vansummeren]]
  
 ===== Automatic detection of name variations =====
Line 101: Line 135:
 Interested? Contact [[toon.calders@ulb.ac.be|Toon Calders]]
  
-===== Master Thesis in Collaboration with Euranova =====
 
-Our laboratory performs collaborative research with Euranova R&D (http://euranova.eu/). The list of subjects proposed for this year by Euranova can be found
-{{:teaching:mfe:euranova_master_thesis_2013_2014.pdf|here}}.
 +===== Design and Implementation of a Curriculum Revision Tool =====
 
-These subject include topics on distributed graph processing, processing big data using Map/Reduce, cloud computing, and social networks.
 +Stijn Vansummeren (WIT), Frédéric Robert (BEAMS)
 
-  * Contact : [[ezimanyi@ulb.ac.be|Esteban Zimanyi]]
 +This MFE concerns the analysis, design, and implementation of a
 +software system that can assist in the revision of teaching curricula
 +(also known as teaching programs).
 
-===== Efficient computation of simulation for structural indexing  =====
 +The primary targeted functionalities of the software system are as
 +follows:
 +  * It should allow creating different versions of the teaching programs, much in the same way as version control systems like Git and Subversion offer the possibility to make different "development branches" of a program's source code.
 +  * It should provide an extensible means to check the modified program for inconsistencies. (For example, if course X has course Y as a prerequisite, then course Y should not be scheduled in the 2nd semester and X in the 1st semester. Moreover, the total number of ECTS of all courses should be at most 60.)
 +  * It should allow analyzing the modifications proposed in the teaching programs, and summarize the impact that these changes could have on other programs. (For example, if a course is removed from the computer science curriculum, it should be flagged that it should also be removed from all curricula that included the course.)
 +  * It should load data from (and preferably, save data to) the ULB central administration database.
 +  * It should give suggestions concerning the impact of the modifications on the course schedules.
 
-Simulation and bisimulation are  fundamental notions in computer science. They underlie many formal verification algorithms, and have recently been applied to the construction of indexing data structures for relational databases and the semantic web.
 +A proof-of-concept implementation of a revision tool that supports the first two requirements above is currently being developed in the context of a PROJH402 project. The MFE student who selects this topic is expected to:
 
-Essentially, a simulation or bisimulation is relation on the nodes
-of a graph. Unfortunately, however, while efficient main-memory
-algorithms for computing whether two nodes are simulating or bisimulating exist, these algorithms fail when no the input graphs are too large to fit in main memory.
 +  * Develop this prototype into a production-ready implementation.
 +  * Implement the communication with the central ULB database.
 +  * Implement the impact analysis concerning the course schedules.
 +  * Interact with the administration of the Ecole Polytechnique to fine-tune the above requirements; test the implementation; and integrate remarks after testing.
 
-The goal of this thesis is to study, compare, and implement various
-approaches to computing simulation in an external memory setting, for
-the explicit purpose of using the implementation to efficiently construct
-simulation-based indexes for large relational databases and the
-semantic web.
 +Contact : Stijn Vansummeren <stijn.vansummeren@ulb.ac.be>, Frédéric Robert <frrobert@ulb.ac.be>
 
-  * Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]]
 +===== Design and Development of a Comprehensive DICOM validation application =====
  
 +The DICOM standard is now available in a machine-readable XML format (DocBook documents). Using this format, the thesis will define the architecture of software tools and services that automatically extract and exploit the full content of the DICOM standard, and develop the corresponding software. As a pilot project, a comprehensive DICOM validation application will then be built on top of these services.
 +
 +References: <​http://​dicom.nema.org/;​ http://​www.oasis-open.org/​docbook/>​
 +Requirements:​ XML, XSL, database, Java or Python or C++.
 +
 +Contacts : Arnaud Schenkel <​arnaud.schenkel@ulb.ac.be>,​ David Wikler <​david.wikler@ulb.ac.be>,​ Stijn Vansummeren <​stijn.vansummeren@ulb.ac.be>​
 +===== Structural compression of relational and semantic web databases =====
 +
 +Stijn Vansummeren (WIT)
 +
 +Recent research in database management systems at ULB has shown how to
 +theoretically construct succinct (compressed) representations for
 +relational databases and semantic web databases. The advantage of
 +these succinct representations is that they allow querying directly
 +*on the succinct representation*,​ without needing to consult the
 +underlying database.
 +
 +The goal of this thesis is to study scalable algorithms for
 +constructing the actual succinct representations. Some in-memory
 +algorithms are already known, but given the large size of typical
 +databases, distributed and out-of-memory alternatives need to be found.
 +
 +
 +  * Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]] ​  
 +
 +
 +===== A contribution to Apache DRILL =====
 +
 +Google'​s research lab has produced a remarkable number of software
 +systems for the analytics of Big Data:
 +  * [[|Map/​Reduce]] for offline, batch-oriented data analysis over arbitrary datasets
 +  * [[http://​googleresearch.blogspot.be/​2009/​06/​large-scale-graph-computing-at-google.html|Pregel]] for offline analysis over graph-structured datasets
 +  * [[http://​research.google.com/​pubs/​pub36632.html|Dremel]] for on-line analysis over structured datasets
 +
 +For Map/Reduce and Pregel, the Apache Software Foundation has
 +previously constructed open source implementations ([[http://​hadoop.apache.org/​|Hadoop]],​
 +[[https://​giraph.apache.org/​|Giraph]]). For Dremel, a project is
 +currently underway to provide an Open Source implementation (known as
 +[[http://​incubator.apache.org/​drill/​index.html|Apache Drill]]).
 +
 +The goal of this thesis is to (1) study the current architecture of Apache
 +Drill; (2) compare this with the state of the art in query processing
 +for structured datasets; (3) contribute to the development of the
 +Drill implementation.
 +
 +Students interested in this MFE are highly advised to follow the
 +course {{http://​cs.ulb.ac.be/​public/​teaching/​infoh417|INFOH417
 +Database Systems Architecture}} for a background on query processing
 +in traditional database management systems.
 +
 +  * Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]] ​  
 ===== Aspects of Text Analytics and Information Extraction =====
  
Line 178: Line 266:
 \\
   * Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]]
 +  * Status: **already taken**
  
 ===== Distributed Structural Indexes for RDF Data =====
Line 208: Line 297:
                                                                                                                                        
  
-=====Foundations of Data Description Languages===== 
  
-Recently, several small "​domain specific languages"​ have been proposed 
-to facilitate programming with ad hoc data (including PADS, 
-DATASCRIPT,​PACKETTYPES,​ Microsoft M Grammar). Ad hoc data is data 
-other than data in well-behaved relational or XML formats. 
- 
-The above languages take as input a description of the data format to 
-be dealt with, and automatically generate a large number of software 
-tools (parsers, serializers,​ data transformers,​ error recognition,​ 
-...) to process the ad-hoc data. 
- 
-The goal of this thesis is to study the programming language-theory 
-foundations behind these languages, their commonalities and their 
-differences. If possible, suggestions for further extensions to the 
-languages should be formulated. 
- 
-  * References : 
-      * http://​datascript.sourceforge.net/​ 
-      * http://​www.padsproj.org/​index.html 
- 
-\\ 
-  * Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]] 
- 
-=====Capturing Semantic ​ Web Data from Web Pages===== 
- 
- 
-The [[http://​linkeddata.org/​|Linked Open Data]] (LOD) initiative is aimed at extending the Web  by means of publishing various open datasets as RDF,  setting RDF links between data items from different data sources. ​ In spite of  the interest of organization in publishing their data, many of them are not willing to pay the price of devoting working hours or their employees for doing the hard work that preparing and updating these data requires. Therefore, a very interesting and practical problem that arises is how to produce LOD automatically from Web sites. This   ​problem can be tackled if selected and well-defined domains are chosen. ​ 
- 
-  
-In his thesis we propose to select a site of a broadcasting company, and, through intelligent crawling techniques capture data of interest and publish it as RDF data. In a second step, we propose to  use these data to pose queries that involve different nodes of the Web of linked ​ data.  ​ 
-  
- 
-* Contacts :  
-    * [[ezimanyi@ulb.ac.be|Esteban Zimányi]] (CoDE) 
-  
 =====Publishing and Using Spatio-temporal Data on the Semantic Web=====
  
Line 252: Line 306:
 by application providers, that can build attractive and useful applications, in particular, for devices like mobile phones, tablets, etc.
  
-The goals of this thesis are: (i) study the existing proposals for mapping spatio-temporal data into LOD; (ii) apply this mapping to a real-world case study (as was the case for the [[http://www.oscb.be/|Open Semantic Cloud for Brussels]] project; (iii) Based on the produced mapping, and using existing applications like the [[http://linkedgeodata.org/|Linked Geo Data project]], build applications that make use of LOD for example, to find out which cultural events are taking place at a given time at a given location.
 +The goals of this thesis are: (1) study the existing proposals for mapping spatio-temporal data into LOD; (2) apply this mapping to a real-world case study (as was the case for the [[http://www.oscb.be/|Open Semantic Cloud for Brussels]] project); (3) based on the produced mapping, and using existing applications like the [[http://linkedgeodata.org/|Linked Geo Data project]], build applications that make use of LOD, for example, to find out which cultural events are taking place at a given time at a given location.
    
  
-    * Contact: [[ezimanyi@ulb.ac.be|Esteban Zimányi]] (CoDE)
 +    * Contact: [[ezimanyi@ulb.ac.be|Esteban Zimányi]]
  
 =====Extending SPARQL for Spatio-temporal Data Support=====
Line 262: Line 316:
 Therefore, a proposal to extend SPARQL to support spatial data, called [[http://www.opengeospatial.org/projects/groups/geosparqlswg/|GeoSPARQL]], has been presented to the Open Geospatial Consortium.
    
-In this thesis we propose to (a) perform an analysis of the current proposal for GeoSARQL; (b) a study of  current implementations of SPARQL that support spatial data; (c) implement simple extensions for SPARQL to support spatial data, and use these language in real-world use cases.
 +In this thesis we propose to (1) analyze the current proposal for GeoSPARQL; (2) study current implementations of SPARQL that support spatial data; (3) implement simple extensions for SPARQL to support spatial data, and use these language extensions in real-world use cases.
    
  
-   * Contact: [[ezimanyi@ulb.ac.be|Esteban Zimányi]] (CoDE)
 +   * Contact: [[ezimanyi@ulb.ac.be|Esteban Zimányi]]
    
 
teaching/mfe/is.txt · Last modified: 2020/09/29 17:03 by mahmsakr