Differences

This shows you the differences between two versions of the page.

teaching:mfe:is [2014/03/25 12:45]
svsummer [Models for programming Data Management in the Cloud]
teaching:mfe:is [2014/06/03 11:22]
svsummer
Line 15: Line 15:
  
 <note>Please note that this list of subjects is **not exhaustive. Interested students are invited to propose original subjects.**</note>
 +
 +===== Master Thesis in Collaboration with Euranova =====
 +
 +Our laboratory performs collaborative research with Euranova R&D (http://euranova.eu/). The list of subjects proposed for this year by Euranova can be found
 +{{:teaching:mfe:mt2014_euranova.pdf|here}}.
 +
 +These subjects include topics in distributed graph processing, big data processing with Map/Reduce, cloud computing, and social networks.
 +
 +  * Contact : [[ezimanyi@ulb.ac.be|Esteban Zimanyi]]
 +
 +===== Master Thesis in Collaboration with DPI 24/7 Media Publishing =====
 +
 +The goal of this thesis is to set up a SaaS/PaaS solution for deploying the DPI 24/7 media publishing distribution in a Heroku-like style.
 +
 +During this master thesis, you will not only carry out a theoretical and technological analysis of such a deployment but also implement a concrete solution for the DPI 24/7 distribution.
 +
 +From a technical point of view, you will:
 +  * Develop a service using Docker and Dokku for the on-demand deployment of instances of the DPI 24/7 distribution (full stack architecture)
 +  * Carry out performance tests of the developed service
 +  * Study the different options of the PaaS mode (full-stack or elastic deployment)
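One of the tasks above is to performance-test the deployment service. As a possible starting point, the following Python sketch (standard library only) measures request latency against a single deployed instance under concurrent load. The function names and parameters are hypothetical, and a real evaluation would more likely rely on a dedicated load-testing tool; the sketch only illustrates which quantities (mean, approximate median and 95th percentile, maximum latency) such a test might report.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def timed_get(url: str) -> float:
    """Fetch `url` once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with urlopen(url, timeout=10) as response:
        response.read()  # include the body transfer in the measurement
    return time.perf_counter() - start


def load_test(url: str, requests: int = 100, concurrency: int = 10) -> dict:
    """Send `requests` GETs using `concurrency` worker threads and summarize latencies."""
    if requests < 1:
        raise ValueError("requests must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_get, [url] * requests))
    n = len(latencies)
    return {
        "requests": n,
        "mean_s": statistics.mean(latencies),
        "p50_s": latencies[n // 2],                     # index-based approximation of the median
        "p95_s": latencies[min(n - 1, int(0.95 * n))],  # index-based approximation of the 95th percentile
        "max_s": latencies[-1],
    }
```

For example, `load_test("http://localhost:8080/", requests=200, concurrency=20)` would benchmark a hypothetical instance listening on port 8080; running the same test against full-stack and elastic deployments is one way to feed the architecture comparison.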
 +
 +Second, you will analyze the existing solutions for orchestrating an elastic virtualization architecture.
 +
 +Technologies used by the DPI 24/7 distribution: Linux, Varnish, Nginx, PHP-FPM, MySQL (with Tomcat and Solr in the background).
 +
 +Virtualization technology: container virtualization and deployment with Dokku.
 +
 +Virtualization architecture: full-stack versus elastic.
 +
 +Performance testing of the architecture.
 +
 +Evaluation of container orchestration tools: Salt, Serf, Flynn, Fig, ...
 +
 +Interested? Contact [[ddu@audaxis.com|Dimitri Dujardin]] (DPI 24/7). Academic supervisor: [[svsummer@ulb.ac.be|Stijn Vansummeren]].
  
 ===== Automatic detection of name variations =====
Line 101: Line 135:
 Interested? Contact [[toon.calders@ulb.ac.be|Toon Calders]]
  
-===== Master Thesis in Collaboration with Euranova ===== 
  
-Our laboratory performs collaborative research with Euranova R&D (http://euranova.eu/). The list of subjects proposed for this year by Euranova can be found
-{{:teaching:mfe:mt2014_euranova.pdf|here}}
+===== Design and Implementation of a Curriculum Revision Tool =====
+
 
-These subject include topics on distributed graph processing, processing big data using Map/Reduce, cloud computing, and social networks.
+Stijn Vansummeren (WIT), Frédéric Robert (BEAMS)
 
-  * Contact : [[ezimanyi@ulb.ac.be|Esteban Zimanyi]]
+This MFE concerns the analysis, design, and implementation of a
 +software system that can assist in the revision of teaching curricula 
 +(also known as teaching programs). 
 + 
 +The primary targeted functionalities of the software system are as
 +follows: 
 +  * It should allow users to create different versions of the teaching programs, much as version control systems like Git and Subversion allow developers to create different "development branches" of a program's source code.
 +  * It should provide an extensible means to check the modified program for inconsistencies. (For example, if course X has course Y as a prerequisite, then course Y should not be scheduled in the 2nd semester while X is scheduled in the 1st semester. Moreover, the total number of ECTS of all courses should be at most 60.)
 +  * It should analyze the modifications proposed to the teaching programs and summarize the impact that these changes could have on other programs. (For example, if a course is removed from the computer science curriculum, the tool should flag that the course should also be removed from all other curricula that include it.)
 +  * It should load data from (and preferably, save data to) the ULB central administration database.  
 +  * It should give suggestions concerning the impact of the modifications on the course schedules. 
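The consistency checks and the impact analysis described above can be illustrated with a small Python sketch. The data model (`Course`, `Program`), the function names, and the encoding of semesters as the integers 1 and 2 are assumptions made for illustration only; they are not the design of the existing PROJH402 prototype. Checks are plain functions collected in a list, so a new consistency rule can be added without touching the checker itself, which is one possible way to obtain the extensibility asked for.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ECTS = 60  # upper bound on a program's total ECTS, as stated in the requirements


@dataclass(frozen=True)
class Course:
    code: str
    ects: int
    semester: int  # hypothetical encoding: 1 = first semester, 2 = second semester
    prerequisites: frozenset[str] = frozenset()


@dataclass
class Program:
    name: str
    courses: dict[str, Course] = field(default_factory=dict)  # course code -> Course


def check_prerequisite_order(program: Program) -> list[str]:
    """Flag a course scheduled in an earlier semester than one of its prerequisites."""
    problems = []
    for course in program.courses.values():
        for pre_code in sorted(course.prerequisites):
            pre = program.courses.get(pre_code)
            if pre is not None and pre.semester > course.semester:
                problems.append(
                    f"{course.code} (semester {course.semester}) requires {pre.code}, "
                    f"which is scheduled in semester {pre.semester}"
                )
    return problems


def check_total_ects(program: Program) -> list[str]:
    """Flag a program whose courses add up to more than MAX_ECTS."""
    total = sum(c.ects for c in program.courses.values())
    if total > MAX_ECTS:
        return [f"{program.name}: {total} ECTS exceeds the {MAX_ECTS} ECTS limit"]
    return []


# Extensibility: a new rule is just another function appended to this list.
CHECKS: list[Callable[[Program], list[str]]] = [check_prerequisite_order, check_total_ects]


def run_checks(program: Program, checks=CHECKS) -> list[str]:
    """Run every registered check and collect all reported problems."""
    return [problem for check in checks for problem in check(program)]


def impact_of_removals(old: Program, new: Program, others: list[Program]) -> dict[str, list[str]]:
    """For each course dropped between two versions, list the other programs that still include it."""
    removed = old.courses.keys() - new.courses.keys()
    return {code: [p.name for p in others if code in p.courses] for code in sorted(removed)}
```

For instance, with course Y in semester 2 and course X (which requires Y) in semester 1, `run_checks` reports exactly one prerequisite violation; and if a new version of the program drops Y, `impact_of_removals` lists every other program that still contains Y.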
 + 
 +A proof-of-concept implementation of a revision tool that supports the first two requirements above is currently being developed in the context of a PROJH402 project. The MFE student who selects this topic is expected to:
 + 
 +  * Develop this prototype into a production-ready implementation.
 +  * Implement the communication with the central ULB database.
 +  * Implement the impact analysis concerning the course schedules.
 +  * Interact with the administration of the Ecole Polytechnique to fine-tune the above requirements, test the implementation, and integrate feedback after testing.
 + 
 +Contact : Stijn Vansummeren <stijn.vansummeren@ulb.ac.be>, Frédéric Robert <frrobert@ulb.ac.be>
 + 
 +===== Design and Development of a Comprehensive DICOM Validation Application =====
 + 
 +Using the new machine-readable XML format of the DICOM standard (DocBook documents), this thesis will define the architecture of software tools and services for automatically extracting and using the full content of the DICOM standard, and develop the corresponding software. A comprehensive DICOM validation application will also be developed as a pilot project, built on these DICOM standard services.
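As an illustration of extracting content from the DocBook version of the standard, the following Python sketch builds a small data dictionary (tag to keyword and value representation, VR) from a DocBook table and uses it to check a dataset. It assumes, hypothetically, that table rows are DocBook 5 `tr`/`td` elements whose columns are Tag, Name, Keyword, and VR; the actual layout of the published DICOM XML files should be verified before relying on this. Datasets are represented here as a plain dictionary from upper-case 8-hex-digit tags to (VR, value) pairs, which is also an assumption; a real validator would read DICOM files and check far more than the VR.

```python
import re
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"  # DocBook 5 namespace
TAG_RE = re.compile(r"\(([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4})\)")  # e.g. "(0010,0010)"


def cell_text(td: ET.Element) -> str:
    """Concatenate all text inside a table cell, with whitespace normalized."""
    return " ".join("".join(td.itertext()).split())


def load_dictionary(docbook_xml: str) -> dict[str, tuple[str, str]]:
    """Map 'GGGGEEEE' tags to (keyword, VR), assuming columns Tag | Name | Keyword | VR."""
    root = ET.fromstring(docbook_xml)
    dictionary = {}
    for row in root.iter(f"{DB}tr"):
        cells = [cell_text(td) for td in row.findall(f"{DB}td")]
        if len(cells) < 4:
            continue  # header rows (th cells) or rows with an unexpected layout
        match = TAG_RE.fullmatch(cells[0])
        if match:  # tags with placeholders such as (50xx,0010) are skipped in this sketch
            dictionary[(match.group(1) + match.group(2)).upper()] = (cells[2], cells[3])
    return dictionary


def validate(dataset: dict[str, tuple[str, object]], dictionary: dict[str, tuple[str, str]]) -> list[str]:
    """Report unknown tags and elements whose VR differs from the standard's."""
    errors = []
    for tag, (vr, _value) in dataset.items():
        expected = dictionary.get(tag)
        if expected is None:
            errors.append(f"{tag}: unknown tag")
        elif vr != expected[1]:
            errors.append(f"{tag}: VR {vr}, expected {expected[1]} ({expected[0]})")
    return errors
```

Keeping the dictionary extraction separate from the validation reflects the split described above: the first part is one of the "DICOM standard digital services", and the validator is a client built on top of it.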
 + 
 +References: http://dicom.nema.org/, http://www.oasis-open.org/docbook/
 +Requirements: XML, XSL, databases, and Java, Python, or C++.
 + 
 +Contacts : Arnaud Schenkel <arnaud.schenkel@ulb.ac.be>, David Wikler <david.wikler@ulb.ac.be>, Stijn Vansummeren <stijn.vansummeren@ulb.ac.be>
 ===== Structural compression of relational and semantic web databases =====
 +
 +Stijn Vansummeren (WIT)
  
 Recent research in database management systems at ULB has shown how to
Line 126: Line 188:
   * Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]]
  
 +
 +===== A contribution to Apache Drill =====
 +
 +Google's research lab has produced a remarkable number of software
 +systems for Big Data analytics:
 +  * [[|Map/Reduce]] for offline, batch-oriented data analysis over arbitrary datasets
 +  * [[http://googleresearch.blogspot.be/2009/06/large-scale-graph-computing-at-google.html|Pregel]] for offline analysis over graph-structured datasets
 +  * [[http://research.google.com/pubs/pub36632.html|Dremel]] for on-line analysis over structured datasets
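To make the Map/Reduce programming model concrete, here is the classic word-count example as a single-process Python sketch. The function names are illustrative only: in Hadoop, map and reduce functions run as distributed tasks and the framework performs the shuffle (grouping intermediate pairs by key) across machines, whereas here all three phases run in memory, one after the other.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word of one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (done by the framework in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values emitted for one key into a single result."""
    return key, sum(values)


def map_reduce(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over an in-memory list of records."""
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())
```

For example, `map_reduce(["big data", "Big graphs"])` returns `{'big': 2, 'data': 1, 'graphs': 1}`. Because mappers only see one record and reducers only see one key, both phases can be parallelized independently, which is what makes the model suitable for batch analysis over large datasets.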
 +
 +For Map/Reduce and Pregel, the Apache Software Foundation has
 +already built open-source implementations ([[http://hadoop.apache.org/|Hadoop]],
 +[[https://giraph.apache.org/|Giraph]]). For Dremel, a project is
 +currently underway to provide an open-source implementation (known as
 +[[http://incubator.apache.org/drill/index.html|Apache Drill]]).
 +
 +The goal of this thesis is to (1) study the current architecture of Apache
 +Drill; (2) compare it with the state of the art in query processing
 +for structured datasets; and (3) contribute to the development of the
 +Drill implementation.
 +
 +Students interested in this MFE are strongly advised to follow the course
 +[[http://cs.ulb.ac.be/public/teaching/infoh417|INFOH417 Database Systems Architecture]]
 +for background on query processing in traditional database management systems.
 +
 +  * Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]]
 ===== Aspects of Text Analytics and Information Extraction =====
  
Line 208: Line 296:
  
                                                                                                                                        
- 
-=====Foundations of Data Description Languages===== 
- 
-Recently, several small "domain specific languages" have been proposed
-to facilitate programming with ad hoc data (including PADS,
-DATASCRIPT, PACKETTYPES, Microsoft M Grammar). Ad hoc data is data
-other than data in well-behaved relational or XML formats. 
- 
-The above languages take as input a description of the data format to 
-be dealt with, and automatically generate a large number of software 
-tools (parsers, serializers, data transformers, error recognition,
-...) to process the ad-hoc data. 
- 
-The goal of this thesis is to study the programming language-theory 
-foundations behind these languages, their commonalities and their 
-differences. If possible, suggestions for further extensions to the 
-languages should be formulated. 
- 
-  * References : 
-      * http://datascript.sourceforge.net/
-      * http://www.padsproj.org/index.html
- 
-\\ 
-  * Contact : [[stijn.vansummeren@ulb.ac.be|Stijn Vansummeren]] 
  
  
 
teaching/mfe/is.txt · Last modified: 2020/09/29 17:03 by mahmsakr