Differences

This shows you the differences between two versions of the page.

teaching:mfe:ia [2020/04/16 09:07]
bersini [Etude de l'algorithme du Deep Learning]
teaching:mfe:ia [2022/11/30 13:34] (current)
stuetzle [Text Categorisation and quality control through automatic language processing]
Line 1: Line 1:
-====== MFE 2018-2019: Artificial Intelligence ======
+====== MFE 2022-2023: Artificial Intelligence ======
  
 ===== Introduction =====
Line 243: Line 243:
  
  
-===== Text Categorisation and quality control through automatic language processing =====
+===== Automated summaries of long or multiple texts through automated language processing =====
-
-This thesis is developed in collaboration with the Energy Efficiency in Industrial Processes (EEIP) company. EEIP is a global industry information network. As part of their activities, they disseminate case studies to various network groups. The goal of the project is to develop an automatic language processing algorithm capable to evaluate the quality (accept / reject) of the proposed case studies and to allocate them to single/multiple categories. Testing and training the algorithm is a key part as it not only requires development and testing of concepts such as how to evaluate quality or definition of requirements for multiple category allocation but the project also has to be developed in a limited data environment (+/- 1000 case studies as training set).
-
-Required skills: A background in machine learning would be helpful.
  
 +This thesis is developed in collaboration with the Energy Efficiency in Industrial Processes (EEIP) company. EEIP is a global industry information network. As part of its activities, it disseminates articles, reports and case studies to its global network of 150,000 business professionals. EEIP has already implemented an automatic language processing (ALP) algorithm, based on the Bidirectional and Auto-Regressive Transformer (BART), to summarize articles of at most 1,500 words. This solution is the result of an earlier thesis completed in 2021.
 + 
 +The main goal of this project is to develop an automatic language processing algorithm and process capable of summarizing long texts (e.g. reports of 25-100 pages) and of condensing multiple texts into a single summary (e.g. 3 articles dealing with the implementation of smart pump systems in industry).
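
One common way to approach the goal above with a summarizer that only accepts short inputs is hierarchical (chunk, summarize, re-summarize) summarization. The following is a minimal sketch under stated assumptions, not EEIP's implementation: all function names are hypothetical, the 1,500-word limit is taken from the description of the existing solution, and `first_words` is a trivial placeholder standing in for a real model such as BART.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int = 1500) -> str:
    """Summarize text of any length with a summarizer limited to max_words.

    Each chunk is summarized on its own, the partial summaries are joined,
    and the process repeats until the joined text fits within the limit.
    """
    while len(text.split()) > max_words:
        chunks = split_into_chunks(text, max_words)
        reduced = " ".join(summarize(chunk) for chunk in chunks)
        # Guard against a summarizer that does not shorten its input,
        # which would otherwise make this loop run forever.
        if len(reduced.split()) >= len(text.split()):
            raise ValueError("summarizer did not reduce the text length")
        text = reduced
    return summarize(text)


def summarize_many(texts: List[str], summarize: Callable[[str], str], max_words: int = 1500) -> str:
    """Produce one summary for several texts by summarizing their concatenation."""
    return summarize_long(" ".join(texts), summarize, max_words)


def first_words(text: str, n: int = 50) -> str:
    """Placeholder summarizer (an assumption, NOT BART): keeps the first n words."""
    return " ".join(text.split()[:n])
```

In practice the `summarize` argument would wrap the actual model, and a thesis would likely replace the naive word-based chunking with splits at section or paragraph boundaries.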
 + 
 +Testing and training the algorithm is a key part of the project, both during the development (thesis) phase and after the algorithm goes into operation, when its quality is to be improved using manual feedback in the form of corrected summaries. A specific challenge is the limited data environment (+/- 1000 case studies as training set), which will likely require the use of external test data sets during development.
 + 
 +A possible extension could be the pre-selection of external content (articles, case studies and reports) by analysing its relevance for EEIP, based on how well it fits the thematic categories EEIP uses to represent the energy transition. This could rely on the categorization capabilities of the new ALP algorithm, alone or in conjunction with the algorithm used in EEIP's recommendation engine.
  
   * Contacts :
 
teaching/mfe/ia.1587020845.txt.gz · Last modified: 2020/04/16 09:07 by bersini