Fifth European Business Intelligence Summer School (eBISS 2015)

Doctoral Colloquium, IT4BI-DC, PhD presenters

Second-year student schedule

To take place on Thursday at 14:30 (see Program)

Track 1 takes place in Room PMT A021 and Track 2 in Room PMT A022.

14:30
    Track 1: Dilshod Ibragimov, "OLAP over Federated RDF Sources"
             (Esteban Zimányi, Oscar Romero, Hannes Voigt, Victoria Nebot)
    Track 2: Bijay Neupane, "Intelligent Detection and Prediction of Energy at the Device Level"
             (Torben Bach Pedersen, Martin Hahmann, Toon Calders, Panos Vassiliadis)

15:30
    Track 1: Vasileios Theodorou, "Automating User-Centered Design of Data-Intensive Processes"
             (Alberto Abelló, Martin Hahmann, Esteban Zimányi, Panos Vassiliadis)
    Track 2: Jovan Varga, "Discovering Analytical Concepts from User Profiles"
             (Oscar Romero, Torben Bach Pedersen, Robert Wrembel, Victoria Nebot)

16:30
    COFFEE BREAK

17:00
    Track 1: Nurefsan Gur, "Business Intelligence over Linked Open Spatio-Temporal Data"
             (Torben Bach Pedersen, Esteban Zimányi, Hannes Voigt, Alejandro Vaisman)
    Track 2: Azadeh Nasiri, "Requirements Engineering for Big Data Predictive Analytics"
             (Robert Wrembel, Martin Hahmann, Alberto Abelló, Rafael Berlanga)

18:00
    Track 1: Waqas Ahmed, "Modeling Data Warehouses with Multiversion and Temporal Functionality"
             (Esteban Zimányi, Hannes Voigt, Torben Bach Pedersen, Alejandro Vaisman)
    Track 2: Kasun Perera, "Model-Based Database Systems"
             (Martin Hahmann, Alberto Abelló, Robert Wrembel, Rafael Berlanga)

Second-year students



    Kasun Perera

    Home university: Technische Universität Dresden (TUD)
    Advisor: Wolfgang Lehner
    Host university: Aalborg Universitet (AAU)
    Co-advisor: Torben Bach Pedersen

    Presentation: Model-Based Database Systems
    Today, decision making in business organizations relies on information derived from the data stored in their information systems. Accessing and mining business intelligence from stored data is crucial, and the process must be fast and accurate to support timely decisions. Since current information systems store ever more data (ranging from a few GB to several PB), efficient and effective access mechanisms are necessary, and new methods for faster information access must be developed: current methods do not support efficient access because they operate on the raw data to produce results. In this research, we develop a new access layer over the data that supports information retrieval without touching the raw data. Our system delivers approximate query results for decision support settings in which fast approximate answers are more valuable than slow exact ones. The system stores 'models' that represent the underlying data, and user queries are answered from these models rather than from the raw data directly. This mechanism provides faster access to information and a complete view of the data at any point in time, while hiding incomplete and erroneous data. Model-based information access is expected to scale as the data grows, since models have a low memory footprint and support efficient querying over large data. We propose a model pool that stores models representing multiple datasets, grouped by the similarity of the original data, which yields a compact representation of the underlying data.
    In this project, we will examine different types of models for different types of data and propose a model designer that determines a good set of candidate models for a given dataset and query workload. Users access information through the models via SQL-like declarative queries that are transparently processed by the DBMS. Selecting the optimal combination of models to answer a query is one of the central goals of the project. We will extensively evaluate the proposed approaches on both synthetic and real datasets. The outcomes of this project can be applied in big data analytics domains such as market research, the energy sector, and computer game analytics. The compressed representation, and querying over it, will significantly increase query throughput even for terabyte-scale data. We follow a systematic research methodology: a thorough literature review to identify the limitations of current systems, formulation of the research problem, data acquisition, and a feasibility analysis of the proposed solutions to select and implement the most promising approaches. Prototype implementations are evaluated and refined before being integrated into an existing database system; we plan to target PostgreSQL, as it is a widely used open-source system.
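
    To illustrate the core idea, here is a minimal sketch (not the actual system; the segmented linear model and all names are a simplification for exposition) of answering an aggregate query from stored models instead of raw data:

        import numpy as np

        # Hypothetical sketch: represent a numeric column by per-segment
        # linear models and answer an aggregate query from the model
        # coefficients, never scanning the raw data at query time.
        class SegmentModel:
            def __init__(self, values, seg_len=1024):
                self.coeffs = []  # (slope, intercept, n) per segment
                for i in range(0, len(values), seg_len):
                    seg = values[i:i + seg_len]
                    x = np.arange(len(seg))
                    slope, intercept = np.polyfit(x, seg, 1)  # least squares
                    self.coeffs.append((slope, intercept, len(seg)))

            def approx_sum(self):
                # The sum of a fitted line over x = 0..n-1 equals
                # n*intercept + slope*n*(n-1)/2, so SUM needs no raw data.
                return sum(n * b + a * n * (n - 1) / 2
                           for a, b, n in self.coeffs)

        rng = np.random.default_rng(0)
        data = np.cumsum(rng.normal(0, 1, 100_000))  # synthetic "raw" column
        model = SegmentModel(data)                   # low-footprint summary
        print("approx SUM:", model.approx_sum())
        print("exact  SUM:", data.sum())

    The model pool described above would keep many such models and share one model across similar datasets; this sketch shows only the query-answering principle.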

    Email:   kasun.perera@tu-dresden.de
    Web:   https://it4bi-dc.ulb.ac.be/user/44

    Download slides



    Bijay Neupane

    Home university: Aalborg Universitet (AAU)
    Advisor: Torben Bach Pedersen
    Host university: Technische Universität Dresden (TUD)
    Co-advisor: Wolfgang Lehner

    Presentation: Intelligent Detection and Prediction of Energy at the Device Level
    Renewable energy sources (RES) are becoming an increasingly important component of power grids. In the Nordic region, a major portion of the electricity demand is fulfilled by RES, with wind power accounting for 6% of the total power generated; in 2014, wind power contributed 39% of the total electricity demand in Denmark. This high dependence on weather conditions creates huge challenges for demand management. This Ph.D. project contributes to the vision of the TotalFlex project of utilizing the flex-offer as a cheaper and more environmentally friendly solution for dynamic demand management. The flex-offer framework addresses the challenges of demand management by enabling negotiation in the form of flexible consumption (or production) offers. In its simplest form, a flex-offer specifies the amount of energy required, the duration, the earliest start time, the latest end time, and a price. For example: "I need 2 kW to operate a dishwasher for 2 hours between 11 PM and 8 AM, and I will pay 0.40 DKK/kWh for it."
    The introduction of the flex-offer concept, and the need to design demand- and flexibility-forecasting mechanisms to support it, define the research issues that this Ph.D. project targets. More specifically, the project deals with accurate and precise demand forecasting, flexibility detection and forecasting at the device level, and the use of these forecasts for the automated generation of flex-offers. Here, flexibility is defined as "the amount of energy and the duration of time by which the device energy profile and/or activation can be changed". Through experiments, we have shown the potential of extracting flexibility from the usage of devices in a household. Further, this Ph.D. project aims to utilize these findings to design state-of-the-art machine learning methods for demand and flexibility forecasting. The outcome of the research will provide substantial information to energy market actors for dynamic demand management and enable mutual benefits. In addition, we have demonstrated a positive financial impact of utilizing flexibility on the energy market. Experimental results on the DK1 (West Denmark) regulating power market show that the market can achieve up to a 49% reduction in the average regulation cost and a 29.4% reduction in regulation volume, with just 3.87% of the average gross demand (2.58 GW) being flexible.
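
    To make the flex-offer structure of the dishwasher example concrete, here is a minimal sketch (the field names are ours for exposition, not the TotalFlex specification):

        from dataclasses import dataclass
        from datetime import datetime, timedelta

        # Hypothetical encoding of a flex-offer as described above: an
        # energy amount, an operation duration, a time window in which the
        # run may be scheduled, and the price the user is willing to pay.
        @dataclass
        class FlexOffer:
            energy_kw: float          # power required while running
            duration: timedelta       # how long the device must run
            earliest_start: datetime  # window in which the run may be placed
            latest_end: datetime
            price_dkk_per_kwh: float

            def slack(self) -> timedelta:
                """Scheduling flexibility: window length minus run duration."""
                return (self.latest_end - self.earliest_start) - self.duration

        # The example from the abstract: 2 kW for 2 hours, between 11 PM
        # and 8 AM, at 0.40 DKK/kWh.
        offer = FlexOffer(
            energy_kw=2.0,
            duration=timedelta(hours=2),
            earliest_start=datetime(2015, 7, 9, 23, 0),
            latest_end=datetime(2015, 7, 10, 8, 0),
            price_dkk_per_kwh=0.40,
        )
        print(offer.slack())  # 7:00:00 of scheduling flexibility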

    Email:   bn21@cs.aau.dk
    Web:   https://it4bi-dc.ulb.ac.be/user/39

    Download slides



    Vasileios Theodorou

    Home university: Universitat Politècnica de Catalunya (UPC)
    Advisor: Alberto Abelló
    Host university: Technische Universität Dresden (TUD)
    Co-advisor: Wolfgang Lehner

    Presentation: Automating User-Centered Design of Data-Intensive Processes
    Business Intelligence (BI) enables an organization to collect and analyse internal and external business data, generate knowledge and business value, and provide decision support at the strategic, tactical, and operational levels. Typically, enterprises rely on complex Information Technology (IT) systems that manage all data coming from the operational databases running within the organization and provide fixed user interfaces through which knowledge workers can access information for analysis and assessment. The consolidation of data coming from many sources as a result of managerial and operational business processes is itself a statically defined business process, and knowledge workers have little to no control over the characteristics of the data presented to them.
    Two main reasons dictate the reassessment of this rigid approach in the context of modern business environments. The first is that the service-oriented nature of today's business, combined with the increasing volume of available data, makes it impossible for an organization to pro-actively design efficient data management processes that are specific to its internal operational scope. The existence of diversely structured, heterogeneous data deriving from multiple internal and external sources calls for dynamic models and processes for their collection, cleaning, transformation, integration, analysis, monitoring, and so on.
    The second is that enterprises can benefit significantly from analysing the behaviour of their business processes, which fosters their optimization. Such analysis is conducted by knowledge workers and business analysts who typically lack knowledge of the underlying IT infrastructure and related technologies. They should therefore be provided with dynamic, user-centered tools that facilitate ad hoc processing of analytical queries with minimal IT intervention.
    Our research aims at defining models, techniques and tools to support the alignment of user requirements with the runtime characteristics of business processes for data management. Thus, the first step has been the study of conceptual models for process modelling in the context of data augmentation and data warehousing. After obtaining a deep understanding of data-intensive processes and their quality features, we have defined and published models that describe data utility and other quality attributes as the result of operations within BI-related business processes. These models facilitate reasoning about, and evaluation of, alternative business processes through quantitative analysis using concrete metrics that we have also defined.
    The ultimate goal of this project is the development of a user-centered framework that automates process model enhancement and selection for BI-related data processing. Some of the key requirements for the design of this framework are usability, efficiency and effectiveness. In this direction, we have designed and published our proposed architecture for this framework. In addition, a prototype for automatic quality-aware ETL process redesign has been implemented and presented as a demo.
    Future steps include the enrichment and evaluation of our framework, considering its accuracy, usability and completeness.
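
    As a toy illustration of the quantitative selection idea described above (the metric names, weights, and scoring function are illustrative assumptions, not the published model):

        from dataclasses import dataclass

        # Illustrative sketch: score alternative ETL process designs against
        # user-weighted quality metrics and pick the best trade-off.
        @dataclass
        class ProcessVariant:
            name: str
            metrics: dict  # metric name -> normalized score in [0, 1]

        def select_variant(variants, weights):
            """Return the variant maximizing the weighted sum of its metrics."""
            def score(v):
                return sum(weights[m] * v.metrics.get(m, 0.0) for m in weights)
            return max(variants, key=score)

        variants = [
            ProcessVariant("baseline ETL",
                           {"freshness": 0.4, "accuracy": 0.9, "cost": 0.7}),
            ProcessVariant("+ dedup step",
                           {"freshness": 0.3, "accuracy": 0.95, "cost": 0.5}),
            ProcessVariant("+ incremental load",
                           {"freshness": 0.9, "accuracy": 0.85, "cost": 0.6}),
        ]
        # A knowledge worker's priorities, expressed as weights.
        weights = {"freshness": 0.5, "accuracy": 0.3, "cost": 0.2}
        print(select_variant(variants, weights).name)  # "+ incremental load"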

    Email:   vasileios@essi.upc.edu
    Web:   https://it4bi-dc.ulb.ac.be/user/50

    Download slides



    Dilshod Ibragimov

    Home university: Université Libre de Bruxelles (ULB)
    Advisor: Esteban Zimányi
    Host university: Aalborg Universitet (AAU)
    Co-advisor: Torben Bach Pedersen

    Presentation: OLAP over Federated RDF Sources
    Business Intelligence (BI) tools provide fundamental support for analyzing large volumes of information. Data warehouses and Online Analytical Processing (OLAP) tools are used to store and analyze the data. Traditionally, such analyses were performed in a "closed-world" scenario, based only on internal data. With the advent of the web, more and more useful data became available online. Nowadays much information is available in the form of the Resource Description Framework (RDF) via SPARQL endpoints. With the recent SPARQL 1.1 standard, RDF datasets can be queried in novel and more powerful ways: complex analysis tasks involving grouping and aggregation, even over data from multiple SPARQL endpoints, can now be formulated in a single query. Thus, recent advances in the Semantic Web enable BI applications to access many different datasets from multiple web sources, which may help BI tools achieve better results by integrating real-time data from web sources into the analysis process.
    Incorporating external data into the decision-making process is a challenging task. As data on the web may be highly volatile, incorporating these data into the existing Extract-Transform-Load life-cycle is not feasible. We therefore propose a new data warehousing approach that retrieves data from RDF sources on the fly and analyzes them without storing them in the system. The approach defines a multidimensional schema using RDF vocabularies and maintains information about discovered RDF data sources. Based on this information, the system can query the data sources, extract and aggregate data, build a cube, and answer user queries.
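
    The abstract does not fix a particular vocabulary; as one possible sketch, a multidimensional schema can be expressed with the W3C RDF Data Cube (qb:) vocabulary, e.g., using rdflib (the ex: URIs are placeholders):

        from rdflib import BNode, Graph, Namespace, RDF

        # Sketch: a tiny multidimensional schema in RDF. The qb: vocabulary
        # is the W3C Data Cube standard; the ex: names are hypothetical.
        QB = Namespace("http://purl.org/linked-data/cube#")
        EX = Namespace("http://example.org/schema#")

        g = Graph()
        g.bind("qb", QB)
        g.bind("ex", EX)
        dsd = EX.SalesCube
        g.add((dsd, RDF.type, QB.DataStructureDefinition))
        for prop, role in [(EX.refArea, QB.dimension),
                           (EX.refPeriod, QB.dimension),
                           (EX.salesAmount, QB.measure)]:
            comp = BNode()                    # one component per dim/measure
            g.add((dsd, QB.component, comp))
            g.add((comp, role, prop))
        print(g.serialize(format="turtle"))
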
    Optimizing the execution of the queries for OLAP scenarios is another challenge. As the amount of available data increases, the requirements for query processing also evolve. With much data to process, analytical queries need special techniques to improve the performance of user queries. As both aggregate and federated queries have become available only recently, state-of-the-art systems lack sophisticated optimization techniques that facilitate efficient execution of such queries over large distributed datasets.
    To overcome these shortcomings, we propose a set of query processing strategies for executing aggregate SPARQL queries over federations of SPARQL endpoints. We also propose a solution to the problem of executing aggregate queries over large sets of data by rewriting the queries using materialized views. We specify a cost model for RDF view selection, identify the syntax for defining a view, and present a query rewriting algorithm.
    Our experiments show that the suggested optimization techniques significantly improve performance over current state-of-the-art systems.
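
    For intuition, a federated aggregate query of the kind targeted here can be issued as follows (a sketch only: the endpoint URLs and graph pattern are placeholders, and SPARQLWrapper is just one common client):

        from SPARQLWrapper import SPARQLWrapper, JSON

        # A SPARQL 1.1 aggregate query spanning two endpoints via SERVICE.
        query = """
        PREFIX ex: <http://example.org/>
        SELECT ?region (SUM(?amount) AS ?totalSales)
        WHERE {
          ?sale ex:region ?region ;
                ex:amount ?amount .
          SERVICE <http://remote.example.org/sparql> {  # second federation member
            ?region ex:partOf ex:Europe .
          }
        }
        GROUP BY ?region
        """

        endpoint = SPARQLWrapper("http://local.example.org/sparql")
        endpoint.setQuery(query)
        endpoint.setReturnFormat(JSON)
        for row in endpoint.query().convert()["results"]["bindings"]:
            print(row["region"]["value"], row["totalSales"]["value"])

    Without optimization, a naive engine ships intermediate bindings across endpoints for every group; the strategies and materialized views proposed above aim to avoid exactly that cost.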

    Email:   dibragim@ulb.ac.be
    Web:   https://it4bi-dc.ulb.ac.be/user/31

    Download slides



    Jovan Varga

    Home university: Universitat Politècnica de Catalunya (UPC)
    Advisor: Oscar Romero
    Host university: Aalborg Universitet (AAU)
    Co-advisor: Torben Bach Pedersen

    Presentation: Discovering Analytical Concepts from User Profiles
    Decision making in the information society is based on the analysis of available data resources. An increasing number of publicly available data resources represent a wealth of information to be explored. However, data exploration in these settings is often a tedious task due to the need for non-trivial technical skills (e.g., the use of certain query languages). Non-expert users need assistance to navigate this data landscape and perform their analyses.
    Traditional BI systems provide different user support functionalities, typically for querying and data visualization. These features are based on the exploitation of metadata artifacts (e.g., queries). The metadata are the fuel for different user assistance (e.g., query recommendation) algorithms and they directly determine the assistance possibilities. However, their management and organization are typically overlooked.
    The aim of this doctoral thesis is to provide a metadata foundation for the user assistance features of next-generation BI systems. Our claim is that metadata need to be considered and handled as first-class citizens. Moreover, in these novel settings the user wants to analyze data coming from external, non-controlled data sources, so the metadata need to be designed in a flexible and reusable manner that accommodates new and heterogeneous data sources.
    In this direction, we defined the Analytical Metadata (AM) framework, based on a survey. The framework covers the metadata artifacts needed for user assistance in the BI area and elaborates on the related metadata processing types. Furthermore, we defined SM4AM, a Semantic Metamodel for Analytical Metadata, which is an RDF-based formalization of AM. The metamodel level was chosen to overcome the heterogeneity between systems, as different system-specific models can be created by instantiating the metamodel, and the use of RDF provides flexibility and reuse potential. In this context, RDF led us to the Linked Data initiative, which makes much new data available from various sources, e.g., governmental institutions. Linked Data sources are typically hard to explore, and they too can benefit significantly from the use of AM for user assistance. Initially, SM4AM focused on user-related metadata artifacts; it was recently extended to provide much wider support for metadata related to the system and the data themselves (e.g., traceability metadata). Complementary to these efforts, we also worked on the schema enrichment of certain RDF data sources, focusing on sources that already contain some semantics in this direction, to enable the multidimensional analysis characteristic of the BI field.
    As we will show, these efforts together represent a foundation for user-centric BI systems and open new research perspectives for metadata modeling and exploitation for user assistance activities. Our next step is to use AM and SM4AM in the query recommendation area as a proof of concept for our claims. We will show that metadata are a neglected and unexploited treasure for user-centric BI systems.
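
    To make the metamodel-instantiation idea concrete, here is a minimal sketch using rdflib (the sm4am: URIs below are placeholders for exposition and do not reproduce the published SM4AM vocabulary):

        from rdflib import Graph, Literal, Namespace, RDF

        # Illustrative sketch: recording a user query as an Analytical
        # Metadata artifact in RDF. All URIs here are hypothetical.
        SM4AM = Namespace("http://example.org/sm4am#")
        EX = Namespace("http://example.org/")

        g = Graph()
        q = EX.query42
        g.add((q, RDF.type, SM4AM.Query))          # a metadata artifact
        g.add((q, SM4AM.issuedBy, EX.analyst7))    # link to a user profile
        g.add((q, SM4AM.text, Literal("SELECT ... GROUP BY region")))

        # An assistance algorithm (e.g., query recommendation) traverses
        # such metadata rather than the raw data sources.
        for artifact in g.subjects(RDF.type, SM4AM.Query):
            print(artifact)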

    Email:   jvarga@essi.upc.edu
    Web:   https://it4bi-dc.ulb.ac.be/user/43

    Download slides



    Nurefsan Gur

    Home university: Aalborg Universitet (AAU)
    Advisor: Torben Bach Pedersen
    Host university: Université Libre de Bruxelles (ULB)
    Co-advisor: Esteban Zimányi

    Presentation: Business Intelligence Over Linked Open Spatio-Temporal Data
    Business Intelligence (BI) refers to providing meaningful and useful information for decision-making tasks by gathering, transforming, analyzing and manipulating data from existing sources. This process is most widely handled in data warehouses (DWs), large repositories of integrated data from several sources. Multidimensional (MD) data modeling and Online Analytical Processing (OLAP) technologies are commonly used in data warehousing. The Semantic Web (SW) has drawn the attention of data enthusiasts and has inspired the exploitation and design of multidimensional data warehouses in unconventional ways. Traditional data warehouses operate over static data, whereas the SW represents the web of data in a machine-readable way that captures semantics and supports inference and reasoning over the data dynamically. The overall thesis aims to employ OLAP and data warehousing on the Semantic Web by fruitfully exploiting spatio-temporal linked open data in RDF warehouses for organizational and enterprise purposes.
    The current state of the research presents a dynamically extensible multidimensional data (cube) modeling approach that defines both the schema and the instances of multidimensional data as RDF graphs. The focus is on the geospatial data type and on spatial operations with the topological relationships that are specific to this kind of data. The research emphasizes how to utilize the spatial dimensions of a cube with RDF vocabularies and OLAP operations. We implemented spatial and metric analysis on spatial members along with traditional OLAP operations. The use case data and the framework are demonstrated with a set of spatial OLAP classes and query examples. The report concludes with future directions on how advanced members and operations of the spatial data type can be integrated into the currently developed framework. Finally, we elaborate on the consolidation of, and possible case studies for, the temporal dimension within our framework, in order to employ spatio-temporal data in RDF warehouses and give a complete picture of the thesis.
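
    As a rough sketch of a spatial OLAP-style query over RDF (the data URIs, endpoint, and query window are placeholders; geof:sfWithin is a standard GeoSPARQL topological predicate):

        from SPARQLWrapper import SPARQLWrapper, JSON

        # Sketch: roll a measure up to regions whose geometries lie inside
        # a query window, using GeoSPARQL. Requires a GeoSPARQL-enabled
        # triple store; everything below is illustrative.
        query = """
        PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
        PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
        PREFIX ex:   <http://example.org/>

        SELECT ?region (SUM(?sales) AS ?total)
        WHERE {
          ?obs ex:region ?region ; ex:sales ?sales .
          ?region geo:hasGeometry/geo:asWKT ?wkt .
          FILTER (geof:sfWithin(?wkt,
              "POLYGON((8 54, 13 54, 13 58, 8 58, 8 54))"^^geo:wktLiteral))
        }
        GROUP BY ?region
        """

        endpoint = SPARQLWrapper("http://example.org/geosparql")  # placeholder
        endpoint.setQuery(query)
        endpoint.setReturnFormat(JSON)
        for row in endpoint.query().convert()["results"]["bindings"]:
            print(row["region"]["value"], row["total"]["value"])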

    Email:   nurefsan@cs.aau.dk
    Web:   https://it4bi-dc.ulb.ac.be/user/46

    Download slides



    Azadeh Nasiri

    Home university: Poznan University of Technology (PUT)
    Advisor: Robert Wrembel
    Host university: Université Libre de Bruxelles (ULB)
    Co-advisor: Esteban Zimányi

    Presentation: Requirements Engineering for Predictive Analytics
    The predictive analytics market has woken up. Although largely unseen, it drives millions of decisions, and it is bound to grow and be pushed into the mainstream as a decision-making tool. Despite its importance, the overall use of predictive analytics is relatively small: only 13% of organizations use it. The barriers to adopting predictive analytics have little to do with data access and technology; the main reason is a lack of understanding of how to use such analytics to drive decision making. In other words, companies have only a vague notion of the business areas or applications that could benefit from predictive analytics.
    Business Intelligence (BI) is the natural context in which to discuss empowering companies with predictive capability in their decision-making processes. One of the main components of a BI system is a DW, which integrates data from different data sources and structures them for analytics. Structuring the data stored in a DW in the form of a multidimensional (MD) model constitutes the static part of a DW; the analytics conducted over the data stored in a DW constitute its dynamic part. These analytics can range from descriptive to predictive. The problem addressed by this study is that BI solutions with predictive analytics capability are hard to develop, because companies have difficulty figuring out the goals and applications of such analytics.
    This problem can be traced to the early phase of BI projects, when the requirements of a DW repository are captured. Requirements Engineering (RE) for DWs discusses what data, and in which form, are of particular interest for decision makers to store in a DW, and analyses how users interact with these data to generate knowledge and value. This definition shows clearly how exploring the RE context can help address the challenge of this study. The overall objective is to define an RE method for DWs, consisting of well-defined phases and activities, that transforms business questions into predictive insight delivered by the intended BI solution. The main steps towards such a method are summarized as follows:
    1. Reviewing the literature on the theoretical concepts of RE, as well as on DWs, covering both their static and dynamic parts.
    2. Reviewing the current RE methods developed for DWs.
    3. Developing an RE method for DWs covering both the static part, from which the MD model is obtained, and the dynamic part, where the functional requirements of predictive analytics over the stored data are captured. This step includes the following sub-steps:
    3.1 Developing a model-based RE method for DWs covering the static part, from which the information requirements of a BI system are captured (we offer the goal-oriented approach for this part).
    3.2 Extending the proposed method to cover the dynamic part, from which the functional requirements of descriptive analytics over the data stored in a DW are captured (we extend the RE techniques of the goal-oriented approach with techniques of the object-oriented approach for this part).
    3.3 Extending the proposed method further to capture the functional requirements of predictive analytics, where the requirements of the descriptive analytics are transformed to gain predictive insight via a BI system (we offer the object-oriented approach for this part).
    4. Applying the proposed method to a case study to validate it in practice.

    Email:   nasiri.azadeh@gmail.com

    Download slides



    Waqas Ahmed

    Home university: Université Libre de Bruxelles (ULB)
    Advisor: Esteban Zimányi
    Host university: Poznan University of Technology (PUT)
    Co-advisor: Robert Wrembel

    Presentation: Modeling Data Warehouses with Multiversion and Temporal Functionality
    A data warehouse (DW) integrates data from multiple, external, and often heterogeneous data sources. In practice, these data sources are not static: they change both their content and their structures, and the changes need to be reflected in the DW. Moreover, changes in the real world being modeled in a DW, as well as prediction scenarios, often require the creation of multiple DW states. To this end, temporal data warehouses make it possible to manage changes in content, but they cannot handle changes in structures (schemas). Conversely, schema changes can be handled by means of schema versioning in so-called multiversion data warehouses (MVDWs). The approaches to MVDWs proposed so far are complex to develop and offer only limited capabilities for querying multiple data warehouse versions.
    This research project aims at developing a framework that couples temporal mechanisms, for managing temporal versions of data, with versioning mechanisms, for managing versions of a data warehouse. In particular, the following research problems will be addressed: (1) developing mechanisms for querying multiple and heterogeneous data warehouse versions, (2) developing efficient storage for DW versions, and (3) benchmarking the performance of the solutions. As a proof of concept, prototype software will be developed.
    Maintaining the history of changes in content and structure is an important yet challenging issue in the field of data management, and one that comes from practice. A typical area where this phenomenon is observed is sales: products change prices and are assigned to new categories, and customers change their buying habits over time. Temporal and multiversion features allow the user to accurately re-create the state of the modeled reality in the past or to simulate multiple future scenarios. These retrospective or prospective states can help decision makers investigate an anomaly or perform business performance analysis. Thus, the outcome of this project will deliver a technology that solves real-world problems.
    Our solution is based on the concept of a multiversion data warehouse. So far, we have proposed a model capable of storing multiple schema versions. The MVDW is composed of a set of DW versions, and a unique feature of the concept is that each DW version can store multiple temporal versions of data conforming to the same schema. This way, we can run intra-version queries, which query temporal data versions within a given DW version, and inter-version queries, which query multiple DW versions at once. Further, we have already proposed an architecture for an inter-schema querying system, and we have presented a classification of temporal queries.
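
    To illustrate the structure just described, here is a minimal sketch (all names and the in-memory representation are hypothetical simplifications; the actual model is a DBMS design, not Python objects):

        from dataclasses import dataclass, field

        # Each DW version has its own schema and holds several temporal
        # versions of data conforming to that schema.
        @dataclass
        class DWVersion:
            schema: tuple                                       # column names
            data_versions: dict = field(default_factory=dict)   # valid time -> rows

        mvdw = {
            "V1": DWVersion(schema=("product", "price"),
                            data_versions={"2014": [("tv", 400)],
                                           "2015": [("tv", 380)]}),
            # V2 adds a "category" column after a schema change.
            "V2": DWVersion(schema=("product", "category", "price"),
                            data_versions={"2015": [("tv", "electronics", 380)]}),
        }

        # Intra-version query: temporal data versions within one DW version.
        print(mvdw["V1"].data_versions["2015"])

        # Inter-version query: the same question asked across DW versions,
        # mapping between heterogeneous schemas where needed.
        for name, dw in mvdw.items():
            for valid_time, rows in dw.data_versions.items():
                for row in rows:
                    price = row[dw.schema.index("price")]
                    print(name, valid_time, row[0], price)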

    Email:   waqas.ahmed@ulb.ac.be
    Web:   https://it4bi-dc.ulb.ac.be/user/46

    Download slides

