Tenth European Big Data Management & Analytics Summer School (eBISS 2022)

Invited Speakers & Tutors


  • Sonia Bergamaschi

    University of Modena and Reggio Emilia, Italy

    Sonia Bergamaschi is Full Professor of "Big Data Management & Analysis" at the Engineering Department "Enzo Ferrari" in Modena and leads the database research group (www.dbgroup.unimore.it). Her research activity has been mainly in the area of knowledge representation, management, and integration. She has published over 200 articles in international journals and conferences and promoted the start-up DATARIVER (www.datariver.it), founded in 2009 with the aim of engineering and distributing the data integration system MOMIS. Since then she has been acting as scientific director of DATARIVER, which employs 6 full-time staff and 2 consultants for activities related to E-Health, in addition to the five founding members. In 2018 she was appointed ACM Distinguished Researcher.

    Email: sonia.bergamaschi@unimore.it
    Web: http://personale.unimore.it/rubrica/dettaglio/sonia

    Lecture: Data Science for E-Health
    Slides: Part 1/1
    Data Integration (DI) aims to provide unified access to data residing in multiple autonomous data sources. Record Linkage, also known as Entity Resolution (ER), is a key ingredient in DI: it aims at linking records that represent the same real-world entity within one data source or across several. Since the data sources share no unique identifier for the entities to be connected, Record Linkage requires sophisticated and computationally expensive comparisons among sets of attributes to calculate the similarity between pairs of records. These sets of attributes often contain personal information (for example first and last names, addresses, telephone numbers, or dates of birth), so comparing them puts privacy and confidentiality at risk. Privacy-Preserving Record Linkage (PPRL) aims to address this problem by identifying and linking records that correspond to the same real-world entity without revealing any sensitive information about these entities. For over twenty years, the DBGroup of the University of Modena and Reggio Emilia has designed and developed the MOMIS DI system, distributed as open source by its spin-off company DataRiver (www.datariver.it). In recent years the DBGroup has designed innovative, highly scalable ER techniques, contributed to the development of the open source ER platform "JedAI", and has now started to address the PPRL problem.


  • Angela Bonifati

    Lyon 1 University, France

    Angela Bonifati is a Professor of Computer Science at Lyon 1 University and at the CNRS Liris research lab, where she leads the Database Group. In 2019 and 2020, she was on leave at INRIA. Prior to that, she was a Professor at Lille 1 University (2011-2015) and a researcher at CNR, Italy until 2011. She received her Ph.D. from Politecnico di Milano in 2002. Her current research interests are in the interplay between relational and graph-oriented data paradigms, particularly query processing, data integration, and learning for both structured and unstructured data models. She is involved in several grants at Lyon 1 University, including French, EU H2020, and industrial grants. She has co-authored more than 150 publications in top venues of the data management field, along with two books (published by Springer in 2011 and Morgan & Claypool in 2018) and an invited paper in ACM Sigmod Record 2018. She is the Program Chair of ACM Sigmod 2022 and an Associate Editor for both Proceedings of VLDB and IEEE ICDE. She is also an Associate Editor for the VLDB Journal, ACM TODS, Distributed and Parallel Databases, and Frontiers in Big Data. She is currently the President of the EDBT Executive Board and a member of the ICDT council. She holds many visiting scholar positions in foreign universities in both Europe and North America. Since 2020, she has also been an Adjunct Professor at the University of Waterloo in Canada.

    Email: angela.bonifati@univ-lyon1.fr
    Web: https://perso.liris.cnrs.fr/angela.bonifati/

  • Chao Zhang

    Lyon 1 University, France

    Chao Zhang is a postdoc with the CNRS Liris research lab at Lyon 1 University. He received his Ph.D. in computer science from the University of Clermont Auvergne, France in 2019. His research interests include graph query processing, stream processing, and query rewriting. He is a program committee member of VLDB 2023 and SIGMOD 2022.

    Email: chao.zhang@univ-lyon1.fr
    Web: https://liris.cnrs.fr/page-membre/chao-zhang

    Lecture: Big Graph Processing Systems
    Slides: Part 1/2 Part 2/2
    Graphs are data model abstractions that are becoming pervasive in many real-life applications and use cases. In these settings, users primarily focus on entities and their relationships, further enhanced with multiple labels and properties to form so-called property graphs. Modern big graph processing systems need to keep pace with the growing fundamental requirements of these applications and to tackle unforeseen challenges. Motivated by our community-wide vision on future graph processing systems, in this talk I will present the system challenges that lie behind the research areas of big graph processing and analytics. Many current graph query engines only support the subsets of graph queries that they can evaluate efficiently, thus disregarding more expressive query fragments on top of property graphs. It becomes crucial to address efficient query evaluation for complex graph queries, as well as the extensibility of the underlying graph query and constraint languages and the support of property graph schemas. Moreover, the dynamic aspects of query evaluation on streaming graphs are equally important components of big graph ecosystems and require design and benchmarking efforts. During the talk, I intend to touch upon our work on these topics and to pinpoint the research directions and open problems for big graph processing systems.


  • Pedro Delicado

    Universitat Politècnica de Catalunya, Spain

    Pedro Delicado is Full Professor of Statistics at the Universitat Politècnica de Catalunya Barcelona-TECH. His research activity has been mainly devoted to Functional Data Analysis (focusing on dimensionality reduction and spatial dependence), but in recent years he has become interested in exploring links between Statistics and Machine Learning, with particular interest in the interpretability of predictive models. He has supervised 6 PhD theses. Moreover, he has been the principal researcher of 7 publicly funded research projects and 5 research projects with private companies. He has also held short visiting positions at the University of California at Davis and at the University of Toulouse.

    Email: pedro.delicado@upc.es
    Web: http://www-eio.upc.es/~delicado/

    Lecture: Interpretability and Explainability in Machine Learning
    Slides: Part 1/3 Part 2/3 Part 3/3
    Machine learning models are increasingly accurate in their predictions and, therefore, their presence has multiplied in many facets of our lives. Often, improvements in the predictive performance of the models are achieved at the cost of increasing their complexity. Artificial neural networks are a good example of this evolution: their history begins with the perceptron (mid-20th century), goes through the multilayer perceptron (late 20th century) and reaches today's deep networks, whose complexity often leads us to refer to them as "black boxes". The growth in ubiquity and complexity of machine learning algorithms leads more and more voices to demand an understanding of how and why these algorithms make their decisions. In response to this demand, in recent years a whole literature has appeared (known as "Interpretable Machine Learning" or "eXplainable Artificial Intelligence", IML or XAI) whose purpose is to provide transparency and interpretability to automatic algorithms in order to gain the trust of potential users. In this short course we will introduce some of the current IML tools, describing how to use them in practice through examples.


  • Michele Lombardi

    University of Bologna, Italy

    Michele Lombardi is a fixed-term Assistant Professor at the DISI department of the University of Bologna, working on Combinatorial Optimization and Decision Support Systems. In particular, his research activity is focused on hybrid optimization methods, based on heterogeneous techniques such as Constraint Programming, (Mixed) Integer Linear (and Non-Linear) Programming, and Machine Learning. His main application fields are Resource Allocation and Scheduling problems, Cyclic Scheduling (e.g. for control system design), and Scheduling problems in the presence of Uncertainty. More recently, he has started to work with Prof. Michela Milano on a methodology to solve optimization problems over complex systems by embedding Machine Learning models within optimization models: they called it "Empirical Model Learning".

    Email: michele.lombardi2@unibo.it
    Web: http://ai.unibo.it/people/MicheleLombardi
    Slides & Code: GitHub

    Lecture: Methods for Constrained Machine Learning
    Machine Learning methods excel at extracting implicit knowledge from data so as to obtain models that can estimate unknown quantities (classes, forecasts, properties that are hard to measure). In many practical scenarios, however, substantial knowledge about a system of interest is available to domain experts in symbolic form (e.g. rules, differential equations). Such knowledge may in principle be used to compensate for the lack of abundant data, to guarantee safety properties, or to counter discrimination effects in applications having a social impact. The term "Constrained Machine Learning" refers to a class of techniques that attempt to take advantage of such symbolic information in the form of constraints. By making it possible to inject constraints into ML models at training or inference time, such approaches allow one to take advantage of both implicit and explicit problem knowledge. This lecture will survey a selection of techniques for constrained ML, with practical demonstrations on simplified industrial use cases.


  • Alkis Simitsis

    Athena Research Center, Greece

    Dr. Alkis Simitsis is a Research Director at Athena Research Center and an IEEE Senior Member. In the past, he held various positions with HP/HPE Labs, Micro Focus, Unravel Data, and IBM Research, including Chief Scientist, Systems Architect, and Principal Research Scientist. Alkis brings 18+ years of critical experience in both startup and corporate environments, building innovative information and data management solutions and enterprise-grade products in areas such as scalable big data infrastructure, data-intensive analytics, information management, business intelligence, massively parallel processing, distributed databases, column-store databases, graph databases, security analytics, and cloud computing. Alkis holds 42 U.S. patents and has filed 50+ patent applications in the U.S. and worldwide, has published 110+ papers in refereed international journals and conferences (top publications cited 6300+ times, h-index: 42), and frequently serves in various roles in program committees of top-tier international scientific conferences. He is the recipient of several recognitions and awards, such as 2021 ACM SIGMOD Distinguished PC, 2020 Best ACM CIKM Demo Award, 2012 and 2014 Best ACM SIGMOD Demo Award, the 1st prize of diploma thesis by the Greek Technical Chamber (2000), the Thomaides' prize for the progress of science and art (2001, 2003), and awards from the European Union and the Greek Ministry of Education (2005) and the Institute of Communication and Computer Systems (2001, 2002).

    Email: alkis@athenarc.gr
    Web: https://web.imsi.athenarc.gr/~alkis/

    Lecture: Big Data Infrastructure
    Slides: Part 1/1
    Organizations worldwide invest heavily in technologies for generating information and insights from big data. Typically, they employ a big data stack composed of multiple distributed systems deployed across on-premises data centers, private and public clouds, or hybrid combinations of these. In this talk, we will describe modern approaches for handling big data from a systems perspective, including traditional and learning-based techniques for processing and optimization of programs spanning multiple execution and storage platforms, workload management challenges in hybrid environments, and performance management requirements in the big data stack. We will also present challenges and solutions for the developer (e.g., data scientist) at the application layer.


  • Massimiliano De Leoni

    University of Padua, Italy

    Massimiliano de Leoni is an Associate Professor of Computer Science at the University of Padua, Italy. He obtained a Ph.D. in Computer Engineering in 2009 at SAPIENZA – University of Rome with a thesis on Business Process Flexibility and Adaptation. In 2010 he started as a postdoctoral researcher at Eindhoven University of Technology, where he was appointed Assistant Professor in 2014. He has been with the University of Padua since 2019. His research areas include Business Process Management and Modelling, Process Mining, Business Intelligence, Process-aware Information Systems, and Business Process Simulation. He was a chair of the Demonstration Track of the 16th International Conference on Business Process Management (BPM 2018) and of the 26th International Conference on Enterprise Design, Operations and Computing (EDOC 2022), chair of the Industry Forum of the 19th International Conference on Business Process Management (BPM 2021), and general chair of the Second International Conference on Process Mining (ICPM 2020). He is a member of the IEEE Task Force on Process Mining.

    Email: deleoni@math.unipd.it
    Web: https://www.math.unipd.it/~deleoni/

    Lecture: Process Mining and Improvement
    Slides: Part 1/1
    Process Mining sits between the disciplines of Data Science and Process Management. The goal of Process Mining is to discover, monitor, and improve processes by extracting knowledge from event logs, the transaction data that can be extracted from today's (information) systems. The benefit of applying Process Mining lies in the fact that the analysis is based on event-log data, which provides insights into how processes are really executed. This allows analysts to put aside partial, imprecise, and subjective opinions and views, focusing on what has objectively happened rather than on what is supposed to occur. The talk will start with an overview of general Process Mining techniques and will then focus on how Process Mining can be used to concretely improve processes.


  • Marco Slot

    Lead Engineer in Microsoft Citus

    Marco Slot is a principal software engineer at Microsoft, where he leads the development of Citus, a PostgreSQL extension for distributed databases, as part of Azure. Prior to that, he joined Citus in the early stages of the startup, in 2014. He has a PhD in distributed systems and self-driving cars, and Master's and Bachelor's degrees in computer systems from VU in Amsterdam. He is a speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, and lots of meetups.

    Email: marco.slot@microsoft.com
    Web: https://www.citusdata.com/blog/authors/marco-slot/

    Lecture: Distributed PostgreSQL
    Slides: Part 1/1
    The PostgreSQL project started over 25 years ago, but in many ways it is the state of the art in database management systems. PostgreSQL is widely deployed, has been forked into many different database projects, and its protocol is becoming the de facto standard for operational databases. The only downside of PostgreSQL is that it does not scale beyond a single server. Major database vendors are now pursuing different ways of scaling PostgreSQL using a variety of architectures. Amazon, Alibaba, Google, and Microsoft, as well as a variety of start-ups, have each taken a fundamentally different approach to turning PostgreSQL into a distributed database, so far without a clear winner. I will first give an introduction to PostgreSQL to shine a light on why it is such a popular (distributed) data management tool, and then discuss the pros and cons of different distributed PostgreSQL architectures. We will then dive into the different techniques that distributed database systems use and the trade-offs they make to provide better scalability, availability, isolation, geo-distribution, recovery time, and other facets.


  • Patrick Marcel

    Université de Tours, France

    Patrick Marcel is an Associate Professor at the University of Tours, France. He earned his Ph.D. in Computer Science at INSA Lyon in 1998 and his French Habilitation à Diriger les Recherches at the University of Tours in 2012. His current research focuses on databases, OLAP and data warehousing, personalization, recommender systems, exploratory data analysis, and data narration. He has authored numerous publications on these subjects in international conferences and journals, including Information Systems, Decision Support Systems, Data and Knowledge Engineering, and Knowledge and Information Systems. He has served as a program committee member for top-tier international conferences, including ER, VLDB, and EDBT, and chaired the International Workshop on Data Warehousing and OLAP (DOLAP) in 2017 and 2021. He has served as guest editor for international journals, including Information Systems and the International Journal of Data Warehousing and Mining. He is a member of the regular editorial board of the international journal Data and Knowledge Engineering.

    Email: patrick.marcel@univ-tours.fr
    Web: http://www.info.univ-tours.fr/~marcel/

  • Verónika Peralta

    Université de Tours, France

    Verónika Peralta is an Associate Professor at the University of Tours (France), where she is head of the Computer Science department. She received her Ph.D. in 2006 from the University of Versailles (France) and the University of the Republic (Uruguay). Her current research interests include data and information quality, exploratory data analysis, business intelligence, and data narration. She has published numerous papers in international refereed journals and conferences in these fields and has served as program committee member and guest editor for many international conferences and journals. She has extensive experience in teaching information systems, databases, data warehousing, and data quality, and substantial professional experience as a data warehouse developer and consultant.

    Email: veronika.peralta@univ-tours.fr
    Web: http://www.info.univ-tours.fr/~vperalta/

    Lecture: Data Exploration from Insights to Storytelling
    Slides: Part 1/2 Part 2/2
    This presentation reviews the state of the art in supporting Data Exploration, the notoriously tedious task of interactively analyzing datasets to gain insights. Starting from the SIGMOD 2015 survey by Stratos Idreos, Olga Papaemmanouil, and Surajit Chaudhuri, we shed light on how the research community has addressed the directions that the survey listed as perspectives. In particular, we point out key contributions in formalizing the problem, defining insight interestingness, and crafting data stories. We also identify topics that are still worth investigating, such as better inclusion of different human profiles in the loop.
