Eleventh European Big Data Management & Analytics Summer School (eBISS 2023)

Invited Speakers & Tutors


  • Pere-Pau Vázquez

    Pere-Pau Vázquez

    Universitat Politècnica de Catalunya, Spain

    Pere-Pau Vázquez is an associate professor at the Computer Science Department at the Universitat Politècnica de Catalunya in Barcelona. He is a member of the Visualization, Virtual Reality and Graphics Interaction Group (ViRVIG). He has been working in Computer Graphics and Visualization for the last 20+ years. His current interests are mostly related to visualization of large data sets, perception, and interaction in Virtual Reality environments.

    Email: pere.pau.vazquez(at)upc.edu
    Web: https://www.cs.upc.edu/~ppau

    Lecture: DataVis 10½: a practical brief intro to data visualization
    Slides: To be uploaded
    Visualization consists of creating visual representations of complex data to help users carry out tasks more effectively. To achieve this goal, many aspects have to be considered, including the characterization of the input data, the perceptual limits of humans, the physical space in which visualizations have to be laid out, and the tasks that need to be solved. It is thus a huge field that draws on knowledge from several disciplines: cognitive psychology, computer graphics, and human-computer interaction. The purpose of this course is to give attendees a glimpse of what visualization is and to provide some basic tools to produce interactive, nice-looking visualizations of data. To this end, the course will combine theoretical content with practical exercises developed using a Python library called Altair. The outline of the course is:
    - Introduction to Visualization (concepts, goals, history…)
    - Dos and don'ts of data visualization (and why it matters)
    - Basic charts (purpose and implementation)
    - Compound and interactive charts (how to implement multiple views)
    - Further reading
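
    As a taste of the practical part, the following is a minimal sketch of an interactive, compound Altair chart. It assumes Altair 5 and the vega_datasets package (neither is named in the abstract); the brushing-and-linking pattern follows Altair's standard examples and is not taken from the course material.

        import altair as alt
        from vega_datasets import data

        cars = data.cars()  # small sample dataset bundled with vega_datasets

        # An interval selection ("brush") on the scatter plot filters the bar chart.
        brush = alt.selection_interval()

        points = alt.Chart(cars).mark_point().encode(
            x="Horsepower:Q",
            y="Miles_per_Gallon:Q",
            color=alt.condition(brush, "Origin:N", alt.value("lightgray")),
        ).add_params(brush)

        bars = alt.Chart(cars).mark_bar().encode(
            x="count():Q",
            y="Origin:N",
        ).transform_filter(brush)

        # Vertical concatenation produces a compound chart with linked views.
        (points & bars).save("linked_views.html")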


  • Jordi Vitrià

    Jordi Vitrià

    University of Barcelona, Spain

    Jordi Vitrià is a Full Professor at the University of Barcelona (UB), which he joined in 2007, and where he teaches an introductory course on Data Science and advanced courses on Deep Learning and Ethical Data Science. He is now a member of the new Mathematics & Computer Science Department at UB. He is also the director of the Master in Foundations of Data Science and co-director of the Data Science and Machine Learning Postgraduate course at UB. His research, begun when personal computers had 128KB of memory, was originally oriented towards digital image analysis and how to extract quantitative information from images, but soon evolved towards computer vision problems. After a postdoctoral year at the University of California, Berkeley in 1993, he focused on Bayesian methods for computer vision. He now leads a research group working on deep learning, machine learning, and causal inference. He has authored more than 100 peer-reviewed papers, holds several international patents, and has directed 15 PhD theses in the areas of machine learning and computer vision. He has led a large number of research projects at both international and national levels.

    Email: jordi.vitria(at)ub.edu
    Web: https://algorismes.github.io/

    Lecture: Causal Artificial Intelligence
    Slides: To be uploaded
    The scientific method aims at the discovery and modeling of causal relationships from data. It is not enough to know that smoking and cancer are correlated; the important thing is to know that if we start or stop smoking, our chances of getting cancer will change. Artificial intelligence and machine learning as they exist today do not take causation into account and instead make predictions based on statistical associations. This can give rise to problems when such models are used in environments where those associations no longer hold, or when they are used for decision making. This picture has begun to change with recent advances in techniques for causal inference, which make it possible (under certain circumstances) to measure causal relationships from observational and experimental data and, in general, to reason formally about cause and effect. As we will discuss in the talk, the convergence between machine learning and causal inference opens the door to answering questions relevant to many AI tasks.
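
    To make the distinction concrete, here is a small simulated sketch (not taken from the lecture): a confounder Z drives both the treatment T and the outcome Y, so the naive association between T and Y overstates the true causal effect, while backdoor adjustment over Z recovers it.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 100_000
        z = rng.binomial(1, 0.5, n)                   # confounder
        t = rng.binomial(1, 0.2 + 0.6 * z)            # treatment depends on Z
        y = 1.0 * t + 2.0 * z + rng.normal(0, 1, n)   # true causal effect of T is 1.0

        # Naive estimate E[Y|T=1] - E[Y|T=0] is biased by Z.
        naive = y[t == 1].mean() - y[t == 0].mean()

        # Backdoor adjustment: within-stratum contrasts, weighted by P(Z=v).
        adjusted = sum(
            (y[(t == 1) & (z == v)].mean() - y[(t == 0) & (z == v)].mean()) * (z == v).mean()
            for v in (0, 1)
        )
        print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")  # roughly 2.2 vs 1.0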


  • Anna Queralt

    Anna Queralt

    Universitat Politècnica de Catalunya, Spain

    Anna Queralt has been a Serra Húnter tenure-eligible lecturer at the Universitat Politècnica de Catalunya (UPC) since 2021, and holds a PhD in Computer Science from the UPC (2009). Since 2012 she has developed her research at the Barcelona Supercomputing Center (BSC), leading the Distributed Object Management research line within the Workflows and Distributed Computing group. This research has materialized in an active object store, part of BSC's open-source software portfolio. Anna has published more than 30 peer-reviewed papers in international journals and conferences, as well as 3 book chapters on the topics of her research. She has received 2 Best Student Paper Awards, one as a student and one as a supervisor. She has participated in more than 20 European, national, and industrial research projects, and she is a member of two working groups for the Strategic Research Agenda of the European Technology Platform for HPC. Her current research interests are distributed data management, data-intensive applications, the edge-to-cloud continuum, and data integration.

    Email: anna.queralt(at)upc.edu
    Web: https://www.bsc.es/queralt-anna

  • Francesc Lordan

    Francesc Lordan

    Barcelona Supercomputing Center, Spain

    Francesc obtained his PhD in Computer Architecture from the Universitat Politècnica de Catalunya in 2018, winning the Special Doctorate Award. Since 2010, Dr. Lordan has been part of the Workflows and Distributed Computing group of the Barcelona Supercomputing Center, focusing his efforts on the development of the COMPSs programming model: a task-based model for parallel applications running on heterogeneous distributed infrastructures across the whole Edge-Cloud continuum, involving edge and fog devices, clusters, supercomputers, grids, and clouds. During these years, Francesc has contributed to more than 20 R&D projects with competitive funding and has published more than 30 peer-reviewed papers.

    Email: francesc.lordan(at)bsc.es
    Web: https://www.bsc.es/lordan-gomis-francesc

  • Alex Barcelo

    Alex Barcelo

    Universitat Politècnica de Catalunya, Spain

    Alex Barcelo is a PhD student in Computer Architecture at the Universitat Politècnica de Catalunya, where he also studied Mathematics and Telecommunication and Electronics Engineering, finishing those degrees in 2015. He is currently working in the Distributed Object Management research line within the Workflows and Distributed Computing group at the Barcelona Supercomputing Center. His research focuses on HPC, parallel programming, and the integration of new storage technologies into storage systems.

    Email: alex.barcelo(at)bsc.es
    Web: https://www.bsc.es/barcelo-alex

    Lecture: Leveraging HPC techniques for data analytics
    Slides: To be uploaded
    The integration of high-performance computing (HPC) solutions with recent data analytics frameworks is a current research trend. Most of the work in this field has focused on improving data analytics frameworks by implementing their engines on top of HPC technologies such as the Message Passing Interface (MPI). However, there is a lack of integration from the point of view of application development. HPC workflows have their own parallel programming models, while data analytics algorithms are mainly implemented using data transformations and executed with frameworks like Spark. Data analytics transformations can also be seen as a set of tasks and can thus be implemented with task-based programming models. Task-based programming models are a highly flexible and very efficient approach to implementing HPC workflows: they exploit the inherent parallelism of applications transparently, resulting in high programmability and ease of code migration. In addition, when combined with HPC data management solutions such as active object stores, the performance gains are further increased. In this lecture we introduce the notions of task-based programming models and active object stores, and show how they can be leveraged in data analytics applications. We will also show how data analytics applications can be seamlessly developed with this approach and achieve better performance than Spark.
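
    As an illustration of the task-based style, here is a hedged sketch in the spirit of PyCOMPSs, the Python binding of the COMPSs programming model developed at BSC. Decorator options may vary across COMPSs versions; treat this as a sketch rather than a verbatim lecture example.

        from pycompss.api.task import task
        from pycompss.api.api import compss_wait_on

        @task(returns=1)
        def partial_sum(block):
            # Each invocation becomes an asynchronous task scheduled by the runtime.
            return sum(block)

        @task(returns=1)
        def merge(a, b):
            return a + b

        if __name__ == "__main__":
            data = list(range(1_000_000))
            blocks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
            partials = [partial_sum(b) for b in blocks]  # independent tasks run in parallel
            total = partials[0]
            for p in partials[1:]:
                total = merge(total, p)       # the runtime tracks data dependencies
            total = compss_wait_on(total)     # synchronize on the final result
            print(total)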


  • Matteo Lissandrini

    Matteo Lissandrini

    Aalborg University, Denmark

    Matteo Lissandrini is an Assistant Professor in the Department of Computer Science at Aalborg University, working on Data Exploration, Graph Exploration, and Graph Management systems. Matteo has been a Marie Skłodowska-Curie IF fellow. He received his PhD from the University of Trento (Italy) with a thesis on exploratory search for information graphs, and was a member of the DbTrento research group. He was a visiting researcher at HP Labs in Palo Alto, California, in 2013, and at the Cheriton School of Computer Science at the University of Waterloo, Canada, in 2014.

    Email: matteo(at)cs.aau.dk
    Web: https://people.cs.aau.dk/~matteo

    Lecture: Graph Data Analysis & Exploration
    Slides: To be uploaded
    Complex data can be represented as graphs that capture the structure of the relationships between objects. Graphs are a fundamental tool for modelling social, technological, and biological systems, either as networks (e.g., social networks) or as more expressive semantic graphs (also called Knowledge Graphs). However, the widespread adoption of graph data, especially when used to model and integrate different data sources, along with their complex generation processes, has made graphs very complex objects, requiring dedicated methods for their understanding. This course will present fundamental concepts and methods adopted for the analysis and exploration of graph data across different applications and use cases. It will provide different formalisms to model graph data in different domains (networks, KGs, and Property Graphs), along with concepts adopted to describe fundamental structural characteristics, in particular approaches used for graph understanding at the node level, link level, and graph level, e.g., centrality measures, density, and modularity. It will further cover important analytical tasks such as subgraph search, frequent graph mining, graph summarization, schema extraction, and example-based search and exploration techniques. The course will be accompanied by hands-on sessions where some of the approaches will be applied to real-world graph data in a typical data-science environment, using Python, Jupyter notebooks, and a graph DBMS.
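
    As a preview of the hands-on sessions, the following short sketch computes some of the node- and graph-level measures mentioned above with NetworkX (one plausible library choice; the course may use different tools) on its bundled karate-club graph.

        import networkx as nx
        from networkx.algorithms import community

        G = nx.karate_club_graph()

        # Node level: structural importance of individual nodes.
        deg = nx.degree_centrality(G)
        btw = nx.betweenness_centrality(G)

        # Graph level: density and community structure (modularity).
        density = nx.density(G)
        communities = community.greedy_modularity_communities(G)
        modularity = community.modularity(G, communities)

        print(f"density={density:.3f}, modularity={modularity:.3f}")
        print("most central node:", max(btw, key=btw.get))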


  • Albin Ahmeti

    Albin Ahmeti

    Semantic Web Company, Austria
    TU Wien, Austria

    Albin Ahmeti is a Researcher at Semantic Web Company with a focus on Semantic AI. Prior to his current position, he worked at Semantic Web Company as a Data & Knowledge Engineer for more than 5 years, developing Knowledge Graphs for various industries. He holds a PhD in Computer Science with a focus on Semantic Web, and an MSc in Computer Engineering with a focus on Data Integration. He has more than 12 years of experience in the areas of Semantic Web and Data Integration, combining theoretical work at academic institutes with practical experience in industry.

    Email: albin.ahmeti(at)semantic-web.com
    Web: https://www.linkedin.com/in/albin-ahmeti-9683502b/

    Lecture: Recommender Systems based on Knowledge Graphs
    Slides: To be uploaded
    Knowledge graphs (KGs) have become widespread, enhancing search engines, and recently the associated tools and techniques to develop KGs have matured considerably. KGs typically consist of building blocks such as controlled vocabularies (aka taxonomies), ontologies, and instance data. In this tutorial, we are going to present the steps to create a knowledge graph, comprising unstructured and structured data, that is used to power recommender systems. We start with different approaches for ingesting structured data into KGs, discussing the associated challenges such as schema mapping, data quality, wrangling, and alignment. After that, we discuss NLP techniques that are used to extract knowledge from unstructured data and link it to KGs. In the end, we present a demo application that leverages KGs to power food recommendations. Here we discuss how one can exploit rules that can be adjusted by humans (i.e., knowledge engineers) in order to produce the desired recommendation outcome.
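
    For a flavour of rule-based, KG-powered recommendation, here is a hypothetical sketch using rdflib. The tiny food schema (ex:likes, ex:hasIngredient) is invented for illustration and is not the schema of the lecture's demo.

        from rdflib import Graph

        g = Graph()
        g.parse(data="""
            @prefix ex: <http://example.org/> .
            ex:alice ex:likes ex:pesto_pasta .
            ex:pesto_pasta ex:hasIngredient ex:basil .
            ex:caprese ex:hasIngredient ex:basil .
        """, format="turtle")

        # Rule: recommend items that share an ingredient with a liked item.
        query = """
        PREFIX ex: <http://example.org/>
        SELECT DISTINCT ?rec WHERE {
            ?user ex:likes ?item .
            ?item ex:hasIngredient ?ing .
            ?rec  ex:hasIngredient ?ing .
            FILTER (?rec != ?item)
        }
        """
        for row in g.query(query):
            print(row.rec)  # -> http://example.org/caprese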


  • Lluís Belanche

    Lluís Belanche

    Universitat Politècnica de Catalunya, Spain

    Lluís Belanche holds a degree in Computer Science and a PhD in Artificial Intelligence from the Universitat Politècnica de Catalunya (UPC). He is a professor in the Computer Science Department of the UPC with more than thirty years of teaching experience, and has supervised or tutored more than a hundred BSc, MSc, and PhD theses. He currently teaches in the degree in Data Science and Engineering, the master's in Innovation and Research in Informatics (MIRI), the master's in Advanced Mathematics and Mathematical Engineering (MAMME), the master's in Artificial Intelligence (AI), and the master's in Data Science at the Barcelona School of Informatics (FIB). He has co-authored more than one hundred and thirty publications in journals and international conferences and has participated in fifteen research projects. He recently served as Head of Studies at the FIB. His current research interests are kernel methods for deep learning and the axiomatics of similarity measures.

    Email: belanche(at)cs.upc.edu
    Web: https://www.cs.upc.edu/~belanche/

    Lecture: Dealing with missing values in modern data science: the good, the bad, and the ugly.
    Slides: To be uploaded
    Missing information arises wherever data is collected: in industrial processes, scientific studies, business applications, and even computer-generated data. Missing values are notoriously difficult to handle, especially when the lost parts are of significant size or the causes and pattern of spread are unknown. Deleting observations and/or variables containing missing values results in the loss of relevant data and is also frustrating because of the effort spent collecting the sacrificed information. Imputation methods, which entail inferring values for the missing entries, are a common workaround, although their impact on the modelling process is uncertain. In this short course I will discuss several approaches to dealing with missing information: some classical, some sophisticated, some worthless. Ways to evaluate the quality of the different methods will also be worked out extensively, as well as how to deal with missingness in new data (e.g., test data in machine learning). Other topics will touch on interpretability and computational effort, aiming to convey as complete a picture of the issue as possible.
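
    As a minimal illustration of the imputation workflow, including its reuse on new data, here is a scikit-learn sketch (one possible toolset; the course's own examples may differ) contrasting simple mean imputation with iterative, model-based imputation.

        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import SimpleImputer, IterativeImputer

        X_train = np.array([[1.0, 2.0], [np.nan, 4.0], [7.0, np.nan], [5.0, 6.0]])
        X_test = np.array([[np.nan, 3.0]])  # missingness also appears in new data

        for imp in (SimpleImputer(strategy="mean"), IterativeImputer(random_state=0)):
            imp.fit(X_train)  # learn the imputation from training data only
            print(type(imp).__name__, imp.transform(X_test))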


  • Christine Doig

    Christine Doig

    Google, USA

    Christine Doig-Cardet was Director of Product Innovation at Netflix. She has spent her career at the intersection of data science and product in a variety of industries, including energy, manufacturing, banking, and entertainment. At Netflix, Christine led the company's efforts to improve discovery and personalization. She loves bringing together design and data to build impactful user experiences powered by the latest machine learning and artificial intelligence capabilities. She was previously at Anaconda, where she grew from a consulting data scientist to leading their enterprise product organization. Christine splits her time between Austin, Texas, and Barcelona, where she grew up and earned her M.S. in Industrial Engineering from the Universitat Politècnica de Catalunya. She also holds a postgraduate certificate in Quantitative Methods for Financial Markets and Data Science from the same university.

    Email: christine.doig.cardet(at)gmail.com
    Web: https://www.christinedoig.com

    Lecture: Data Science in the Industry - From Research to Production
    Slides: To be uploaded
    Today, data drives every aspect of modern business: from managing strategic decisions and defining key business results to enabling personalized customer-facing experiences. In this session, we'll go on a journey to understand how data science is used in a data-driven organization. We'll discuss how to define business objectives and product metrics, determine our data needs, build a machine learning application, and evaluate its impact through online experimentation. The goal of this lecture is to give attendees a small taste of what it's like to work at a tech company and show the impact that data has in shaping the digital products we use every day.
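
    As a small, made-up illustration of the online-experimentation step, the following sketch runs a two-proportion z-test comparing conversion rates in a treatment and a control group (the counts are invented; a real evaluation pipeline would be far richer).

        from statsmodels.stats.proportion import proportions_ztest

        conversions = [690, 620]      # successes in treatment, control
        exposures = [10_000, 10_000]  # users exposed in each group

        z, p_value = proportions_ztest(conversions, exposures)
        print(f"z={z:.2f}, p={p_value:.4f}")  # a small p-value suggests a real lift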


  • Simone Scardapane

    Simone Scardapane

    Sapienza University of Rome, Italy

    Simone Scardapane is a tenure-track assistant professor at Sapienza University of Rome. His research is focused on graph neural networks, explainability, continual learning and, more recently, modular and efficient deep networks. He has published more than 100 papers on these topics in top-tier journals and conferences. Currently, he is an associate editor for the IEEE Transactions on Neural Networks and Learning Systems (IEEE), Neural Networks (Elsevier), Industrial Artificial Intelligence (Springer), and Cognitive Computation (Springer). He is a member of multiple groups and societies, including the ELLIS society, the IEEE Task Force on Reservoir Computing, the “Machine learning in geodesy” joint study group of the International Association of Geodesy, and the Statistical Pattern Recognition Techniques TC of the International Association for Pattern Recognition.

    Email: simone.scardapane(at)uniroma1.it
    Web: https://www.sscardapane.it/

    Lecture: Designing and explaining graph neural networks
    Slides: To be uploaded
    Graph neural networks (GNNs) are fundamental for analysing data from a variety of fields, ranging from medicine to social networks. In this talk we will give a brief overview of the basic building blocks of GNNs, including the idea of message passing, graph pooling, and the problem of over-smoothing. We will also showcase working examples in code to highlight how these networks are implemented in practice. We will conclude by introducing the problem of explaining the predictions of such models, along with ideas and methods from the field of explainable AI and recent ideas on building self-interpretable networks.
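
    To fix ideas before the talk, here is a compact message-passing sketch in plain PyTorch (no GNN library assumed, and not the talk's own code): each node averages its neighbours' features and combines them with its own through a learned linear map.

        import torch
        import torch.nn as nn

        class MeanAggregationLayer(nn.Module):
            def __init__(self, in_dim, out_dim):
                super().__init__()
                self.lin = nn.Linear(2 * in_dim, out_dim)

            def forward(self, x, adj):
                # adj: dense (N, N) adjacency; row-normalize to average over neighbours.
                deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
                neigh = adj @ x / deg  # message passing + mean aggregation
                return torch.relu(self.lin(torch.cat([x, neigh], dim=1)))  # update step

        x = torch.randn(5, 8)                   # 5 nodes with 8 features each
        adj = (torch.rand(5, 5) > 0.5).float()  # random adjacency, for illustration only
        layer = MeanAggregationLayer(8, 16)
        print(layer(x, adj).shape)              # torch.Size([5, 16])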

News

The website for the summer school was launched.
