Important Dates:
Application opens*:	Dec 18, 2019
Application closes:	May 15, 2020
Deadline for registering*:	June 1, 2020
Arrival of participants:	July 5, 2020
Start of event:	July 6, 2020
End of event:	July 10, 2020

Tenth European Big Data Management & Analytics Summer School (eBISS 2020)

Invited Speakers & Tutors

Sonia Bergamaschi

University of Modena and Reggio Emilia, Italy

Sonia Bergamaschi is a full Professor of "Big Data Management & Analysis" at the "Enzo Ferrari" Engineering Department in Modena, where she leads the database research group (www.dbgroup.unimore.it). Her research has focused mainly on knowledge representation, management and integration. She has published over 200 articles in international journals and conferences. She also promoted the start-up DATARIVER (www.datariver.it), founded in 2009 to engineer and distribute the data integration system MOMIS, and has since served as its scientific director. In addition to the five founding members, DATARIVER employs 6 full-time staff and 2 consultants for activities related to e-health. In 2018 she was named an ACM Distinguished Member.

Email: sonia.bergamaschi@unimore.it
Web: http://personale.unimore.it/rubrica/dettaglio/sonia

Giovanni Simonini

University of Modena and Reggio Emilia, Italy

Giovanni Simonini is an Assistant Professor (RTD b) at the University of Modena and Reggio Emilia. Before that, he was a postdoctoral associate at MIT CSAIL, working with Prof. Michael Stonebraker in the Database group. He received his PhD in Computer Science from the University of Modena and Reggio Emilia in 2016, and his doctoral dissertation won the PhD Thesis Award from the IEEE Computer Society Italy Section. He has also held visiting positions at the University of Michigan and the Qatar Computing Research Institute. His research interests include data integration and big data management.

Email: simonini@unimore.it
Web: http://giovannisimonini.com

Lecture: Entity Resolution in the Big Data context
In recent years, thanks to a growing awareness of the potential value of data and the steadily decreasing cost of storing it, companies and organizations have started to gather huge amounts of data about every aspect of their business, often without knowing in advance what that data might be useful for. Furthermore, this data comes in a variety of forms (e.g., relations in a database, JSON and CSV files, snippets of text from the web). One of the main problems with such data is heterogeneity: data scientists and practitioners deal with data integration on a daily basis to merge data sets and unlock the true value of the data. Identifying different representations of the same real-world entity across different data sets is a crucial task for data integration and data science, known as Entity Resolution or Duplicate Detection. A further problem arises from the ever-increasing size of data sources, i.e., Big Data: Entity Resolution must take scalability and computational cost into account. In this lecture, we will present the basic algorithms and techniques of data integration in the context of big data. In particular, we will present state-of-the-art similarity join algorithms and Entity Resolution techniques for distributed and parallel systems such as Apache Spark.

Angela Bonifati

Lyon 1 University, France

Angela Bonifati is a full professor and the head of the Database group at Lyon 1 University. She received a PhD from Politecnico di Milano in 2002 and was a postdoctoral researcher at Inria in Paris until 2003. Her current research focuses on the interplay of relational and graph-shaped data paradigms, particularly query processing, data integration and curation, metadata management, and learning for these data models. She is the Program Chair of EDBT 2020, the Demo Co-Chair of ICDE 2020, and the SIGMOD 2019 and 2020 Workshops Co-Chair. She was Vice Chair of the Information Extraction, Data Cleaning, and Curation track of ICDE 2018 and Vice Chair of the Semi-Structured Data track of ICDE 2011. She is an Associate Editor of several journals, including the VLDB Journal, ACM Transactions on Database Systems (TODS) and Distributed and Parallel Databases.

Email: angela.bonifati@univ-lyon1.fr
Web: https://perso.liris.cnrs.fr/angela.bonifati/

Lecture: Query-driven Graph Analytics
Graphs are becoming pervasive in several unconventional applications where connectivity needs to be leveraged for querying and analytical purposes. Such application areas include the Semantic Web, social networking, fraud detection, recommendation systems and knowledge bases. At the heart of these applications lies the property graph data model, the common ground on which modern graph database systems are built. In this talk, I will focus on the recent field of graph data management systems and elucidate the current state of the art in graph querying and analytics.

Pedro Delicado

Universitat Politècnica de Catalunya, Spain

Pedro Delicado is a full professor of Statistics at the Universitat Politècnica de Catalunya Barcelona-TECH. His research has been mainly devoted to Functional Data Analysis (focusing on dimensionality reduction and spatial dependence), but in recent years he has become interested in exploring links between Statistics and Machine Learning, with particular interest in the interpretability of predictive models. He has supervised 5 PhD theses. Moreover, he has been the principal investigator of 6 publicly funded research projects and 5 research projects with private companies. He has also held short visiting positions at the University of California, Davis and the University of Toulouse.

Email: pedro.delicado@upc.es
Web: http://www-eio.upc.es/~delicado/

Lecture: Functional Data Analysis. An introduction with R
Functional data arise when one of the variables of interest in a data set can naturally be seen as a smooth curve or function. Functional Data Analysis (FDA) can then be thought of as the statistical analysis of samples of curves. In the last two decades, FDA techniques have evolved rapidly, allowing FDA to reach remarkable methodological maturity. Many standard statistical methods have been adapted to functional data: regression models (lm, glm, non-parametric regression, ...), multivariate analysis (PCA, MDS, clustering, depth measures, ...), time series and spatial statistics, among others. At the same time, FDA methods have been applied quite broadly in medicine, science, business, engineering, demography and the social sciences. This course offers an introduction to FDA and presents some of the R libraries oriented to this type of data. The aim is that by the end of the course students will be able to identify situations in which they can treat their data as functional, represent the data computationally, apply simple FDA techniques (description, dimensionality reduction, regression) and visualize the results.

Michela Milano

University of Bologna, Italy

Michela Milano has been a full professor of computer science at DISI – University of Bologna since April 2016. She is Deputy President of the European Association for Artificial Intelligence (EurAI), a past Executive Councilor of the Association for the Advancement of Artificial Intelligence (AAAI), and a past member of the Executive Committee of the Association for Constraint Programming and of the Italian Association for Artificial Intelligence. Her research concerns Artificial Intelligence, with a particular focus on decision support and optimization systems, covering both theoretical and practical aspects in application fields such as energy, mobility, computing, policy making and sustainability. In this field she has achieved international visibility and collaborates with many research groups and companies. She is Editor in Chief of the Constraints journal, Area Editor of Constraint Programming Letters in the area of Search, past Area Editor of the INFORMS Journal on Computing in the area of Logic, Constraint and Optimization, and a member of the Editorial Board of ACM Computing Surveys for the area of Artificial Intelligence. She has edited two collections on hybrid optimization and is the author of more than 140 papers in peer-reviewed international conferences and journals. She has given many tutorials and keynote speeches on these topics at the major international conferences on Artificial Intelligence. She has coordinated many European, Italian and regional projects and is responsible for collaborations with industry. In 2016 she received the Google Faculty Research Award for work on the use of deep networks in combinatorial optimization.

Email: michela.milano@unibo.it
Web: https://www.unibo.it/sitoweb/michela.milano/en

Michele Lombardi

University of Bologna, Italy

Michele Lombardi is a fixed-term Assistant Professor at the DISI department of the University of Bologna, working on Combinatorial Optimization and Decision Support Systems. In particular, his research focuses on hybrid optimization methods based on heterogeneous techniques such as Constraint Programming, (Mixed) Integer Linear (and Non-Linear) Programming, and Machine Learning. His main application fields are resource allocation and scheduling problems, cyclic scheduling (e.g., for control system design), and scheduling problems under uncertainty. More recently, he has started working with Prof. Michela Milano on a methodology for solving optimization problems over complex systems by embedding Machine Learning models within optimization models, an approach they named "Empirical Model Learning".

Email: michele.lombardi2@unibo.it
Web: http://ai.unibo.it/people/MicheleLombardi

Lecture: Empirical Model Learning: merging knowledge-based and data-driven decision models
Designing good models is one of the main challenges in obtaining realistic and useful decision support and optimization systems. Traditionally, combinatorial models are crafted in interaction with domain experts, with limited accuracy guarantees. Nowadays we have access to data sets of unprecedented scale and accuracy about the systems we are making decisions on. In this talk we propose a methodology called Empirical Model Learning (EML) that uses machine learning to extract data-driven decision model components and integrates them into an expert-designed decision model. We outline the main domains where EML could be useful and show how to ground EML on a problem of thermal-aware workload allocation and scheduling on a multi-core platform. In addition, a hands-on session will give insights into the practical use of EML.

Alkis Simitsis

Hewlett Packard Labs, USA

Alkis Simitsis is a senior research scientist at Hewlett Packard Labs - Analytics Lab. He received a Diploma in Electrical and Computer Engineering and a PhD in Computer Science from the National Technical University of Athens (NTUA), Greece, in 2000 and 2004, respectively. He then worked at the IBM Almaden Research Center and was a research visitor at the Infolab of Stanford University.

Web: http://www.dblab.ntua.gr/~asimi/

Lecture: Big Data Infrastructure
Organizations worldwide invest heavily in technologies for generating information and insights from big data. Typically, they use the big data stack, which is composed of multiple distributed systems deployed across on-premises datacenters, private and public cloud deployments, and hybrid combinations of these. In this talk, we will describe modern techniques for handling big data, including processing and optimization of programs spanning multiple execution and storage platforms, workload management challenges in hybrid environments, and performance management requirements that arise in big data stacks.

Xiaofang Zhou

The University of Queensland, Australia

Xiaofang Zhou is a Professor of Computer Science at The University of Queensland, where he leads the Data Science Research Group. His research focuses on finding effective and efficient solutions for managing, integrating and analyzing very large amounts of complex data for business, scientific and personal applications. He has worked in the areas of spatial and multimedia databases, data quality, high-performance database systems, data mining, streaming data analytics and recommendation systems. He has been Program Committee Chair for PVLDB 2020, SSTD 2017, CIKM 2016 and ICDE 2013, and General Chair of MDM 2018 and ACM Multimedia 2015. He has been an Associate Editor of The VLDB Journal, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Cloud Computing, World Wide Web Journal, Distributed and Parallel Databases, and the IEEE Data Engineering Bulletin. He was Chair of the IEEE Technical Committee on Data Engineering (2015-2018). He is a Fellow of the IEEE.

Email: bigk@itee.uq.edu.au
Web: http://staff.itee.uq.edu.au/zxf/

Lecture: Spatial Trajectory Analytics
Spatial trajectory analytics involves a wide range of research topics, including data management, query processing, data mining and recommendation systems. It has many applications in intelligent transport systems, social media analysis, location-based systems, urban planning and smart cities. New opportunities arise from massive and rapidly growing volumes of high-quality spatiotemporal data from sources such as GPS devices, mobile phones and social network applications, together with more powerful computing platforms and machine learning algorithms. Managing large-scale trajectory data and making sense of it has become critically important for many enterprises. In this talk we will give an overview of this research field and discuss new research problems and new approaches in trajectory computing research.