Application opens: March 1, 2017
Application closes: May 15, 2017
Notification of acceptance: May 20, 2017
Deadline for payment of registration: June 1, 2017
Arrival of participants: July 2, 2017
Start of event: July 3, 2017
End of event: July 7, 2017
Ziawasch Abedjan is an assistant professor and the head of the "Big Data Management" (BigDaMa) Group at the TU Berlin in Germany and a Principal Investigator in the Berlin Big Data Center. Prior to that, Ziawasch was a postdoctoral associate at MIT CSAIL, where he worked on various data integration topics. He received his PhD from the Hasso Plattner Institute in Potsdam, Germany, where he worked on methods for mining Linked Open Data. His current research focuses on data integration and data profiling. He is the recipient of the 2014 CIKM Best Student Paper Award, the 2015 SIGMOD Best Demonstration Award, and the 2014 Best Dissertation Award from the University of Potsdam.
Lecture: Data Profiling and Data Analytics
One of the crucial requirements before consuming a dataset for any application is to understand the dataset at hand and its metadata. The process of metadata discovery is known as data profiling. Profiling activities range from ad-hoc approaches, such as eyeballing random subsets of the data or formulating aggregation queries, to the systematic inference of structural information and statistics of a dataset using dedicated profiling tools. In this course, we will discuss the importance of data profiling as part of any data-related use case, and shed light on the area by classifying data profiling tasks and reviewing state-of-the-art data profiling systems and techniques. In particular, we discuss hard problems in data profiling, such as algorithms for dependency discovery and their application in data discovery and data analytics systems. We conclude with directions for future research in the area of data profiling.
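To give a flavour of the simplest profiling tasks, the sketch below computes a few single-column statistics (distinct count, null ratio, min/max) over a toy dataset. The `profile` helper and its row format are illustrative inventions, not part of any real profiling tool; dedicated systems tackle much harder tasks such as dependency discovery.

```python
def profile(rows):
    """Compute simple single-column profiling statistics:
    distinct-value count, null ratio, and min/max per column.
    (Illustrative only -- real profilers do far more.)"""
    stats = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "distinct": len(set(non_null)),
            "null_ratio": 1 - len(non_null) / len(values),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return stats

rows = [
    {"id": 1, "city": "Berlin"},
    {"id": 2, "city": "Potsdam"},
    {"id": 3, "city": None},
    {"id": 4, "city": "Berlin"},
]
print(profile(rows))
```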
Toon Calders graduated in 1999 from the University of Antwerp with a diploma in Mathematics. He received his PhD in Computer Science from the same university in May 2003, in the database research group ADReM, and continued working in the ADReM group as a postdoc until 2006. From 2006 until 2012 he was an assistant professor in the Information Systems group at the Eindhoven Technical University. In 2012 he joined the CoDE department at the ULB as a "Chargé de Cours" (associate professor), until he rejoined the University of Antwerp in 2016 as a full professor. His main research interests include data mining and machine learning. Toon Calders has published over 60 conference and journal papers in this research area and received several scientific awards for his work, including the recent "10 Year most influential paper" award for papers published in ECMLPKDD 2002. He regularly serves on the program committees of important data mining conferences, including ACM SIGKDD, IEEE ICDM, ECMLPKDD and SIAM DM, was conference chair of the BNAIC 2009, EDM 2011, ECML/PKDD 2014, and Discovery Science 2016 conferences, and is an editor for the Springer Data Mining journal.
Lecture: Processing Data Streams
Sometimes data is generated unboundedly and at such a fast pace that it is no longer possible to store the complete data in a database. Developing techniques for handling and processing such streams of data is very challenging, as the streaming context imposes severe constraints on the computation: we are often unable to store the whole data stream, so making multiple passes over the data is no longer possible, and since the stream never ends, we need to be able to continuously provide, upon request, up-to-date answers to analysis queries. Even problems that are trivial in an off-line context, such as "How many different items are there in my database?", become very hard in a streaming context. Nevertheless, in the past decades several clever algorithms have been developed to deal with streaming data. This talk covers several of these indispensable tools that should be present in every big data scientist's toolbox.
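The distinct-count question above has a classic one-pass, constant-memory answer: the Flajolet-Martin sketch. The code below is a minimal single-hash illustration of the idea (real implementations average many hash functions to reduce the estimate's variance); the function names are ours.

```python
import hashlib

def trailing_zeros(n):
    """Position of the lowest set bit of n (32 if n == 0)."""
    if n == 0:
        return 32
    tz = 0
    while n % 2 == 0:
        n //= 2
        tz += 1
    return tz

def fm_estimate(stream):
    """Flajolet-Martin sketch: for a random hash, the number of
    trailing zero bits is geometrically distributed, so the maximum
    observed over the stream estimates log2 of the distinct count.
    Only one integer R is kept, regardless of stream length."""
    R = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16) & 0xFFFFFFFF
        R = max(R, trailing_zeros(h))
    return 2 ** R

# ~1000 distinct items; a single-hash estimate is only order-of-magnitude
print(fm_estimate(range(1000)))
```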
Johann Gamper is an associate professor at the Faculty of Computer Science of the Free University of Bozen-Bolzano, Italy. He received an MSc degree in Computer Science from the TU Vienna and a PhD degree in Computer Science from the RWTH Aachen. His research interests are in the area of data-intensive systems, with a focus on exact and approximate algorithmic solutions and database technologies for processing time-referenced data, including time series data. Johann Gamper is the author of 80+ publications in top international database journals (TODS, VLDBJ, Information Systems, TCBB) and conference proceedings (SIGMOD, VLDB, ICDE, EDBT, KDD). He has regularly served the database community as a reviewer for technical journals (VLDBJ, TKDE, TODS, Information Systems), as a PC member of important conferences (SIGMOD, VLDB, ICDE, EDBT, SSDBM, ADBIS, ACM SIGSPATIAL), and as a conference organizer (Workshops Chair ADBIS 2017, General Chair SSDBM 2018).
Lecture: Temporal Data Management: An Overview
Despite the ubiquity of temporal data and intensive research activities on processing such data in the eighties and nineties, database systems have for a long time been designed to process a static picture of the world represented by the current state. More recently, we can observe an increasing interest in processing historical or temporal data that capture the dynamics of the world. The SQL:2011 standard introduced some temporal features, and commercial DBMSs have started to offer temporal functionalities in a step-by-step manner, such as the representation of temporal intervals, temporal primary/foreign keys, or support for time travel queries that give the user access to states or snapshots in the past. A more recent study proposed a comprehensive solution for sequenced temporal queries with extended snapshot reducibility. New challenges for temporal data processing arise with the increasing amounts of time series data, which represent a particular kind of temporal data.
This tutorial will give an overview of research results and technologies for storing, managing and processing temporal data, with a focus on (relational) database systems. Different semantics, data models, index structures and algorithms that have been studied in research will be discussed. The survey will also include a detailed analysis of temporal support in commercial and open source database management systems.
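As a minimal illustration of time-travel semantics (not tied to any particular DBMS's SQL:2011 syntax), the sketch below stamps each tuple with a validity interval and reconstructs the snapshot that was valid at a given point in time; the relation and helper names are invented for this example.

```python
from datetime import date

# Each fact carries an application-time validity interval [valid_from, valid_to).
salaries = [
    ("Ann", 50000, date(2015, 1, 1), date(2016, 7, 1)),
    ("Ann", 55000, date(2016, 7, 1), date(9999, 12, 31)),  # current row
    ("Bob", 48000, date(2015, 3, 1), date(9999, 12, 31)),
]

def snapshot(relation, t):
    """Time travel: reconstruct the state of the relation valid at time t
    by selecting the tuples whose interval contains t."""
    return {name: value
            for name, value, valid_from, valid_to in relation
            if valid_from <= t < valid_to}

print(snapshot(salaries, date(2016, 1, 1)))  # Ann's earlier salary
print(snapshot(salaries, date(2017, 1, 1)))  # current state
```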
Evaggelia Pitoura is a Professor at the Computer Science and Engineering Department of the University of Ioannina, Greece, where she also leads the Distributed Management of Data (DMOD) lab. She received a BSc degree from the University of Patras, Greece, and an MSc and PhD degree from Purdue University, USA. Her research interests are in the general area of data management, with a recent focus on social networks and data exploration based on diversity, preferences and time. Her publications include more than 150 articles in international journals and conferences and a highly cited book on mobile computing. She has served or serves on the editorial boards of VLDBJ, TKDE and DAPD. She has served as a group leader, senior PC member and co-chair of many international conferences, including as PC chair of EDBT 2016 and PC co-chair of ICDE 2012. She is the recipient of three best paper awards (ICDE 1999, DBSocial 2013, PVLDB 2013), a Marie Curie Fellowship (2009) and two Recognition of Service Awards from the ACM.
Lecture: Graph Queries and Analytics on Evolving Data Graphs
Graphs form a natural model for expressing relationships and interactions between entities. Most large graphs evolve over time. In this talk, we will explore ways of extracting information from the evolution of data graphs including: (1) navigational and graph pattern queries, (2) community finding, and (3) computation of centrality measures. We will look into modeling, storing, indexing and efficient processing of evolving graphs. Interesting applications from social and cooperation networks will also be presented.
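To make the snapshot view of an evolving graph concrete, here is a hypothetical sketch: each snapshot is a plain edge list, and we track how the degree centrality of the nodes changes from one snapshot to the next. (Real evolving-graph systems index deltas between snapshots rather than recomputing each one from scratch.)

```python
from collections import Counter

def degree_centrality(edges):
    """Degree centrality per node: degree divided by (n - 1),
    computed from an undirected edge list."""
    deg = Counter()
    nodes = set()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        nodes |= {u, v}
    n = len(nodes)
    return {v: deg[v] / (n - 1) for v in nodes}

# Two snapshots of an evolving graph: node "a" gains edges over time.
snapshots = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")],
]

# The centrality history shows "b" losing and "a" gaining importance.
history = [degree_centrality(s) for s in snapshots]
print(history)
```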
Christoph Quix is a temporary professor for Data Science at RWTH Aachen University. He is also a senior researcher in the Life Science Informatics group at the Fraunhofer Institute for Applied Information Technology (FIT) in St. Augustin, Germany, where he leads the department for High Content Analysis. He completed his habilitation at RWTH Aachen University in early 2013, where he also received his Ph.D. degree in computer science. His research focuses on data integration, big data, management of heterogeneous data, metadata management, and semantic web technologies. He has about 100 publications in scientific journals and international conferences. He has been involved in several national and international research projects, which have been conducted in cooperation with research and industry partners. He has been visiting researcher at Microsoft Research. He has also been a freelancer at Thinking Networks AG, supporting the development of a web-based, multidimensional planning software.
Lecture: Data Quality for Big Data Applications
Data quality has been a topic in database research since the 1990s, when the integration of heterogeneous data sources was addressed in data warehouse projects or in combining various web data sources. The integration of different sources and the use of the data in new business processes reveal data quality problems such as inconsistency, incompleteness, or incorrectness. These data quality problems are aggravated in big data applications, as data sources are even more heterogeneous and data is used in more diverse applications with new requirements.
The lecture will give an overview of the data quality research and distinguish between reactive data cleaning approaches and proactive data quality management. The recent challenges for data quality in the area of big data applications will be discussed. We will also examine data quality models and recent data management architectures such as data lakes. As a practical part, tools and concrete techniques for data quality measurement and improvement will be presented.
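Two of the quality problems named above can be measured with a few lines of code. The sketch below, using invented helper names, computes column completeness and flags violations of a functional dependency (zip → city) as a simple consistency check:

```python
def completeness(rows, col):
    """Fraction of non-missing values in a column."""
    return sum(r.get(col) is not None for r in rows) / len(rows)

def fd_violations(rows, lhs, rhs):
    """Rows violating the functional dependency lhs -> rhs,
    i.e. the same lhs value mapped to a different rhs value."""
    seen = {}
    bad = []
    for r in rows:
        key, val = r[lhs], r[rhs]
        if key in seen and seen[key] != val:
            bad.append(r)
        else:
            seen.setdefault(key, val)
    return bad

rows = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": "10115", "city": "Munich"},  # inconsistent: same zip, new city
    {"zip": "80331", "city": "Munich"},
    {"zip": "52062", "city": None},      # incomplete
]
print(completeness(rows, "city"))
print(fd_violations(rows, "zip", "city"))
```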
Christian Thomsen is an associate professor at the Department of Computer Science, Aalborg University, Denmark. He received his MSc degree in Computer Science and Mathematics in 2004 and his PhD in Computer Science from Aalborg University in 2008. His research interests include big data, business intelligence, data warehousing, and extract-transform-load processes.
Lecture: Programmatic ETL
Extract-Transform-Load (ETL) processes are used for extracting data, transforming it and loading it into data warehouses (DWs). The dominant ETL tools use graphical user interfaces (GUIs) in which the developer “draws” the ETL flow by connecting steps/transformations with lines. This gives an easy overview, but can also be rather tedious and require much trivial work for simple things. We therefore challenge this approach and propose to do ETL programming by writing code. To make the programming easy, we present the Python-based framework pygrametl, which offers commonly used functionality for ETL development. By using the framework, the developer can efficiently create effective ETL solutions in which the full power of programming can be exploited. In this lecture, we will present our work on pygrametl and pygrametl-inspired frameworks. Further, we will consider some of the lessons learned during the development of pygrametl as an open source framework.
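The flavour of programmatic ETL can be conveyed with a toy dimension table that hands out surrogate keys while facts are loaded. Note that this class is only loosely inspired by pygrametl's ensure() idea and is not pygrametl's actual API:

```python
class Dimension:
    """Toy in-memory dimension table with surrogate-key lookup.
    Loosely inspired by the ensure() concept in programmatic ETL
    frameworks such as pygrametl; not the real pygrametl API."""

    def __init__(self, lookupatts):
        self.lookupatts = lookupatts  # attributes identifying a member
        self.rows = {}                # lookup values -> surrogate key
        self.nextkey = 1

    def ensure(self, row):
        """Return the surrogate key for row, inserting it if unseen."""
        key = tuple(row[a] for a in self.lookupatts)
        if key not in self.rows:
            self.rows[key] = self.nextkey
            self.nextkey += 1
        return self.rows[key]

# A small ETL flow written as plain code: look up (or create) the
# product dimension member for each sale, then emit a fact row.
product_dim = Dimension(["name"])
facts = []
for sale in [{"name": "pen", "qty": 3}, {"name": "ink", "qty": 1},
             {"name": "pen", "qty": 2}]:
    facts.append({"productid": product_dim.ensure({"name": sale["name"]}),
                  "qty": sale["qty"]})
print(facts)
```

Because the flow is ordinary code, loops, conditionals and helper functions are available for free where a GUI tool would need custom components.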
Jordi Vitria joined the University of Barcelona (UB) in 2007 as a Full Professor, where he teaches an introductory course on Algorithms and advanced courses on Data Science and Deep Learning. From April 2011 to January 2016 he served as Head of the Applied Mathematics and Analysis Department at UB. He is currently a member of the new Mathematics & Computer Science Department at UB. Jordi's research, begun when personal computers had 128KB of memory, was originally oriented towards digital image analysis and the extraction of quantitative information from images, but soon evolved towards computer vision problems. After a postdoctoral year at the University of California at Berkeley in 1993, Jordi focused on Bayesian methods for computer vision. Now, he is the head of a research group working in deep learning, computer vision and machine learning.
Lecture: Let's open the black box of deep learning!
Deep learning is one of the fastest growing areas of machine learning and a hot topic in both academia and industry. This lecture will try to figure out which mechanisms really make this technique a breakthrough with respect to the past. To this end, we will review some of the most common architectures (CNN, LSTM, etc.) and their applications by following a hands-on approach. By the end of the lecture, attendees will be able to (i) describe how a neural network works and combine different types of layers and activation functions; (ii) describe how these models can be applied in computer vision, text analytics, etc.; and (iii) develop simple models in TensorFlow.
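To preview the mechanics independently of TensorFlow, the sketch below runs a forward pass through a tiny 2-2-1 network whose hand-picked weights make it compute XOR. Everything here, including the weights, is for illustration only; in practice such weights are learned by backpropagation.

```python
import math

def sigmoid(z):
    """Logistic activation, squashing any real z into (0, 1)."""
    return 1 / (1 + math.exp(-z))

def layer(inputs, weights, biases):
    """One dense layer: weighted sum per neuron, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x):
    # Hand-picked weights so the 2-2-1 network computes XOR:
    # hidden neuron 1 ~ OR, hidden neuron 2 ~ NAND, output ~ AND.
    hidden = layer(x, [[20, 20], [-20, -20]], [-10, 30])
    out = layer(hidden, [[20, 20]], [-30])
    return out[0]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(forward(x)))  # reproduces the XOR truth table
```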