Important Dates:
Application opens:	March 1, 2016
Application closes:	May 15, 2016
Notification of acceptance:	May 20, 2016
Deadline for payment of registration:	June 1, 2016
Arrival of participants:	July 3, 2016
Start of event:	July 4, 2016
End of event:	July 8, 2016

Sixth European Business Intelligence & Big Data Summer School (eBISS 2016)

Invited Speakers & Tutors

Ladjel Bellatreche

ENSMA Poitiers, France

Ladjel Bellatreche is a Professor at the National Engineering School for Mechanics and Aerotechnics (ENSMA), Poitiers, which he joined as a faculty member in September 2010. He leads the Data and Model Engineering Team of the Laboratory of Computer Science and Automatic Control for Systems (LIAS). Prior to that, he spent eight years as Assistant and then Associate Professor at Poitiers University, France. He was a Visiting Professor at the Université du Québec en Outaouais, Canada, and a Visiting Researcher at the Department of Computer Science, Purdue University, USA, and at the Department of Computer Science of Hong Kong University of Science and Technology, China. He is also involved in Research Postgraduate Programmes in Computer Science at several Universities and Schools in Algeria (Sidi Bel Abbès, where he got his Engineer degree from the Department of Computer Science in 1992, National High School for Computer Science, Boumerdes University, Oran University, Béjaia University, Saida University, Béchar University, etc.). Prof. Ladjel Bellatreche has been actively involved in the research community by serving as a reviewer for technical journals (IEEE TKDE, DKE, Distributed and Parallel Database Journal, JoDS, etc.), as an Editorial Board Member of the International Journal of Reasoning-based Intelligent Systems (Inderscience), as a subject area editor of the Scalable Computing Journal (Springer), and as an organizer/co-organizer of numerous international and national conferences and workshops (DAWAK, DASFAA, DOLAP, MEDI, WISE, EDA, JFO). Some recent conferences in which he is playing or has played major roles include DAWAK, DOLAP, MEDI, and WISE Workshops. In addition, he has served as a program committee member for over forty international conferences and workshops. Ladjel Bellatreche actively contributes to promoting research in Africa and Asia, where he co-supervises several students and organizes conferences and workshops (ICT-EurAsia, MEDI, CIIA, etc.).

Email: bellatreche@ensma.fr
Web: http://www.lias-lab.fr/members/bellatreche

Lecture: Eco-design of Data Warehouses
Being in France, following the U.N. climate change conference held in Paris from 30 November through 11 December 2015 is a great opportunity and a motivation for French citizens, international visitors, companies, politicians, scientists, etc. to integrate the environmental dimension into their lives as well as into research and development. This can be done either by following well-established strategies or by proposing new initiatives. The efficiency of any initiative strongly depends on the efforts made to have a strategy based on guidelines. As researchers in the field of databases, one of the most active research communities, we are compelled to address energy issues. It should be noted that DBMSs are among the main energy consumers, as the deluge of data has to be stored and efficiently managed. The database community has not stood idle over the past decade: it has been constantly proposing initiatives covering both software and hardware. Unfortunately, those initiatives are not supported by strategies that can be reused by companies or researchers that develop new types of DBMS; in particular, we are witnessing a boom in the number of new companies, both big and small, that are building their own DBMS on their favourite platforms. In this talk, we first propose to capitalize on the efforts put into building energy-aware query optimizers, which have the lion's share of the overall energy consumption. Secondly, we undertake an in-depth analysis of the different tasks that a query optimizer performs to evaluate a given query, and we identify their energy sensitivity. Thirdly, we implement our proposal on PostgreSQL. Finally, we present and discuss intensive experiments using mathematical cost models and an energy measurement tool, with datasets from the TPC-H benchmark, to assess the effectiveness of our proposal.

Ismael Caballero

University of Castilla-La Mancha, Spain

Ismael Caballero holds a PhD in Computer Science from the University of Castilla-La Mancha. He teaches Software Engineering and Software Design in the Information and Technology Systems Department at the Escuela Superior de Informática (ESI) in Ciudad Real, where he is Vice-Dean for Corporate Relations. His research lines include data quality management and data governance for Big Data. He has published work related to his research in various fora and journals. He serves the Spanish Association of Data and Information Quality (AECDI) as President. He is a member of the AENOR CTN 116, and he was nominated as a national expert by AENOR to participate in the ISO TC 184/SC4/WG23 for the development of the family of international standards ISO 8000, being the project leader of ISO 8000-62, and Project Editor of ISO 8000-3, ISO 8000-60, ISO 8000-63 and ISO 8000-64.

Email: Ismael.Caballero@uclm.es
Web:

Lecture: Data governance and data quality management for Big Data
The lecture will cover the following topics: 1) Introduction to Data Quality and Data Governance. 2) International standards addressing Data Quality and Data Governance. 3) Data as an asset for Big Data ecosystems. 4) Increasing the organizational value of data by means of Data Quality and Data Governance Processes based on International Standards. 5) A case study of how Data Quality and Data Governance have helped to increase the organizational value of data as an asset.

Pedro Furtado

University of Coimbra, Portugal

Pedro Furtado is a Professor at the University of Coimbra (UC), Portugal, where he teaches courses in both Computer and Biomedical Engineering. Pedro has more than 25 years of experience in teaching, research and supervising industry projects. As part of his work, he has supervised more than 50 Software Engineering projects in different industries, with some emphasis on telecommunications and mobility-related projects. His main research interests are in the performance and scalability qualities of systems, and also in assistive technologies. Pedro has applied these qualities in data warehousing, big data, analytics, data mining, cloud, IoT and realtime systems. Concerning assistive technologies, Pedro focuses his group's activities on applying mobile and internet-of-things technologies to healthcare scenarios. Pedro has more than 150 papers published in international conferences and journals as well as some books, and several research collaborations with both industry and academia. In recent years, Pedro has spent a lot of time as a visiting scholar at some of the most prestigious universities in the world, and collaborating with several non-profit institutions and companies as well. Besides a PhD in Computer Engineering from U. Coimbra (UC) (2000), Pedro Furtado holds an MBA from Universidade Catolica Portuguesa (UCP) (2004).

Email: pnf@dei.uc.pt
Web: https://eden.dei.uc.pt/~pnf/

Lecture: Big Data Warehouses and Analytics: About Scalability and Realtime
Scalability and Realtime in Data Warehouses and Analytics are important topics today, especially given all the hype about Big Data. It is not just about throwing more machines and computing power at processing problems. We should learn from the past. We should understand the models and mechanisms used to store, process and analyze large amounts of data efficiently. We should think about whether relational or other organizations should be used, as well as the advantages and disadvantages of each. This talk is about the past, the present and the future of scalable and realtime data processing engines and analytics. The talk will include: Introduction and Concepts; Data Warehouse Scalability: Mechanisms and Limitations; Realtime Data Warehousing; Big Data Warehousing NOW; Benchmarking Scalability; Scalability Models and Total Scalability; Scalable Data Analysis.

Arnaud Soulet

University of Tours, France

Arnaud Soulet teaches data mining for IT4BI. He received his PhD in 2006 from the University of Caen. He has been an associate professor in computer science since 2007 at the Blois University Institute of Technology, attached to the University François Rabelais Tours. He mainly teaches at the undergraduate level, especially databases. His research interests include OLAP, data mining and machine learning.

Email: arnaud.soulet@univ-tours.fr
Web: http://www.info.univ-tours.fr/~soulet/

Lecture: Two decades of Pattern Mining
In 1993, Rakesh Agrawal, Tomasz Imielinski and Arun N. Swami published one of the founding papers of Pattern Mining: "Mining Association Rules Between Sets of Items in Large Databases". It aimed at enumerating the complete collection of regularities observed in a given dataset, such as sets of products purchased together in a supermarket. Beyond the introduction of a new problem, it introduced a new methodology in terms of resolution and evaluation. For two decades, Pattern Mining has been one of the most active fields in Knowledge Discovery in Databases. This talk presents an overview of Pattern Mining based on a bibliometric survey of the literature relying on publications from five major international conferences. It is clear that frequent patterns (frequent itemsets in particular) have been the focus of most of this work. More generally, most algorithms achieve a complete enumeration that tends to overwhelm the analyst with too many discovered patterns. To alleviate this problem, recent approaches involve the end-user in the mining process, even if completeness is lost.

Hannes Voigt

Dresden Database Systems Group, TU Dresden, Germany

Hannes Voigt is a post-doctoral researcher in the Dresden Database Systems Group. During his affiliation with the group he has worked on various database topics, including database evolution and versioning, data modelling, management of schema-flexible data, self-adapting indexes, index selection, model-driven databases, and efficient mass data transfer on web services. His Ph.D., finished in 2014, focused on the impact of schema-flexible data on database system architecture. From 2010 to 2011, Hannes worked at SAP Labs, Palo Alto on a predecessor project of the SAP HANA graph management functionality. In the project, he developed a query and analytics language for set-oriented processing of graph nodes including traversals. Since returning to Dresden, he has remained involved in the subsequently started HANA Graph Project at SAP, Walldorf. After finishing the Ph.D., his research interest shifted completely to graph data management. His current research focuses on the design of declarative graph query and analytics languages based on pattern matching as well as traversal, and their efficient processing on NUMA in-memory storage systems.

Email: hannes.voigt@tu-dresden.de
Web: https://wwwdb.inf.tu-dresden.de/team/staff/hannes-voigt/

Lecture: Graph Analytics
While the main reason for setting up a database was bookkeeping of individual items in the 1980s and reporting on them in the 1990s, the most progressive driver now is the increasing interest in how things are related and connected. Here, the graph data model is the most natural choice, which makes graph data management very relevant now and in the future. Although graph data management is not a particularly new research field, its newly gained momentum and its lack of standardized solutions make it still an open field. Analytics is one particular driver behind the newly gained momentum. With the particular nature of analytical questions and the skewed and recursive nature of graph data, graph analytics bears its very own challenges. At the user end, there is no commonly accepted abstraction yet that is powerful enough for most graph analytical problems as well as declarative enough for query optimization. At the processing end, there is no established solution for how to effectively leverage the abundant parallelism of modern hardware in analytical graph processing. In recent years, a multitude of solutions have been proposed. However, no master solution has emerged yet. This lecture gives an introduction to graph analytics as a field and provides an overview of the principal technological concepts proposed so far for expressing graph analytical queries and processing them on modern hardware infrastructures.

Shuly Wintner

University of Haifa, Israel

Shuly Wintner is a professor of computer science at the University of Haifa, Israel. His research spans various areas of computational linguistics and natural language processing, including formal grammars, morphology, syntax, language resources, and translation. He served as the editor-in-chief of Springer's Research on Language and Computation, a program co-chair of EACL-2006, and the general chair of EACL-2014. He was among the founders, and twice (6 years) the chair, of ACL SIG Semitic. Currently, he serves as the Head of the Department of Computer Science in Haifa. Professor Wintner has a doctoral degree from the Technion, Israel Institute of Technology. He did his post-doctoral research at the University of Tuebingen, Germany and the University of Pennsylvania in Philadelphia. He also spent one year as a visiting professor at the Language Technologies Institute of Carnegie Mellon University in Pittsburgh, PA.

Email: shuly@cs.haifa.ac.il
Web: http://cs.haifa.ac.il/~shuly/Shuly_Wintner/Home.html

Lecture: Computational Approaches to Translation Studies
Translated texts, in any language, have unique characteristics that set them apart from texts originally written in the same language. Translation Studies is a research field that focuses on investigating these characteristics. Until recently, research in Natural Language Processing (NLP), and in particular in machine translation (MT), has been entirely divorced from translation studies. The main goal of this tutorial is to demonstrate that the two areas can benefit each other. First, we will survey some theoretical hypotheses of translation studies. Focusing on the unique properties of translationese (the sub-language of translated texts), we will distinguish between properties resulting from interference from the source language (the so-called "fingerprints" of the source language on the translation product) and properties that are source-language-independent, and that are presumably universal. The latter include phenomena resulting from three main processes: simplification, standardization and explicitation. All these phenomena will be defined, explained and exemplified. Then, we will describe several works that use standard (supervised and unsupervised) text classification techniques to distinguish between translations and originals, in several languages. We will focus on the features that best separate the two classes, and how these features corroborate some (but not all) of the hypotheses set forth by translation studies scholars. Next, we will discuss several computational works that show that awareness of translationese can improve machine translation. Finally, we will touch upon some related issues and current research directions. For example, we will discuss recent work that addresses the identification of the source language from which target language texts were translated. We will show that native language identification (in particular, of language learners) is a closely related task to the identification of translationese. Time permitting, we will also discuss work aimed at distinguishing between native and (advanced, fluent) non-native speakers.

Emmanuel Zarpas

SAP, France

Emmanuel Zarpas is head of the Analytics Applied Research Product Team at SAP Paris. He holds a PhD in Computer Science from University Pierre-et-Marie-Curie of Paris. He previously worked as a Research Staff Member at IBM Haifa Research Laboratory from 2001 to 2008, where he was involved with formal verification and analytics. From 1997 until 2001 he worked as a Program Manager at Thales in Paris.

Email: emmanuel.zarpas@sap.com
Web:

Lecture: Business Intelligence at SAP Paris, overview and lessons from key projects
In this talk we will review the challenges and lessons learned from key business intelligence projects. We will review the architecture and technology involved, narrow down the key challenges from both a technical and a functional perspective, and discuss some of the key innovations developed by the Paris R&D teams.

Pierre Maussion and Marie Pérennès

Teradata, France

Pierre Maussion holds a Master's degree in Information Systems and Networks from the University of Tours (Blois). He has worked on Business Intelligence projects with Teradata for more than twelve years, in both Belgium and France, for many different industries, mainly Telco, Banking, Retail and Media companies. His expertise is in the Teradata Database, from Data Warehouse and Data Mart modeling and implementation to optimization and, for the last 4 years, in Big Data solution architecture and advanced analytics projects.
Marie Pérennès works as a Project Manager for Teradata Professional Services, after 6 years as a BI technical expert and Project Manager in a consulting company. She holds a SIAD Master's degree from the University François Rabelais (Blois). Her expertise is in data warehouse and dataflow modeling and BI system design, which she practiced for 6 years as a design and technical expert and, for the last 3 years, as a project manager. After 6 years working for the Energy industry, her current project is building a major Teradata data warehouse for the French banking company "Crédit Agricole", where she is the operational project manager of a team of 12 people.

Email: Pierre.Maussion@Teradata.com
Email: Marie.PERENNES@Teradata.com
Web: fr.teradata.com

Lecture: Discovery of the Teradata Database
Designed to deliver high performance, diverse queries, in-database analytics and sophisticated workload management, the Teradata Database outperforms all other vendor analytics solutions for very high volume Data Warehouses. In this talk, we will give you an overview of the main factors that make this system so distinctive: architecture, Teradata parallel processing, manageability, etc.