GridKa School 2016 - Data Science on Modern Architectures

Name: GridKa School 2016 - Data Science on Modern Architectures
Start: 2016-08-29T08:00:00+02:00
End: 2016-09-02T12:00:00+02:00
Location: FTU

Aug 29, 2016, 8:00 AM → Sep 2, 2016, 12:00 PM Europe/Berlin

Description

The International GridKa School 2016 is one of the leading summer schools for advanced computing techniques in Europe. The school provides a forum for scientists and technology leaders, experts, and novices to facilitate knowledge sharing and information exchange. The target audience includes graduate and PhD students, advanced users, and IT administrators. GridKa School is hosted by the Steinbuch Centre for Computing (SCC) of the Karlsruhe Institute of Technology (KIT). It is organized by KIT and the HGF Alliance "Physics at the Terascale".

Workshops
Hands-on sessions and workshops give participants an excellent and unique chance to gain practical experience with cutting-edge technologies and tools.

Plenary talks
Plenary talks presented by experts cover theoretical aspects of school topics and focus on innovative features of data science on modern architectures.

Social Events
Two social events are an important part of the school. They give participants the opportunity to meet interesting people in a relaxed atmosphere, improve their networking, and have fun.

Participants

Aleksander Paravac
Alexander Schug
Alexandr Mikula
Alfred Franz
Andreas Heiss
Andreas Herten
Andreas Petzold
Andrew Lahiff
Arif Bayirli
Asfaw Yohannes
Benedikt Hegner
Benedikt Riehm
Brendan Bouffler
Byungyun KONG
Christian Haack
Christian Petters
Christoph Heidecker
Christoph Renner
Christoph Weinsheimer
Christoph-Erdmann Pfeiler
Christopher Jung
Christopher Stihl
CHRYSOVALANTIS MANTAFOUNIS
Clemens Hentschke
Daniel Hofmann
Daniel Schmid
Daniela Schaefer
Denise Mueller
Diana Gudu
Dimitri Nilsen
Dirk Jacobsen
Donghee Kang
Doris Wochele
Eileen Kühn
Elnaz Azmi
Elvin Sindrilaru
Emanuel Pfeffer
Eugen Wintersberger
Fabio Baruffa
Felice Pantaleo
Frank Baetke
Frank Fischer
Frank Koester
Gabriel Zachmann
Giovanni Siragusa
Graeme Stewart
Gregor Köhler
Gregor Mittag
Günther Erli
Hannes Hartenstein
HEEJUNE HAN
Holger Kluck
Igor Katkov
Ingolf Wittmann
Ingrid Schäffner
Iurii Sorokin
John Wood
Jose Castro Leon
Jurry de la Mar
Jörg Meyer
Jürgen A. Krebs
Jürgen Feiler
Karim El Morabit
Kevin Floeh
Klaus Maier-Hein
Leif Rädel
Lena Wiese
Liesbeth Vanherpe
Lorenzo Moneta
Malika Touil
Manfred Größer
Manuel Giffels
Manuela Kuhn
Marco Berghoff
Marco Kleesiek
Marco Nolden
Marcus Hardt
Marcus Schmitt
Marcus Strobl
Marek Szuba
Mario Lassnig
Matthias Schnepf
Max Fischer
Mehmet Soysal
Melanie Ernst
Michael Bontenackels
Michael Hufschmidt
Michał Orzechowski
Mirko Kämpf
Moritz Gelb
Natascha Krammer
Nathalie Rauschmayr
Nico Böhr
Nikolaus Trost
Nils Braun
Oleg Dulov
Olga Kambeitz
Oliver Freyermuth
Paul Jäger
Pavel Weber
Peter Braesicke
Peter Krauß
Preslav Konstantinov
Ralf Floca
Raphael Friese
Rob Blake
Samuel Ambroj Pérez
Sebastien Binet
Sergio Tafula
Simon Kohl
Stanisław Jankowski
Stefan Geißelsöder
Sven Schoo
Sven Sternberger
Sébastien Gadrat
Sławomir Zdanowski
Tamás Szép
Thomas Latzko
Tobias Kurze
Tonn Rüter
Tunner Chimhondo
Ugur Cayoglu
Ulrich Hartenstein
Ulrike Schnoor
Valentin Kozlov
Vimal Chander M
Will Breaden Madden
William Ma
Xavier Valls
Yousri Mhedheb

Mon, August 29
- 12:00 PM → 2:00 PM
  
  Registration Foyer
  
  Foyer
  
  FTU
- 2:00 PM → 6:00 PM
  Plenary Talks Aula
  
  Aula
  
  FTU
  
  slides
  - 2:00 PM
    
    Welcome to KIT 30m Aula
    
    Aula
    
    FTU
    
    Speaker: Prof. Hannes Hartenstein (KIT)
    
    Slides
  - 2:30 PM
    
    Welcome to GridKa School 2016 20m Aula
    
    Aula
    
    FTU
    
    Speaker: Dr Manuel Giffels (KIT)
    
    Slides
  - 2:50 PM
    
    Data Science - Past, Present and Future 40m Aula
    
    Aula
    
    FTU
    
    This presentation will give an overview of the rapid development over the past 15 years of what is often called the “Fourth Paradigm”, leading to Open Science and Open Innovation. A number of examples from research infrastructures will be used to illustrate the current situation and the role that global organisations like the Research Data Alliance have in bringing about the revolution in Data Science that is fundamental to this approach. As the Internet of Things develops, some ideas will be shared about the possible future, including the impact on the way society copes with the implications.
    
    Speaker: Prof. John Wood (RDA Council Co-Chair)
    
    Slides
  - 3:30 PM
    
    Coffee Break 30m Aula, FTU
    
    Aula, FTU
  - 4:00 PM
    
    ROOT as a Service - Jupyter Notebook 40m Aula
    
    Aula
    
    FTU
    
    The Jupyter Notebook is a narrative that combines data visualisation, text, code and multimedia material, all in the same web application. It can be used, for instance, for data analysis, simulations, statistical inference, machine learning or teaching. Moreover, notebooks encourage suitably documented scientific code, a desirable feature when reproducibility and preservation of analyses are sought. This lecture introduces the basic concepts behind Jupyter Notebooks and reviews some of their possible uses in science: examples of sophisticated statistical data analysis, multivariate techniques and data visualisation will be discussed. Driven by a pragmatic attitude, the theoretical concepts are complemented by concrete applications involving cutting-edge production tools, some of which come from the Python analysis ecosystem and the High Energy Physics community.
    
    Speaker: Dr Lorenzo Moneta (CERN)
    
    Slides
  - 4:40 PM
    
    Data Challenges in Photon Science 40m Aula
    
    Aula
    
    FTU
    
    This presentation will give an overview of the data life cycle in photon science and the resulting challenges in data handling. The topics discussed include data taking, analysis, storage and archival. The technical realization is illustrated using the example of infrastructures currently being developed for synchrotrons and free electron lasers.
    
    Speaker: Manuela Kuhn (DESY)
    
    Slides
  - 5:20 PM
    
    Data Management in NoSQL Databases 40m Aula
    
    Aula
    
    FTU
    
    The relational data model, where data are stored in tables and hence structured according to some fixed set of attributes (that is, table columns), has been a success for several decades. Furthermore, SQL is a standardized and widely used query and management language for relational databases. A transformation of commonly occurring data into the relational table format is, however, not always convenient. On the contrary, storing arbitrary documents, programming language objects, XML data and the like in relational databases imposes a huge overhead. Moreover, relational databases are geared towards frequent queries on a stable set of data with infrequent updates. Novel requirements for database management systems have led to the emergence of several alternatives to relational systems, so that data can be stored in other structures with flexible update and query behavior and distributed over multiple servers. Under the slogan NOSQL (in the sense of Not Only SQL), some systems have come up that concentrate on versatile use cases while diverging from the relational data model. This talk surveys some of these NOSQL technologies that are employed in Cloud Computing or in social networks and hence will gradually become more significant. The talk will cover graph databases, XML databases, key-value stores and column-family stores.
    
    Speaker: Dr Lena Wiese (Georg-August-Universität Göttingen)
    
    Slides
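
The contrast drawn in the NoSQL abstract above, between a fixed set of table columns and more flexible storage, can be sketched in a few lines. This is only an illustration and is not taken from the talk: SQLite stands in for a relational database, a plain Python dict stands in for a key-value store, and the table name, keys and sample records are made up.

```python
import json
import sqlite3

# Relational model: every row must fit the fixed set of columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "INSERT INTO person (id, name, city) VALUES (?, ?, ?)",
    (1, "Ada", "Karlsruhe"),
)
relational_row = conn.execute(
    "SELECT name, city FROM person WHERE id = ?", (1,)
).fetchone()

# Key-value model (a plain dict stands in for a real store, hypothetical):
# each value is an opaque document, and documents may differ in shape.
kv_store = {}
kv_store["person:1"] = json.dumps({"name": "Ada", "city": "Karlsruhe"})
kv_store["person:2"] = json.dumps({"name": "Bob", "languages": ["Go", "Python"]})


def get_document(key):
    """Look up a key and decode the stored JSON document."""
    return json.loads(kv_store[key])


print(relational_row)
print(get_document("person:2"))
```

The second document carries a field that the first one lacks. In the relational table, that would need a schema change or a separate table.
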
Tue, August 30
- 9:00 AM → 12:00 PM
  Plenary Talks
  
  slides
  - 9:00 AM
    
    Technologies to build hybrid clouds on public-private infrastructures 40m Aula
    
    Aula
    
    FTU
    
    The Helix Nebula initiative continues to expand with research organisations, providers and services. The hybrid cloud model deployed by Helix Nebula has grown to become a viable approach for provisioning ICT services for research communities from both public and commercial service providers (http://dx.doi.org/10.5281/zenodo.16001). The relevance of this approach for all those communities facing societal challenges is explained in a recent EIROforum publication (http://dx.doi.org/10.5281/zenodo.34264). The presentation will outline how a common platform for data-intensive services, built upon existing publicly funded e-infrastructures and commercial cloud services, is achieved to promote open science. The high-level architecture, the key technologies and the role of standards are described.
    
    Speaker: Dr Jurry de la Mar (T-Systems)
    
    Slides
  - 9:40 AM
    
    Docker and Linux Containers 40m Aula
    
    Aula
    
    FTU
    
    Linux containers (LXC) is a technology that provides operating system-level virtualisation not via virtual machines but by using a single kernel to run multiple instances on the same OS. Linux namespaces and control groups (cgroups) are the foundation on which LXC is built. Containers are fast to deploy, they introduce none of the overhead or indirection of traditional virtual machines, and they have the added design benefit of ensuring complete isolation between processes. Containers are great for running multiple instances of the same service in parallel, either as part of a scale-out strategy or just for testing purposes. Docker is built around Linux containers and offers an intuitive way of managing them by abstracting and automating some of the configuration details. Besides being an open-source project, Docker has enabled the development of an entire "ecosystem" of tools and products targeting container technology.
    
    Speaker: Elvin Sindrilaru (CERN)
    
    Slides
  - 10:20 AM
    
    Coffee Break 20m Aula, FTU
    
    Aula, FTU
  - 10:40 AM
    
    Hitachi Data Systems - Digital Transformation within existing Big Data Environments 40m Aula
    
    Aula
    
    FTU
    
    Hardware and beyond: in a world where new hypes and trends flood CIOs on a daily basis, it is hard to keep up with innovations while staying on a tight budget. Extract, Transform and Load (ETL) has been seen as the solution for years. We show a different starting point that solves existing issues without any vendor lock-in.
    
    Speaker: Jürgen Krebs (Hitachi)
    
    Slides
  - 11:20 AM
    
    OpenStack @ CERN 40m Aula
    
    Aula
    
    FTU
    
    OpenStack is an open-source cloud computing platform for public and private clouds. OpenStack software controls large pools of compute, storage, and networking resources throughout a datacentre, managed through a dashboard or via the API. OpenStack works with popular enterprise and open source technologies, making it ideal for heterogeneous infrastructure. The presentation will outline our journey to build an OpenStack-based private cloud. This talk will cover our experiences in the deployment, management and integration of a large-scale cloud at CERN.
    
    Speaker: Jose Castro Leon (CERN)
    
    Slides
- 12:00 PM → 1:00 PM
  
  Lunch 1h Canteen
  
  Canteen
- 1:00 PM → 6:00 PM
  
  Docker Tutorial 5h Room 157
  
  Room 157
  
  FTU
  
  The Docker tutorial will walk you through the basic steps of setting up a Docker environment on your machine. A series of exercises will detail the concepts presented during the plenary talk, which you need to understand for the later part of the tutorial. The final goal of the tutorial is to build and deploy a couple of containers that replicate the usual analysis workflow in High Energy Physics: one container runs an XRootD server providing the storage for the data, and a different container runs the ROOT framework, where you will do your analysis. The tutorial will discuss in depth the concepts of port forwarding, volumes and resource management in the context of containers, with a focus on understanding the advantages of containers over traditional virtual machines. To get the most out of the Docker tutorial, there are some prerequisites to take into consideration: you should be comfortable working with the Linux terminal, installing packages from the command line, using the ssh client to connect to a remote machine and, last but not least, editing files using one of the common Linux editors (emacs, vi, nano, etc.).
  
  Speaker: Elvin Sindrilaru (CERN)
- 1:00 PM → 6:00 PM
  
  Introduction to OpenStack 5h Room 156
  
  Room 156
  
  FTU
  
  Speakers: Peter Krauss (KIT), Tobias Kurze (KIT)
- 1:00 PM → 6:00 PM
  
  Jupyter Notebooks for Science (ROOT as a Service) 5h Room 163
  
  Room 163
  
  FTU
  
  The objective of the exercises is to show participants how to be even more productive in their everyday tasks thanks to Jupyter notebooks. During this hands-on session, the concepts treated during the lecture are seen in action. At the beginning, the opportunities offered by the notebooks and the Jupyter web interface are reviewed. Several keyboard shortcuts and elements of the web user interface are tried out. The usage of markdown cells and code cells, the way to manage the output, and the conversion of notebooks to various formats such as HTML or PDF are illustrated. Once the basics are addressed, we focus on Python and C++ notebooks, starting from selected examples coming from High Energy Physics such as CMS Open Data or Large Hadron Collider machine studies. Libraries like ROOT, Matplotlib and Pandas are used. Particular attention is dedicated to machine learning activities. Finally, the interplay of Python and C++ made possible by the ROOT Python bindings, PyROOT, is illustrated by examples.
  
  Speaker: Dr Lorenzo Moneta (CERN)
- 1:00 PM → 6:00 PM
  
  Puppet I 5h Room 155
  
  Room 155
  
  FTU
  
  Puppet is today the leading configuration management system. It allows you to represent and control the complete infrastructure of complex IT environments. In this course we will learn what Puppet is and how to use it. We will start by writing simple examples and see what a Puppet workflow could look like. We will also look at the surrounding ecosystem (Git, Foreman and Hiera) and show how continuous integration could help us to develop our sites.
  
  Speakers: Dimitri Nilsen (KIT), Dr Pavel Weber (KIT), Sven Sternberger (DESY)
- 1:00 PM → 6:00 PM
  
  Using R to improve Python data analysis workflows 5h Aula
  
  Aula
  
  FTU
  
  The choice between R and Python is often considered when selecting the best tool to use for data science workflows. Many people prefer Python over R, since it offers a lot of flexibility and is commonly used in everyday programming. In contrast, R has a steep learning curve. However, it offers a huge number of specialised statistics libraries that are not available in Python so far. Furthermore, it offers leading-edge features for visualisation. Combining R and Python in your workflows allows you to benefit from both. Throughout the tutorial, you will learn how to benefit from both tools. To get you running, we introduce several ways to use R from Python. The tutorial itself is hands-on, applying concepts from everyday workflows. The tutorial targets participants who are familiar with using Python for their workflows. Experience with R is not required, though helpful.
  
  Speaker: Eileen Kühn (KIT)
- 6:30 PM → 10:00 PM
  
  Tarte Flambee Evening SCC
  
  SCC
Wed, August 31
- 9:00 AM → 12:00 PM
  Plenary Talks Aula
  
  Aula
  
  FTU
  
  slides
  - 9:00 AM
    
    Building HPC Clusters as Code in the [Almost] Infinite Cloud 40m Aula
    
    Aula
    
    FTU
    
    Every day, HPC clusters help scientists make breakthroughs, such as proving the existence of gravitational waves, screening new compounds for new drugs and designing better headlights for cars. No industry is untouched by HPC, yet owning HPC clusters is out of reach for most organizations due to the upfront hardware and ongoing operational costs. Now with the cloud, not owning an HPC cluster can be one of the most productive ways to compute everything from the fluid dynamics of a milk bottle to the evolution of the universe. And with the availability of free public data sets in Amazon S3, even earth observation data and cancer genome data are easily accessible for use in the cloud. Run 1 cluster for 10,000 hours or 10,000 clusters for 1 hour, anytime, from anywhere, in the cloud. The speed of innovation is only bound by your imagination, not your budget. This talk will present why people are using Amazon for HPC and scientific applications, how to build clusters on the fly, and Amazon's on-demand SPOT market pricing. Lastly, it will provide real-world examples from customers just like you, who’ve broken new ground because of the agility the cloud offers.
    
    Speaker: Brendan Bouffler (Amazon.com, Inc.)
    
    Slides
  - 9:40 AM
    
    Introduction to HTCondor 40m Aula
    
    Aula
    
    FTU
    
    Originally developed over 20 years ago as a means of making use of unused computing resources on desktops, today HTCondor plays an important role in providing high throughput computing for CERN’s Large Hadron Collider. For example, it is used as a batch system at an increasing number of sites, as a grid computing element, to provision both grid and cloud resources, and as an overlay batch system running on top of multiple grid sites around the world at very large scales. This talk will give an introduction to HTCondor and its architecture, and illustrate its flexibility by describing a number of varied use cases.
    
    Speaker: Dr Andrew Lahiff (Rutherford Appleton Laboratory)
    
    Slides
  - 10:20 AM
    
    Coffee Break 20m Aula (FTU)
    
    Aula
    
    FTU
  - 10:40 AM
    
    HEP Software Foundation 40m Aula
    
    Aula
    
    FTU
    
    Speaker: Dr Benedikt Hegner (CERN)
    
    Slides
  - 11:20 AM
    
    Scientific Computing in Climate Science 40m Aula
    
    Aula
    
    FTU
    
    The presentation will describe the state of the art in composition-climate modelling of the atmosphere. Which equations do we use and how are they discretised? Which phenomena can we describe, and how? What can we learn about the atmosphere by characterising a model's sensitivities? Issues of scalability and data access will be discussed, because comprehensive numerical experiments are required to tackle questions that are important for society, in particular global change.
    
    Speaker: Prof. Peter Braesicke (KIT)
    
    Slides
- 12:00 PM → 1:00 PM
  
  Lunch Break 1h Canteen
  
  Canteen
- 1:00 PM → 6:00 PM
  
  CEPH Tutorial 5h Room 163
  
  Room 163
  
  FTU
  
  Ceph is an open-source, software-defined distributed storage system that strives to achieve scalability and reliability through an innovative decentralised design. Distributed file systems nowadays face multiple challenges: scaling to peta-byte capacity and providing high performance, while protecting against failures. Moreover, file systems should be able to adapt to dynamic distributed workloads to provide the best performance. Ceph tries to tackle these challenges with a completely decentralised architecture that has no single point of failure. Reliability is achieved through distributed data placement and replication. Ceph's dynamic metadata partitioning feature helps deal with dynamic workloads. This session is an introduction to Ceph and it will cover theoretical background on Ceph's architecture, as well as hands-on exercises, such as installation and configuration of a Ceph cluster, basic usage and monitoring, and a dive into Ceph's secret to extreme scalability: the CRUSH algorithm. After completing this session, you should be able to understand and discuss Ceph concepts, and deploy and manage a Ceph Storage Cluster. Basic knowledge of Linux and storage concepts is required.
  
  Speaker: Diana Gudu (KIT)
- 1:00 PM → 6:00 PM
  
  Concurrent Programming in C++ 5h Room 157
  
  Room 157
  
  FTU
  
  In this course we will introduce how to program for concurrency in C++, taking advantage of modern CPUs' ability to run multi-threaded programs on different CPU cores. Firstly, we will explore the new concurrency features of C++14 itself, which will also serve as a general introduction to multi-threaded programming. Students will learn the basics of asynchronous execution, thread spawning, management and synchronisation. Some elementary considerations about deadlocks and data races will be introduced, which will illustrate the common problems that can arise when programming with multiple threads. After this, the Threading Building Blocks (TBB) template library will be introduced. We shall see how the features of this library allow programmers to exploit multi-threading at a higher level, without needing to worry about so many of the details of thread management. Students should be familiar with C++ and the standard template library. Some familiarity with makefiles and/or CMake would be useful.
  
  Speaker: Dr Graeme Stewart (University of Glasgow)
- 1:00 PM → 6:00 PM
  
  Performance, profiling and debugging 5h Room 156
  
  Room 156
  
  FTU
  
  Nowadays, multi- and manycore CPUs provide many different performance dimensions. To profit from these, one needs to understand how an application interacts with the hardware. This is done through profiling, which helps to determine resource requirements and to identify performance bottlenecks. The session will discuss the different performance dimensions of modern CPUs and how to perform benchmarking and high- and low-level application profiling in Linux. Commonly used profiling and debugging utilities will be presented. The session requires a basic understanding of Linux and CPU architecture.
  
  Speaker: Dr Nathalie Rauschmayr (CERN)
- 1:00 PM → 6:00 PM
  
  Puppet II 5h Room 155
  
  Room 155
  
  FTU
  
  Puppet is today the leading configuration management system. It allows you to represent and control the complete infrastructure of complex IT environments. In this course we will learn what Puppet is and how to use it. We will start by writing simple examples and see what a Puppet workflow could look like. We will also look at the surrounding ecosystem (Git, Foreman and Hiera) and show how continuous integration could help us to develop our sites.
  
  Speakers: Dimitri Nilsen (KIT), Dr Pavel Weber (KIT), Sven Sternberger (DESY)
- 1:00 PM → 6:00 PM
  Scientific Computing on Amazon Web Services 5h Aula
  
  Aula
  
  FTU
  A survey will be sent to all participants shortly before the GridKa School to ask which specific sub-topic to cover. The following topics are available:
  1. Introduction to AWS (Basics for all services)
  2. Compute
  3. Storage
  4. Database
  5. Management Tools
  6. Security & Identity
  7. Analytics
  8. Internet of Things - Connect Devices to the cloud
  9. Mobile Services
  10. Application services
  Speakers: Brendan Bouffler (Amazon.com, Inc.), Christian Petters (Amazon Web Services GmbH)
- 6:30 PM → 8:00 PM
  
  Evening Lecture: "Automated and Connected Driving from a User-Centric Viewpoint" 1h 30m Aula
  
  Aula
  
  FTU
  
  Automated and connected driving can be seen as a paradigm shift in the automotive domain which will substantially change today's role of the human driver. More and more parts of the driving task are to be taken over by technical components in different scenarios. The application areas range from fully automated parking to automated driving on a highway, with different degrees of driver involvement in monitoring tasks (SAE level 2 to 5). Thus, it is essential to understand how this role change will influence human drivers and how we can support them by designing an interaction with the automated vehicle that is easy to understand, safe and comfortable. From Human Factors research we know that it is crucial to ensure that the driver has a correct mental model of the automation levels and the current automation status, to avoid mode confusion and resulting errors in operation. In addition, transitions of control between the driver and the automated vehicle need a careful design that is adjusted to the current traffic situation and the driver status. For example, mobile devices that the driver uses during automated driving can be integrated smoothly into the overall interaction design concept to support transitions. The talk will give an overview of selected research topics in the area of user-centric development of automated and connected vehicles at DLR. It will underline why it is important to focus on the human when designing a new, exciting technology such as automated and connected vehicles.
  
  Speaker: Prof. Frank Köster (Deutsches Zentrum für Luft- und Raumfahrt)
Thu, September 1
- 9:00 AM → 12:00 PM
  Plenary Talks Aula
  
  Aula
  
  FTU
  
  slides
  - 9:00 AM
    
    Big Data Analytics in Biology: Biomolecular Structure Prediction and Beyond by Tracing Residue Co-Evolution 40m Aula
    
    Aula
    
    FTU
    
    One grand challenge of the life sciences in the coming years is to fully leverage experimental progress such as high-throughput sequencing by taking advantage of recent advances in other disciplines, in particular information technology. Exploring the interrelationship of structure and function is crucial for understanding life at the molecular level. Yet despite significant progress in experimental methods, the crucial structural characterization of many important proteins and non-coding RNAs (ncRNA), which typically precedes any detailed mechanistic exploration of their function, remains challenging. Typically, such work has focused on proteins as the “molecular workhorses” of cells, yet recent work has attributed more and more crucial functions to ncRNA, as most of the eukaryotic genome does not code for proteins. Our vision is to develop the technological and algorithmic framework for mining these vast amounts of raw sequence data with the goal of predicting experimentally poorly accessible biomolecular structures by tracing residue co-evolution. Going beyond structure prediction, we can also link co-evolutionary patterns to functional questions like antibiotic drug resistance.
    
    Speaker: Dr Alexander Schug (KIT)
    
    Slides
  - 9:40 AM
    
    GPU Computing: Platform, Programming, and Pitfalls 40m Aula
    
    Aula
    
    FTU
    
    GPUs, Graphics Processing Units, offer a large amount of processing power by providing a platform for massively parallel computing. They have the ability to greatly increase the performance of scientific applications on a single workstation computer; and they also power the fastest supercomputers in the world. But leveraging the processing power is not as easy as just running a program on a GPU-enabled computer. The program needs to be ported to and carefully optimized for the GPU architecture. This talk gives an introduction to GPU hardware architectures, programming concepts (CUDA, OpenACC), and touches on the most prevalent pitfalls of working with the technologies.
    
    Speaker: Dr Andreas Herten (FZ Jülich)
    
    Slides
  - 10:20 AM
    
    Coffee Break 20m Aula (FTU)
    
    Aula
    
    FTU
  - 10:40 AM
    
    The computer that could be smarter than us - Cognitive Computing 40m Aula
    
    Aula
    
    FTU
    
    Cognitive computing is much talked about nowadays, and it is already present in the realm of supercomputing. However, limitations are set by Moore's Law; new technologies and computing approaches are therefore required to make cognitive solutions come true. Today's cognitive solutions presage what will be possible in the future: can future computers be smarter than us?
    
    Speaker: Ingolf Wittmann (IBM Deutschland)
    
    Slides
  - 11:20 AM
    
    HDF5 as a standard file format for synchrotron radiation experiments 40m Aula
    
    Aula
    
    FTU
    
    HDF5 is on the verge of becoming a standard file format at synchrotron radiation facilities. This talk will give an overview of the most recent features added to HDF5 and of how the rules specified by the NeXus standard can help to improve the use of HDF5 for data recorded during synchrotron radiation experiments.
    
    Speaker: Dr Eugen Wintersberger (DESY)
    
    Slides
- 12:00 PM → 1:00 PM
  
  Lunch Break 1h Canteen
  
  Canteen
- 1:00 PM → 6:00 PM
  
  Advanced Python Programming for Data Science 5h Aula
  
  Aula
  
  FTU
  
  In recent years, Python has been adopted in many fields, be they scientific, commercial or other. For most new users, Python is easy to learn and makes it possible to quickly write small but powerful scripts. At the same time, it is also feasible to create web services, distributed processing systems, and other complex applications. Beginners have an easy start with Python: little code is required for many tasks as the language has "batteries included". Yet, data-driven science often relies on custom workflows and algorithms. Writing clean, efficient, and comprehensible code is vital; perhaps not for the next deadline, but the one after. Luckily, Python provides many features to manage even complex tasks with ease. The course focuses on advanced concepts of Python programming that are useful for data science. This covers features of the language, how to use them, and also best practices. We present this as an interactive hands-on tutorial, which gives you a feeling for the capabilities of Python. The course targets people familiar with Python or similar languages. You should feel comfortable writing small scripts for solving problems.
  
  Speaker: Dr Max Fischer (KIT)
- 1:00 PM → 6:00 PM
  
  Apache Spark in Scientific Applications 5h Room 155
  
  Room 155
  
  FTU
  
  The workshop Spark in Scientific Applications covers fundamental development and data analysis techniques using Apache Hadoop and Apache Spark. Besides an introduction to the theoretical background of Map-Reduce and Bulk-Synchronous-Parallel processing, the machine learning library MLlib and the graph processing framework GraphX are also used. We work on sample data sets from Wikipedia, financial market data, and from a generic data generator. During the tutorial sessions we illustrate the Data Science Workflow and present the right tools for the right task. All practical exercises are prepared in a pre-configured virtual machine. Participants get access to the required data sets on a "one node pseudo-distributed" cluster with all tools inside. This VM is also a starting point for further experiments after the workshop.
  
  Speaker: Mirko Kämpf (Cloudera)
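
For readers new to the Map-Reduce pattern named in the abstract above, the three phases can be sketched in plain Python. This is an illustration only: it does not use the Hadoop or Spark APIs, and the function names and sample lines are made up for the example.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: fold each key's list of values into a single count."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


sample_lines = ["Spark and Hadoop", "spark on one node"]  # made-up input
counts = word_count(sample_lines)
print(counts)
```

In a distributed framework, the map and reduce functions run in parallel on different nodes and the shuffle moves data between them. Here everything runs sequentially in one process.
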
- 1:00 PM → 6:00 PM
  
  Application development with relational and non-relational databases 5h Room 163
  
  Room 163
  
  FTU
  
  In this workshop, the students will learn how to use relational and non-relational databases to build multi-threaded applications. The focus of the workshop is to teach efficient, safe, and fault-tolerant principles when dealing with high-volume and high-throughput database scenarios. This includes, but is not limited to, systems such as PostgreSQL, Redis or Elasticsearch.
  
  A basic understanding of the following things is required:
  - A programming language (preferably Python or any C-like)
  - Basic SQL (CREATE, DROP, SELECT, UPDATE, DELETE)
  - Linux shell scripting (bash or zsh)
  
  The course will cover the following three topics:
  - When to use relational databases, and when not
    * Relational primer
    * Non-relational primer
    * How to design the data model
  - Using SQL for fun and profit
    * Query plans and performance analysis
    * Transactional safety in multi-threaded environments
    * How to deal with large amounts of sparse metadata
    * Competitive locking and selection strategies
  - Building a fault-tolerant database application
    * Distributed transactions across relational and non-relational databases
    * SQL injection and forceful breakage
    * Application-level mitigation for unexpected database issues
  
  Speaker: Dr Mario Lassnig (CERN)
- 1:00 PM → 6:00 PM
  
  GPU 5h Room 156
  
  Room 156
  
  FTU
  
  While the computing community is racing to build tools and libraries to ease the use of heterogeneous parallel computing systems, effective and confident use of these systems will always require knowledge of their low-level programming interfaces. This workshop introduces the CUDA programming language through examples and hands-on exercises, enabling the user to recognize CUDA-friendly algorithms and to fully exploit the computing potential of a heterogeneous parallel system.
  
  Speaker: Felice Pantaleo (CERN)
- 1:00 PM → 6:00 PM
  
  Introduction into Go 5h Room 164
  
  Room 164
  
  FTU
  
  # Introduction to Go

  ## Introduction

  In this workshop, we will introduce the basics of programming in Go and then work our way up to concurrent programming with this relatively new language. We'll start with the usual "Hello World" program, then introduce functions, variables, packages and interfaces. Then, we will tackle the two main tools at the disposal of the Go programmer (colloquially known as a gopher): channels and goroutines. This will be done by implementing a small peer-to-peer application transmitting text messages over the network. The workshop wraps up with a whirlwind tour of readily available scientific and non-scientific libraries, and prospects/news about the next Go version.

  ## References

  - https://golang.org
  - https://tour.golang.org
  - https://talks.golang.org

  Participants will have to install the Go compiler on their laptops. Installation instructions for each operating system are available at: https://golang.org/doc/install

  To get a taste of what Go looks like and get their feet wet, participants can also follow the interactive, browser-based, installation-free tour at: https://tour.golang.org
  
  Speaker: Dr Sebastien Binet (Laboratoire de Physique Corpusculaire de Clermont-Ferrand (LPC))
- 1:00 PM → 6:00 PM
  
  Modern & Idiomatic C++14 (with a perspective on C++17) 5h Room 157
  
  Room 157
  
  FTU
  
  In this workshop we explain modern and idiomatic C++14 features such as ownership semantics, the Rule of Zero and Expression SFINAE from the ground up. We also show modern software-engineering techniques for safely wrapping C libraries, how to provide FFI access to your C++ library (e.g. to provide Python, Node.js or Haskell bindings), and how to provide library ABI stability guarantees. The workshop wraps up with an outlook on the C++17 standardization and proposals (such as Concepts, Variant, Optional), and a discussion on how to emulate some of those features in C++14.
  
  Speaker: Daniel Hofmann (Mapbox)
- 8:00 PM → 11:00 PM
  
  School Dinner Leonardo Hotel
  
  Leonardo Hotel
Fri, September 2
- 9:00 AM → 12:00 PM
  Plenary Talks Aula
  
  Aula
  
  FTU
  
  slides
  - 9:00 AM
    
    Mesoscale brain modelling in the Human Brain Project 40m Aula
    
    Aula
    
    FTU
    
    The Human Brain Project (HBP) aims to put in place an ICT-based scientific research infrastructure that will allow researchers to improve our understanding of the human brain through data-driven modelling and whole-brain simulation. Advanced computing technologies enable HBP researchers to study models that were unmanageable until recently. This talk focuses on computational and modelling aspects of mesoscale brain synthesis, and puts them in the context of some of the HBP ICT platforms, namely the Neuroinformatics Platform, the Brain Simulation Platform, and the High Performance Analytics and Computing Platform.
    
    Speaker: Dr Liesbeth Vanherpe (École polytechnique fédérale de Lausanne)
  - 9:40 AM
    
    Trends in HPC System Architectures: Towards 'The Machine' and Beyond 40m Aula
    
    Aula
    
    FTU
    
    The talk will address trends in system architecture for HPC and will include related aspects of Big Data and IoT. A specific focus will be on innovative components like next generation memory interconnects, non-volatile memory and silicon photonics that play a key role in future system designs. HPE's 'The Machine' will be used to bring those components into the context of an actual system implementation. Related options and challenges at the level of system software, middleware and programming paradigms will also be addressed.
    
    Speaker: Dr Frank Baetke (Hewlett Packard Enterprise)
    
    Slides
  - 10:20 AM
    
    Coffee Break 20m Aula (FTU)
    
    Aula
    
    FTU
  - 10:40 AM
    
    Radiomics 40m Aula
    
    Aula
    
    FTU
    
    Radiologic images uniquely represent the spatial fingerprints of a progressing disease over time. “Radiomics” denotes the emerging endeavor to systematically extract, mine and leverage this rich information in a personalized medicine approach. We establish and study comprehensive imaging phenotypes reflecting multiple time-points and modalities that can be directly linked to other information sources such as clinical, biological, genomic or proteomic parameters. This challenge requires novel developments at the core of computer science as well as close collaboration with research units from radiology, medical physics and oncology to enable successful translation to the clinic. Research topics include automated image understanding for anatomical structure detection and lesion segmentation as well as derivation of quantitative imaging biomarkers. Our special interest is in investigating the use of data-driven paradigms such as deep and weak learning strategies for building robust models and tapping the full potential of the information encoded in the images.
    
    Speaker: Dr Klaus Maier-Hein (Deutsches Krebsforschungszentrum Heidelberg)
  - 11:20 AM
    
    Conclusions 40m Aula
    
    Aula
    
    FTU
    
    Speaker: Dr Manuel Giffels (KIT)
    
    Slides
