GridKa School 2016 - Data Science on Modern Architectures

Europe/Berlin
FTU


Description

The International GridKa School 2016 is one of the leading summer schools for advanced computing techniques in Europe. The school provides a forum for scientists and technology leaders, experts, and novices to facilitate knowledge sharing and information exchange. The target audience includes graduate and PhD students, advanced users, and IT administrators. GridKa School is hosted by the Steinbuch Centre for Computing (SCC) of Karlsruhe Institute of Technology (KIT). It is organized by KIT and the HGF Alliance "Physics at the Terascale".

Workshops

Hands-on sessions and workshops give participants an excellent and unique chance to gain practical experience on cutting edge technologies and tools.

Plenary talks

Plenary talks presented by experts cover theoretical aspects of school topics and focus on innovative features of data science on modern architectures.

Social Events

Two social events are important parts of the school to give participants the opportunity to get in touch with interesting people in a relaxed atmosphere to improve their networking and to have fun.

 

Participants
  • Aleksander Paravac
  • Alexander Schug
  • Alexandr Mikula
  • Alfred Franz
  • Andreas Heiss
  • Andreas Herten
  • Andreas Petzold
  • Andrew Lahiff
  • Arif Bayirli
  • Asfaw Yohannes
  • Benedikt Hegner
  • Benedikt Riehm
  • Brendan Bouffler
  • Byungyun KONG
  • Christian Haack
  • Christian Petters
  • Christoph Heidecker
  • Christoph Renner
  • Christoph Weinsheimer
  • Christoph-Erdmann Pfeiler
  • Christopher Jung
  • Christopher Stihl
  • CHRYSOVALANTIS MANTAFOUNIS
  • Clemens Hentschke
  • Daniel Hofmann
  • Daniel Schmid
  • Daniela Schaefer
  • Denise Mueller
  • Diana Gudu
  • Dimitri Nilsen
  • Dirk Jacobsen
  • Donghee Kang
  • Doris Wochele
  • Eileen Kühn
  • Elnaz Azmi
  • Elvin Sindrilaru
  • Emanuel Pfeffer
  • Eugen Wintersberger
  • Fabio Baruffa
  • Felice Pantaleo
  • Frank Baetke
  • Frank Fischer
  • Frank Koester
  • Gabriel Zachmann
  • Giovanni Siragusa
  • Graeme Stewart
  • Gregor Köhler
  • Gregor Mittag
  • Günther Erli
  • Hannes Hartenstein
  • HEEJUNE HAN
  • Holger Kluck
  • Igor Katkov
  • Ingolf Wittmann
  • Ingrid Schäffner
  • Iurii Sorokin
  • John Wood
  • Jose Castro Leon
  • Jurry de la Mar
  • Jörg Meyer
  • Jürgen A. Krebs
  • Jürgen Feiler
  • Karim El Morabit
  • Kevin Floeh
  • Klaus Maier-Hein
  • Leif Rädel
  • Lena Wiese
  • Liesbeth Vanherpe
  • Lorenzo Moneta
  • Malika Touil
  • Manfred Größer
  • Manuel Giffels
  • Manuela Kuhn
  • Marco Berghoff
  • Marco Kleesiek
  • Marco Nolden
  • Marcus Hardt
  • Marcus Schmitt
  • Marcus Strobl
  • Marek Szuba
  • Mario Lassnig
  • Matthias Schnepf
  • Max Fischer
  • Mehmet Soysal
  • Melanie Ernst
  • Michael Bontenackels
  • Michael Hufschmidt
  • Michał Orzechowski
  • Mirko Kämpf
  • Moritz Gelb
  • Natascha Krammer
  • Nathalie Rauschmayr
  • Nico Böhr
  • Nikolaus Trost
  • Nils Braun
  • Oleg Dulov
  • Olga Kambeitz
  • Oliver Freyermuth
  • Paul Jäger
  • Pavel Weber
  • Peter Braesicke
  • Peter Krauß
  • Preslav Konstantinov
  • Ralf Floca
  • Raphael Friese
  • Rob Blake
  • Samuel Ambroj Pérez
  • Sebastien Binet
  • Sergio Tafula
  • Simon Kohl
  • Stanisław Jankowski
  • Stefan Geißelsöder
  • Sven Schoo
  • Sven Sternberger
  • Sébastien Gadrat
  • Sławomir Zdanowski
  • Tamás Szép
  • Thomas Latzko
  • Tobias Kurze
  • Tonn Rüter
  • Tunner Chimhondo
  • Ugur Cayoglu
  • Ulrich Hartenstein
  • Ulrike Schnoor
  • Valentin Kozlov
  • Vimal Chander M
  • Will Breaden Madden
  • William Ma
  • Xavier Valls
  • Yousri Mhedheb
    • Registration Foyer

      Foyer

      FTU

    • Plenary Talks Aula

      Aula

      FTU

      slides
      • 1
        Welcome to KIT Aula

        Aula

        FTU

        Speaker: Prof. Hannes Hartenstein (KIT)
        Slides
      • 2
        Welcome to GridKa School 2016 Aula

        Aula

        FTU

        Speaker: Dr Manuel Giffels (KIT)
        Slides
      • 3
        Data Science - Past, Present and Future Aula

        Aula

        FTU

        This presentation will give an overview of the rapid development over the past 15 years of what is often called the “Fourth Paradigm”, leading to Open Science and Open Innovation. A number of examples from research infrastructures will be used to illustrate the current situation and the role that global organisations like the Research Data Alliance have in bringing about the revolution in Data Science which is fundamental to this approach. As the internet of things develops, some ideas will be shared about the possible future, including the impact on the way society copes with the implications.
        Speaker: Prof. John Wood (RDA Council Co-Chair)
        Slides
      • 3:30 PM
        Coffee Break Aula, FTU

        Aula, FTU

      • 4
        ROOT as a Service - Jupyter Notebook Aula

        Aula

        FTU

        The Jupyter Notebook is a narrative that combines data visualisation, text, code and multimedia material, all in the same web application. It can be used, for instance, for data analysis, simulations, statistical inference, machine learning or teaching. Moreover, notebooks encourage suitably documented scientific code, a desirable feature when reproducibility and preservation of analyses are sought. This lecture introduces the basic concepts behind Jupyter Notebooks and reviews some of their possible uses in science: examples of sophisticated statistical data analysis, multivariate techniques and data visualisation will be discussed. Driven by a pragmatic attitude, the theoretical concepts are complemented by concrete applications involving cutting-edge production tools, some of which come from the Python analysis ecosystem and the High Energy Physics community.
        Speaker: Dr Lorenzo Moneta (CERN)
        Slides
      • 5
        Data Challenges in Photon Science Aula

        Aula

        FTU

        This presentation will give an overview of the data life cycle in photon science and the resulting data handling challenges. The topics discussed include data taking, analysis, storage and archival. The technical realization is illustrated using the example of infrastructures currently being developed for synchrotrons and free electron lasers.
        Speaker: Manuela Kuhn (DESY)
        Slides
      • 6
        Data Management in NoSQL Databases Aula

        Aula

        FTU

        The relational data model - where data are stored in tables and hence structured according to some fixed set of attributes (that is, table columns) - has been a success for several decades. Furthermore, SQL is a standardized and widely used query and management language for relational databases. However, transforming commonly occurring data into the relational table format is not always convenient. On the contrary, storing arbitrary documents, programming-language objects, XML data and the like in relational databases imposes a huge overhead. Moreover, relational databases are geared towards frequent queries on a stable set of data with infrequent updates. Novel requirements for database management systems have led to the emergence of several alternatives to relational systems, so that data can be stored in other structures with flexible update and query behavior and distributed over multiple servers. Under the slogan NOSQL (in the sense of Not Only SQL), some systems have come up that concentrate on versatile use cases while diverging from the relational data model. This talk surveys some of these NOSQL technologies, which are employed in Cloud Computing or in social networks and hence will gradually become more significant. The talk will cover graph databases, XML databases, key-value stores and column-family stores.
        Speaker: Dr Lena Wiese (Georg-August-Universität Göttingen)
        Slides
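As a rough illustration of the contrast drawn in this abstract, the sketch below stores the same record once in a relational table with fixed columns and once as a schemaless document under a key. It uses only Python's standard library. The table name `users`, the key `user:1`, the sample values, and the plain dict standing in for a key-value store are all assumptions made for illustration; they are not systems or data covered in the talk.

```python
import json
import sqlite3

# Relational model: every row must fit the fixed set of columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    (1, "Ada", "Karlsruhe"),
)
row = conn.execute("SELECT name, city FROM users WHERE id = ?", (1,)).fetchone()

# Key-value / document model: the value is an arbitrary nested document,
# so adding a new attribute ("tags") needs no schema change.
# A plain dict stands in for a real key-value store here (assumption).
kv_store = {}
kv_store["user:1"] = json.dumps(
    {"name": "Ada", "city": "Karlsruhe", "tags": ["hpc", "cloud"]}
)
doc = json.loads(kv_store["user:1"])

print(row)
print(doc["tags"])
```

Adding a `tags` attribute to the relational version would require altering the table or introducing a second table, which is the kind of overhead the abstract refers to.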
    • Plenary Talks
      slides
      • 7
        Technologies to build hybrid clouds on public-private infrastructures Aula

        Aula

        FTU

        The Helix Nebula initiative continues to expand with research organisations, providers and services. The hybrid cloud model deployed by Helix Nebula has grown to become a viable approach for provisioning ICT services for research communities from both public and commercial service providers (http://dx.doi.org/10.5281/zenodo.16001). The relevance of this approach for all those communities facing societal challenges is explained in a recent EIROforum publication (http://dx.doi.org/10.5281/zenodo.34264). The presentation will outline how a common platform for data-intensive services that builds upon existing publicly funded e-infrastructures and commercial cloud services is achieved to promote open science. The high-level architecture, key technologies and the role of standards are described.
        Speaker: Dr Jurry De la Mar (T-Systems)
        Slides
      • 8
        Docker and Linux Containers Aula

        Aula

        FTU

        Linux containers (LXC) is a technology that provides operating system-level virtualisation not via virtual machines but rather by using a single kernel to run multiple instances on the same OS. Linux namespaces and control groups (cgroups) represent the foundation on which LXC is built. Containers are fast to deploy, they introduce no overhead or indirection as in the case of traditional virtual machines, and they also have the added design benefit of ensuring complete isolation between processes. Containers are great for running multiple instances of the same service in parallel, either as part of a scaling-out strategy or just for testing purposes. Docker is built around Linux containers and offers an intuitive way of managing them by abstracting and automating some of the configuration details. Besides being an open-source project, Docker has enabled the development of an entire "ecosystem" of tools and products targeting container technology.
        Speaker: Elvin Sindrilaru (CERN)
        Slides
      • 10:20 AM
        Coffee Break Aula, FTU

        Aula, FTU

      • 9
        Hitachi Data Systems - Digital Transformation within existing Big Data Environments Aula

        Aula

        FTU

        Hardware and beyond. In a world where new hypes and trends flood CIOs on a daily basis, it is hard to keep up with innovations while staying on a tight budget. Extract, Transform and Load (ETL) has been seen as the solution for years. We show a different starting point that solves existing issues without any vendor lock-in.
        Speaker: Jürgen Krebs (Hitachi)
        Slides
      • 10
        Openstack @ CERN Aula

        Aula

        FTU

        OpenStack is an open-source cloud computing platform for public and private clouds. OpenStack software controls large pools of compute, storage, and networking resources throughout a datacentre, managed through a dashboard or via the API. OpenStack works with popular enterprise and open source technologies, making it ideal for heterogeneous infrastructure. The presentation will outline our journey to build an OpenStack-based private cloud. This talk will cover our experiences in the deployment, management and integration of a large-scale cloud at CERN.
        Speaker: Jose Castro Leon (CERN)
        Slides
    • 12:00 PM
      Lunch Canteen

      Canteen

    • 11
      Docker Tutorial Room 157

      Room 157

      FTU

      The Docker tutorial will walk you through the basic steps of setting up a Docker environment on your machine. There will be a series of exercises that detail the various concepts presented during the plenary talk; it is critical that you understand these for the later part of the tutorial. The final goal of the tutorial is to build and deploy a couple of containers that replicate the usual analysis workflow in High Energy Physics: you will have a container running an XRootD server providing the storage for the data and a different container that runs the ROOT framework where you will do your analysis. The tutorial will discuss in depth the concepts of port forwarding, volumes and resource management in the context of containers, with a focus on understanding the advantages of containers over traditional virtual machines. To get the most out of the Docker tutorial, there are some prerequisites to take into consideration. First of all, you should be comfortable working with the Linux terminal, installing packages over the command line, using the ssh client to connect to a remote machine and, last but not least, editing files using one of the common editors in Linux: emacs, vi, nano etc.
      Speaker: Elvin Sindrilaru (CERN)
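The three concepts named in the tutorial description (port forwarding, volumes and resource management) each correspond to an option of `docker run`. The sketch below only assembles such a command line in Python and prints it; it does not start a container. The helper `docker_run_command`, the image name `example/xrootd`, the container name, the port number and the host path are hypothetical values chosen for illustration, not the ones used in the tutorial.

```python
import shlex


def docker_run_command(image, name, ports=None, volumes=None, memory=None, cpus=None):
    """Build (but do not execute) a `docker run` command line."""
    cmd = ["docker", "run", "--detach", "--name", name]
    # Port forwarding: host_port -> container_port
    for host_port, container_port in (ports or {}).items():
        cmd += ["--publish", f"{host_port}:{container_port}"]
    # Volumes: host directory mounted at a path inside the container
    for host_dir, container_dir in (volumes or {}).items():
        cmd += ["--volume", f"{host_dir}:{container_dir}"]
    # Resource management: limit memory and CPU share of the container
    if memory:
        cmd += ["--memory", memory]
    if cpus:
        cmd += ["--cpus", str(cpus)]
    cmd.append(image)
    return cmd


# Hypothetical storage container; all values are illustrative assumptions.
cmd = docker_run_command(
    image="example/xrootd",
    name="storage",
    ports={1094: 1094},
    volumes={"/tmp/data": "/data"},
    memory="512m",
    cpus=1,
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Building the argument list separately from running it makes it easy to inspect what would be executed before handing the list to something like `subprocess.run`.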
    • 12
      Introduction to OpenStack Room 156

      Room 156

      FTU

      Speakers: Peter Krauss (KIT), Tobias Kurze (KIT)
    • 13
      Jupyter Notebooks for Science (ROOT as a Service) Room 163

      Room 163

      FTU

      The objective of the exercises is to show the participants how to be even more productive in their everyday tasks thanks to Jupyter notebooks. During this hands-on session the concepts treated during the lecture are seen in action. At the beginning, the opportunities offered by the notebooks and the Jupyter web interface are reviewed. Several keyboard shortcuts and elements of the web user interface are tried out. The usage of markdown cells and code cells, the way to manage the output, and the conversion of notebooks to various formats such as HTML or PDF are illustrated. Once the basics are addressed, we focus on Python and C++ notebooks, starting from selected examples coming from High Energy Physics such as CMS Opendata or Large Hadron Collider machine studies. Libraries like ROOT, Matplotlib and Pandas are used. Particular attention is dedicated to machine learning activities. Finally, the interplay of Python and C++ made possible by the ROOT Python bindings, PyROOT, is illustrated by examples.
      Speaker: Dr Lorenzo Moneta (CERN)
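To make the notions of markdown cells and code cells concrete: a notebook file (`.ipynb`) is a JSON document containing a list of cells. The sketch below assembles a minimal notebook structure with the standard library only. The helper names `markdown_cell` and `code_cell` and the cell contents are made up for illustration; the field names follow version 4 of the notebook format as far as the author of this sketch is aware, so treat the exact schema as an assumption and check it against the format documentation before relying on it.

```python
import json


def markdown_cell(text):
    """A cell holding narrative text written in Markdown."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """A cell holding source code; outputs are filled in when it is run."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": source,
    }


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        markdown_cell("# A first analysis\nText and code live side by side."),
        code_cell("values = [1, 2, 3]\nsum(values)"),
    ],
}

# Serialise to the text that would be stored in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Because the file is plain JSON, converting a notebook to HTML or PDF amounts to reading this structure and rendering each cell according to its `cell_type`.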
    • 14
      Puppet I Room 155

      Room 155

      FTU

      Puppet is today the leading configuration management system. It allows you to represent and control the complete infrastructure of complex IT environments. In this course we will learn what Puppet is and how to use it. We will start by writing simple examples and come to understand what the Puppet workflow could look like. We will also have a look at the surrounding ecosystem of Git, Foreman and Hiera, and will show how continuous integration could help us to develop our sites.
      Speakers: Dimitri Nilsen (KIT), Dr Pavel Weber (KIT), Sven Sternberger (DESY)
    • 15
      Using R to improve Python data analysis workflows Aula

      Aula

      FTU

      The choice between R and Python is often considered when selecting the best tool to use for data science workflows. Many people prefer Python over R, since it offers a lot of flexibility and is commonly used in everyday programming. In contrast, R has a steep learning curve. However, it offers a huge number of specialised statistics libraries that are not available in Python so far. Furthermore, it offers leading-edge features for visualisation. Combining R and Python in your workflows allows you to benefit from both, and this tutorial shows you how. To get you started, we introduce several ways to use R from Python. The tutorial itself is hands-on, applying concepts from everyday workflows. The tutorial targets participants who are familiar with using Python for their workflows. Experience with R is not required, though helpful.
      Speaker: Eileen Kühn (KIT)
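One of the simplest ways to use R from Python is to call the `Rscript` command-line front end as a subprocess and read its output. The sketch below does this with the standard library and degrades gracefully when R is not installed. It is not necessarily one of the approaches taught in the tutorial; the helper names `rscript_command` and `run_r` and the sample expression are assumptions for illustration. Dedicated bridge packages such as rpy2 offer tighter integration.

```python
import shutil
import subprocess


def rscript_command(expression):
    """Command line that asks Rscript to evaluate a single R expression."""
    return ["Rscript", "-e", expression]


def run_r(expression):
    """Evaluate an R expression; return its stdout, or None if R is missing."""
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(
        rscript_command(expression),
        capture_output=True,
        text=True,
    )
    return result.stdout


# Let R compute a median for data prepared on the Python side.
data = [1, 5, 2, 8, 3]
expression = "cat(median(c({})))".format(", ".join(str(x) for x in data))
output = run_r(expression)
print(expression)
print(output)
```

Passing data as text works for small inputs; for larger data sets, exchanging files (for example CSV) between the two languages is a common alternative.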
    • Tarte Flambee Evening SCC

      SCC

    • Plenary Talks Aula

      Aula

      FTU

      slides
      • 16
        Building HPC Clusters as Code in the [Almost] Infinite Cloud Aula

        Aula

        FTU

        Every day, HPC clusters help scientists make breakthroughs, such as proving the existence of gravitational waves, screening new compounds for new drugs and designing better headlights for cars. No industry is untouched by HPC, yet owning HPC clusters is out of reach for most organizations due to the upfront hardware and ongoing operational costs. Now with the cloud, not owning an HPC cluster can be one of the most productive ways to compute everything from the fluid dynamics of a milk bottle to the evolution of the universe. And with the availability of free Public Data sets in Amazon S3, even earth observation data and cancer genome data are easily accessible for use in the cloud. Run 1 cluster for 10,000 hours or 10,000 clusters for 1 hour anytime, from anywhere, in the cloud. The speed of innovation is only bound by your imagination, not your budget. This talk will present why people are using Amazon for HPC and scientific applications, how to build clusters on the fly, and Amazon's on-demand Spot market pricing. Lastly, it will provide real-world examples from customers just like you, who’ve broken new ground because of the agility the cloud offers.
        Speaker: Brendan Bouffler (Amazon.com, Inc.)
        Slides
      • 17
        Introduction to HTCondor Aula

        Aula

        FTU

        Originally developed over 20 years ago as a means of making use of unused computing resources on desktops, today HTCondor plays an important role in providing high throughput computing for CERN’s Large Hadron Collider. For example, it is used as a batch system at an increasing number of sites, used as a grid computing element, used to provision both grid and cloud resources, and used as an overlay batch system running on top of multiple grid sites around the world at very large scales. This talk will introduce HTCondor and its architecture, and illustrate its flexibility by describing a number of varied use cases.
        Speaker: Dr Andrew Lahiff (Rutherford Appleton Laboratory)
        Slides
      • 10:20 AM
        Coffee Break Aula (FTU)

        Aula

        FTU

      • 18
        HEP Software Foundation Aula

        Aula

        FTU

        Speaker: Dr Benedikt Hegner (CERN)
        Slides
      • 19
        Scientific Computing in Climate Science Aula

        Aula

        FTU

        The presentation will describe the state of the art in composition-climate modelling of the atmosphere. Which equations do we use and how are they discretised? Which phenomena can we describe, and how? What can we learn about the atmosphere by characterising a model's sensitivities? Issues of scalability and data access will be discussed, because comprehensive numerical experiments are required to tackle questions that are important for society, in particular global change.
        Speaker: Prof. Peter Braesicke (KIT)
        Slides
    • 12:00 PM
      Lunch Break Canteen

      Canteen

    • 20
      CEPH Tutorial Room 163

      Room 163

      FTU

      Ceph is an open-source, software-defined distributed storage system that strives to achieve scalability and reliability through an innovative decentralised design. Distributed file systems nowadays face multiple challenges: scaling to petabyte capacity and providing high performance, while protecting against failures. Moreover, file systems should be able to adapt to dynamic distributed workloads to provide the best performance. Ceph tries to tackle these challenges with a completely decentralised architecture that has no single point of failure. Reliability is achieved through distributed data placement and replication. Ceph's dynamic metadata partitioning feature helps deal with dynamic workloads. This session is an introduction to Ceph. It will cover the theoretical background of Ceph's architecture, as well as hands-on exercises such as installation and configuration of a Ceph cluster, basic usage and monitoring, and a dive into Ceph's secret to extreme scalability: the CRUSH algorithm. After completing this session, you should be able to understand and discuss Ceph concepts, and deploy and manage a Ceph Storage Cluster. Basic knowledge of Linux and storage concepts is required.
      Speaker: Diana Gudu (KIT)
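The key idea behind decentralised data placement is that every client can compute where an object lives from the object's name and the list of storage nodes, without asking a central directory. The sketch below illustrates that idea with rendezvous (highest-random-weight) hashing. This is a deliberately simpler stand-in and is not the CRUSH algorithm, which additionally takes the cluster hierarchy and placement rules into account. The function `placement`, the node names `osd.0` to `osd.5` and the object name are hypothetical.

```python
import hashlib


def placement(object_name, nodes, replicas=3):
    """Choose `replicas` distinct nodes for an object by rendezvous hashing.

    Each node gets a pseudo-random score derived from (object, node);
    the highest-scoring nodes hold the replicas.
    """
    def score(node):
        digest = hashlib.sha256(f"{object_name}:{node}".encode()).hexdigest()
        return int(digest, 16)

    return sorted(nodes, key=score, reverse=True)[:replicas]


nodes = [f"osd.{i}" for i in range(6)]
object_name = "dataset/run42.root"

# Any client computes the same answer, whatever order it lists the nodes in.
first = placement(object_name, nodes)
again = placement(object_name, list(reversed(nodes)))

# Removing a node that holds no replica of this object changes nothing for it.
unused = [n for n in nodes if n not in first]
smaller_cluster = [n for n in nodes if n != unused[0]]
after_removal = placement(object_name, smaller_cluster)

print(first, again, after_removal)
```

Because the mapping is a pure function of its inputs, there is no lookup table to keep consistent and no single point of failure for locating data.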
    • 21
      Concurrent Programming in C++ Room 157

      Room 157

      FTU

      In this course we will introduce how to program for concurrency in C++, taking advantage of modern CPUs' ability to run multi-threaded programs on different CPU cores. Firstly, we will explore the new concurrency features of C++14 itself, which will also serve as a general introduction to multi-threaded programming. Students will learn the basics of asynchronous execution, thread spawning, management and synchronisation. Some elementary considerations about deadlocks and data races will be introduced, which will illustrate the common problems that can arise when programming with multiple threads. After this, the Threading Building Blocks (TBB) template library will be introduced. We shall see how the features of this library allow programmers to exploit multi-threading at a higher level, without needing to worry about so many of the details of thread management. Students should be familiar with C++ and the standard template library. Some familiarity with makefiles and/or CMake would be useful.
      Speaker: Dr Graeme Stewart (University of Glasgow)
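The course itself is taught in C++. Purely to illustrate the concepts it lists (thread spawning, synchronisation to avoid data races, and asynchronous execution), here is a small sketch in Python using the standard library. The names `add_many`, `counter` and `counter_lock` are made up for this example. Rough C++ counterparts would be `std::thread`, `std::mutex` with `std::lock_guard`, and `std::async` with `std::future`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, holding the lock each time."""
    global counter
    for _ in range(n):
        # Without the lock, the read-modify-write below could interleave
        # between threads and lose updates (a data race).
        with counter_lock:
            counter += 1


# Thread spawning and joining.
threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Asynchronous execution: submit work now, collect the result later.
with ThreadPoolExecutor(max_workers=4) as pool:
    future = pool.submit(sum, range(1_000))
    total = future.result()

print(counter, total)
```

The lock serialises only the critical section; everything outside the `with counter_lock:` block can still run concurrently.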
    • 22
      Performance, profiling and debugging Room 156

      Room 156

      FTU

      Nowadays, multi- and manycore CPUs provide many different performance dimensions. In order to profit from these, one needs to understand how an application interacts with the hardware. This is done through profiling, which helps to determine resource requirements and to identify performance bottlenecks. The session will discuss the different performance dimensions of modern CPUs and how to perform benchmarking and high-/low-level application profiling in Linux. Commonly used profiling and debugging utilities will be presented. The session requires a basic understanding of Linux and CPU architecture.
      Speaker: Dr Nathalie Rauschmayr (CERN)
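As a small, high-level illustration of what a profiler reports, the sketch below profiles a toy workload with Python's built-in `cProfile` module and prints the functions ranked by cumulative time. It is not one of the Linux tools presented in the session, and the functions `slow_sum_of_squares` and `workload` are invented for this example.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop that dominates the run time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [slow_sum_of_squares(20_000) for _ in range(50)]


# Collect timing data while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
results = workload()
profiler.disable()

# Format the five most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report shows call counts and time per function, which points directly at `slow_sum_of_squares` as the bottleneck worth optimising.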
    • 23
      Puppet II Room 155

      Room 155

      FTU

      Puppet is today the leading configuration management system. It allows you to represent and control the complete infrastructure of complex IT environments. In this course we will learn what Puppet is and how to use it. We will start by writing simple examples and come to understand what the Puppet workflow could look like. We will also have a look at the surrounding ecosystem of Git, Foreman and Hiera, and will show how continuous integration could help us to develop our sites.
      Speakers: Dimitri Nilsen (KIT), Dr Pavel Weber (KIT), Sven Sternberger (DESY)
    • 24
      Scientific Computing on Amazon Web Services Aula

      Aula

      FTU

      A survey will be sent to all participants shortly before the GridKa School to ask for a specific sub-topic to cover. The following topics are available:
      1. Introduction to AWS (Basics for all services)
      2. Compute:
        • Amazon Elastic Compute Cloud (EC2)
        • EC2 Container Service
        • Lambda - Run Code in Response to Events
      3. Storage
        • Scalable Storage in the Cloud (S3)
        • CloudFront
        • Glacier
        • Elastic File System
      4. Database
        • RDS (Managed Relational Database service)
        • DynamoDB (Managed NoSQL database services)
        • Redshift (Fast, Simple, Cost-Effective Data)
      5. Management Tools
        • CloudWatch (Monitor Resources)
        • CloudFormation (Create and manage Resources)
        • CloudTrail (track user activities)
        • Config (Track resource inventory)
        • OpsWorks (Automatic operations)
        • Service Catalog (Create and use Standardized services)
      6. Security & Identity
        • Identity and Access Management
        • Certificate Manager
      7. Analytics
        • Elastic MapReduce (EMR)
        • Data Pipeline (Data driven workflows)
        • Elasticsearch Service
        • Kinesis (Work with streaming data)
        • Machine learning
      8. Internet of Things - Connect Devices to the cloud
      9. Mobile Services
        • Device Farm : Test Android, iOS and Web Apps
        • Mobile analytics (Collect, View and Export App Analytics)
        • SNS (Simple Notification Service)
      10. Application services
        • API Gateway (Build deploy and manage gateways)
        • AppStream (low latency app streaming)
        • SQS (Message Queue Service)
        • SWF (Workflow Service for Coordinating Application components)
      Speakers: Brendan Bouffler (Amazon.com, Inc), Christian Petters (Amazon Web Services GmbH)
    • 25
      Evening Lecture: "Automated and Connected Driving from a User-Centric Viewpoint" Aula

      Aula

      FTU

      Automated and connected driving can be seen as a paradigm shift in the automotive domain which will substantially change today's role of the human driver. More and more parts of the driving task are to be taken over by technical components for different scenarios. The application areas range from fully automated parking to automated driving on a highway with different involvements of the driver in monitoring tasks (SAE level 2 to 5). Thus, it is essential to understand how this role change will influence the human driver and how we can support the driver by designing an easy-to-understand, safe and comfortable interaction with the automated vehicle. From Human Factors research we know that it is crucial to ensure that the driver has a correct mental model of the automation levels and their current status to avoid mode confusion and resulting errors in operation. In addition, transitions of control between the driver and the automated vehicle need a careful design that is adjusted to the current traffic situation and the driver status. For example, mobile devices that the driver uses during automated driving can be integrated smoothly into the overall interaction design concept to support transitions. The talk will give an overview of selected research topics in the area of user-centric development of automated and connected vehicles at DLR. It will underline why it is important to put the human into focus when designing a new, exciting technology such as automated and connected vehicles.
      Speaker: Prof. Frank Köster (Deutsches Zentrum für Luft- und Raumfahrt)
    • Plenary Talks Aula

      Aula

      FTU

      slides
      • 26
        Big Data Analytics in Biology: Biomolecular Structure Prediction and Beyond by Tracing Residue Co-Evolution Aula

        Aula

        FTU

        One grand challenge of life sciences in the coming years is to fully leverage experimental progress like high-throughput sequencing by taking advantage of recent advances in other disciplines, in particular in information technology. Exploring the interrelationship of structure and function is crucial for understanding life on the molecular level. Yet despite significant progress in experimental methods, the crucial structural characterization of many important proteins and non-coding RNA (ncRNA), which typically precedes any detailed mechanistic exploration of their function, remains challenging. Typically, such work has focused on proteins as “molecular workhorses” of cells, yet recent work has attributed more and more crucial functions to ncRNA, as most of the eukaryotic genome does not code for proteins. Our vision is to develop the technological and algorithmic framework for mining these vast amounts of raw sequence data with the goal of predicting experimentally poorly accessible biomolecular structures by tracing residue co-evolution. Going beyond structure prediction, we can also link co-evolutionary patterns to functional questions like antibiotic drug resistance.
        Speaker: Dr Alexander Schug (KIT)
        Slides
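To give a feel for what tracing residue co-evolution means, the sketch below measures how strongly two columns of a toy sequence alignment vary together, using mutual information. Positions that are in contact in the folded molecule tend to mutate in a correlated way, so a high score hints at spatial proximity. Mutual information is used here only as a simple stand-in; the methods applied in practice are more elaborate and are not described in this abstract. The four-sequence alignment and the function `mutual_information` are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    return sum(
        (count / n)
        * math.log2((count / n) / ((freq_a[x] / n) * (freq_b[y] / n)))
        for (x, y), count in freq_ab.items()
    )


# Toy alignment: one string per sequence, one character per residue position.
alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]
columns = ["".join(seq[i] for seq in alignment) for i in range(4)]

# Positions 0 and 1 always change together; positions 0 and 3 are unrelated.
coupled = mutual_information(columns[0], columns[1])
independent = mutual_information(columns[0], columns[3])
conserved = mutual_information(columns[0], columns[2])

print(coupled, independent, conserved)
```

A fully conserved column (position 2 here) carries no co-evolution signal at all, which is one reason large and diverse sequence collections are needed.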
      • 27
        GPU Computing: Platform, Programming, and Pitfalls Aula

        Aula

        FTU

        GPUs, Graphics Processing Units, offer a large amount of processing power by providing a platform for massively parallel computing. They have the ability to greatly increase the performance of scientific applications on a single workstation computer; and they also power the fastest supercomputers in the world. But leveraging the processing power is not as easy as just running a program on a GPU-enabled computer. The program needs to be ported to and carefully optimized for the GPU architecture. This talk gives an introduction to GPU hardware architectures, programming concepts (CUDA, OpenACC), and touches on the most prevalent pitfalls of working with the technologies.
        Speaker: Dr Andreas Herten (FZ Jülich)
        Slides
      • 10:20 AM
        Coffee Break Aula (FTU)

        Aula

        FTU

      • 28
        The computer that could be smarter than us - Cognitive Computing Aula

        Aula

        FTU

        Cognitive computing is much talked about nowadays and is already present in the realm of supercomputing. However, limitations are set by Moore's Law; new technologies and compute approaches are therefore required to make cognitive solutions come true. Today's cognitive solutions presage what will be possible in the future: can future computers be smarter than us?
        Speaker: Ingolf Wittmann (IBM Deutschland)
        Slides
      • 29
        HDF5 as a standard file format for synchrotron radiation experiments Aula

        Aula

        FTU

        HDF5 is on the verge of becoming a standard file format at synchrotron radiation facilities. This talk will give an overview of the most recent features added to HDF5 and of how the rules specified by the NeXus standard can help to improve HDF5's use for data recorded during synchrotron radiation experiments.
        Speaker: Dr Eugen Wintersberger (DESY)
        Slides
    • 12:00 PM
      Lunch Break Canteen

      Canteen

    • 30
      Advanced Python Programming for Data Science Aula

      Aula

      FTU

      In recent years, Python has been adopted in many fields, be they scientific, commercial or other. For most new users, Python is easy to learn and allows them to quickly write small but powerful scripts. At the same time, it is also feasible to create web services, distributed processing systems, and other complex applications. Beginners have an easy start with Python: little code is required for many tasks as the language has "batteries included". Yet, data-driven science often relies on custom workflows and algorithms. Writing clean, efficient, and comprehensible code is vital; perhaps not for the next deadline, but the one after. Luckily, Python provides many features to manage even complex tasks with ease. The course focuses on advanced concepts of Python programming that are useful for data science. This covers features of the language, how to use them, and also best practices. We present this as an interactive hands-on tutorial, which gives you a feeling for the capabilities of Python. The course targets people familiar with Python or similar languages. You should feel comfortable writing small scripts for solving problems.
      Speaker: Dr Max Fischer (KIT)
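The course description does not list its exact topics, so the following is only a guess at the kind of language features that count as "advanced" and help keep data-processing code clean: a context manager, a caching decorator from the standard library, and a generator that streams results. All function names and sample values here are invented for illustration.

```python
import functools
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Context manager: record how long the enclosed block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


@functools.lru_cache(maxsize=None)
def fibonacci(n):
    """Decorator in action: results are cached, so the recursion stays cheap."""
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)


def running_mean(values):
    """Generator: yields the mean so far without storing intermediate lists."""
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        yield total / count


timings = {}
with timed("fibonacci", timings):
    fib_30 = fibonacci(30)

means = list(running_mean([2, 4, 6]))
print(fib_30, means, timings)
```

Each feature separates a cross-cutting concern (timing, caching, streaming) from the core computation, which is much of what keeps larger analysis code readable.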
    • 31
      Apache Spark in Scientific Applications Room 155

      Room 155

      FTU

      The workshop Spark in Scientific Applications covers fundamental development and data analysis techniques using Apache Hadoop and Apache Spark. Besides an introduction to the theoretical background of Map-Reduce and Bulk-Synchronous-Parallel processing, the machine learning library MLlib and the graph processing framework GraphX are also used. We work on sample data sets from Wikipedia, financial market data, and from a generic data generator. During the tutorial sessions we illustrate the Data Science Workflow and present the right tools for the right task. All practical exercises are prepared in a pre-configured virtual machine. Participants get access to the required data sets on a “one node pseudo-distributed” cluster with all tools inside. This VM is also a starting point for further experiments after the workshop.
      Speaker: Mirko Kämpf (Cloudera)
    • 32
      Application development with relational and non-relational databases Room 163

      Room 163

      FTU

      In this workshop, students will learn how to use relational and non-relational databases to build multi-threaded applications. The focus of the workshop is on efficient, safe, and fault-tolerant principles for high-volume and high-throughput database scenarios. This includes, but is not limited to, systems such as PostgreSQL, Redis or Elasticsearch.

      A basic understanding of the following is required:
      - A programming language (preferably Python or any C-like language)
      - Basic SQL (CREATE, DROP, SELECT, UPDATE, DELETE)
      - Linux shell scripting (bash or zsh)

      The course will cover the following three topics:
      - When to use relational databases, and when not
        * Relational primer
        * Non-relational primer
        * How to design the data model
      - Using SQL for fun and profit
        * Query plans and performance analysis
        * Transactional safety in multi-threaded environments
        * How to deal with large amounts of sparse metadata
        * Competitive locking and selection strategies
      - Building a fault-tolerant database application
        * Distributed transactions across relational and non-relational databases
        * SQL injection and forceful breakage
        * Application-level mitigation for unexpected database issues
      Speaker: Dr Mario Lassnig (CERN)
    • 33
      GPU Room 156

      Room 156

      FTU

      While the computing community is racing to build tools and libraries that ease the use of heterogeneous parallel computing systems, effective and confident use of these systems will always require knowledge of their low-level programming interfaces. This workshop introduces the CUDA programming language through examples and hands-on exercises, enabling participants to recognize CUDA-friendly algorithms and fully exploit the computing potential of a heterogeneous parallel system.
      Speaker: Felice Pantaleo (CERN)
    • 34
      Introduction to Go Room 164

      Room 164

      FTU

      # Introduction to Go

      ## Introduction

      In this workshop, we will introduce the basics of programming in Go and then work our way up to concurrent programming with this relatively new language. We'll start with the usual "Hello World" program, then introduce functions, variables, packages, and interfaces. Then we will tackle the two main tools at the disposal of the Go programmer (colloquially known as a gopher): channels and goroutines. This will be done by implementing a small peer-to-peer application that transmits text messages over the network. The workshop wraps up with a whirlwind tour of readily available scientific and non-scientific libraries, and prospects/news about the next Go version.

      ## References

      - https://golang.org
      - https://tour.golang.org
      - https://talks.golang.org

      Participants will have to install the Go compiler on their laptop. Instructions for each operating system are detailed at: https://golang.org/doc/install

      To get a taste of what Go looks like and get their feet wet, participants can also follow the interactive, browser-based, installation-free tour at: https://tour.golang.org
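
      For readers who have not yet seen the channels and goroutines mentioned in this abstract, the following is a minimal sketch of how the two fit together. It is not the workshop's peer-to-peer application: it runs in a single process without any networking, and the function name `relay` and the peer name `gopher` are hypothetical, chosen only for illustration.

      ```go
      package main

      import (
      	"fmt"
      	"sync"
      )

      // relay is a hypothetical helper: it starts one goroutine per message,
      // and each goroutine sends its message, prefixed with the peer name,
      // over a shared channel. The caller collects everything that arrives.
      func relay(peer string, messages []string) []string {
      	out := make(chan string)
      	var wg sync.WaitGroup

      	for _, msg := range messages {
      		wg.Add(1)
      		go func(m string) {
      			defer wg.Done()
      			out <- peer + ": " + m // blocks until the receiver reads it
      		}(msg)
      	}

      	// Close the channel once every sender has finished, so that the
      	// range loop below terminates.
      	go func() {
      		wg.Wait()
      		close(out)
      	}()

      	var received []string
      	for line := range out {
      		received = append(received, line)
      	}
      	return received
      }

      func main() {
      	for _, line := range relay("gopher", []string{"hello", "world"}) {
      		fmt.Println(line)
      	}
      }
      ```

      The goroutines run concurrently, so the order in which the messages arrive is not guaranteed; only the set of received messages is.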
      Speaker: Dr Sebastien Binet (Laboratoire de Physique Corpusculaire de Clermont-Ferrand (LPC))
    • 35
      Modern & Idiomatic C++14 (with a perspective on C++17) Room 157

      Room 157

      FTU

      In this workshop we not only explain modern and idiomatic C++14 features such as ownership semantics, the Rule of Zero, and Expression SFINAE from the ground up, but also show modern software-engineering techniques for safely wrapping C libraries, providing FFI access to your C++ library (e.g. for Python, Node.js, or Haskell bindings), and providing library ABI stability guarantees. The workshop wraps up with an outlook on C++17 standardization and proposals (such as Concepts, Variant, and Optional), and a discussion of how to emulate some of those features in C++14.
      Speaker: Daniel Hofmann (Mapbox)
    • School Dinner Leonardo Hotel

      Leonardo Hotel

    • Plenary Talks Aula

      Aula

      FTU

      slides
      • 36
        Mesoscale brain modelling in the Human Brain Project Aula

        Aula

        FTU

        The Human Brain Project (HBP) aims to put in place an ICT-based scientific research infrastructure that will allow researchers to improve our understanding of the human brain through data-driven modelling and whole-brain simulation. Advanced computing technologies enable HBP researchers to study models that were unmanageable until recently. This talk focuses on computational and modelling aspects of mesoscale brain synthesis, and puts them in the context of some of the HBP ICT platforms, namely the Neuroinformatics Platform, the Brain Simulation Platform, and the High Performance Analytics and Computing Platform.
        Speaker: Dr Liesbeth Vanherpe (École polytechnique fédérale de Lausanne)
      • 10:20 AM
        Coffee Break Aula (FTU)

        Aula

        FTU

      • 38
        Radiomics Aula

        Aula

        FTU

        Radiologic images uniquely represent the spatial fingerprints of a progressing disease over time. The term “radiomics” describes the emerging endeavor to systematically extract, mine, and leverage this rich information in a personalized medicine approach. We establish and study comprehensive imaging phenotypes reflecting multiple time points and modalities that can be directly linked to other information sources such as clinical, biological, genomic, or proteomic parameters. This challenge requires novel developments at the core of computer science as well as close collaboration with research units from radiology, medical physics, and oncology to enable successful translation to the clinic. Research topics include automated image understanding for anatomical structure detection and lesion segmentation as well as the derivation of quantitative imaging biomarkers. Our special interest is in investigating the use of data-driven paradigms such as deep and weak learning strategies for building robust models and tapping the full potential of the information encoded in the images.
        Speaker: Dr Klaus Maier-Hein (Deutsches Krebsforschungszentrum Heidelberg)
      • 39
        Conclusions Aula

        Aula

        FTU

        Speaker: Dr Manuel Giffels (KIT)
        Slides