GridKa School 2015: Big Data, Virtualization and Modern Programming

Description

The International GridKa School "Big Data, Cloud Computing and Modern Programming" is one of the leading summer schools for advanced computing techniques in Europe. The school provides a forum for scientists and technology leaders, experts and novices to facilitate knowledge sharing and information exchange. The target audience includes graduate and PhD students, advanced users and IT administrators. GridKa School is hosted by the Steinbuch Centre for Computing (SCC) of Karlsruhe Institute of Technology (KIT). It is organized by KIT and the HGF Alliance "Physics at the Terascale".

Workshops

The hands-on sessions and workshops give participants a unique chance to gain practical experience with cutting-edge technologies and tools.

Plenary talks

The plenary talks, presented by experts, cover the theoretical aspects of the topics discussed at the school and focus on the innovative features of big data and cloud technologies.

Social Events

Two social events are an important part of the school; in a warm atmosphere they give participants the opportunity to get in touch with interesting people, improve their networking and have fun.

 

Please try out the free conference4me conference app for Android, iOS, Amazon and Windows Phone.
After installing the app, you can download our school program and easily organize your agenda.
You can also download the schedule (event view, detailed agenda) as an iCalendar file (.ics) to import into your favourite calendar program.

 

 

GridKa Logo SCC Logo Terascale Logo IBM Logo KSETA Logo
SAP Logo Amazon AWS Logo LSDMA Logo Inovex Logo Logo UNICORE
  Logo Hitachi Data Systems dCache Logo Cloudera Logo
Participants
  • Achim Gütlein
  • Achim Streit
  • Aleksander Paravac
  • Alessandro Costantini
  • Alexander Heck
  • Alexandr Mikula
  • Alexis Descroix
  • AMJAD KOTOBI
  • Andreas Heiss
  • Andreas Weidemann
  • Andrei Kobitski
  • Arsen Hayrapetyan
  • Bas Wegh
  • Ben Jones
  • Benedikt von St. Vieth
  • Benjamin Bertram
  • Benjamin Ertl
  • Björn-Martin Sinnhuber
  • Christian Amstutz
  • Christoph König
  • Christopher Jung
  • Christopher Schmitt
  • Clemens Düpmeier
  • Cristina Manzano
  • Damien Lecarpentier
  • Danah Tonne
  • Daniel Lee
  • David Kelsey
  • David Seldner
  • Diana Gudu
  • Dmitry Nilsen
  • Donghee Kang
  • Doris Ressmann
  • Eduardo Santamaria-Navarro
  • Eileen Kühn
  • Elmar Jakobs
  • Elvin Sindrilaru
  • Engelbert Quack
  • Engin Eren
  • Evelina Buttitta
  • Fabian Rigoll
  • Fabio Baruffa
  • Federica Sozzi
  • Felice Pantaleo
  • Felix Metzner
  • Felix Socher
  • Florian Prill
  • Friedrich Hönig
  • Gabrielle HUGO
  • Georg Fleig
  • Graeme Stewart
  • Gábor Bíró
  • Günter Quast
  • Hendrik Leddin
  • Holger Kluck
  • Ignaz Reicht
  • Ilyeon Yeo
  • Ingrid Schäffner
  • Jan Stillings
  • Jin Kim
  • Jochen Gemmler
  • Johannes M. Scheuermann
  • Johannes Skarka
  • Johannes Stegmaier
  • John Kennedy
  • Juergen Krebs
  • Jörg Meyer
  • Jörn Hoffmann
  • Kajorn Pathomkeerati
  • Klaus Manny
  • Konrad Klimaszewski
  • Krzysztof Nawrocki
  • Liesbeth Vanherpe
  • Lorenzo Galli
  • Lukas Zimmer
  • Manfred Größer
  • Manuel Giffels
  • Marco A. Harrendorf
  • Marco Kleesiek
  • Marcus Hardt
  • Marek Szuba
  • Mario Lassnig
  • Marion Cadolle Bel
  • Mariusz Karpiarz
  • Markus Alex
  • Markus Krieger
  • Markus Lauscher
  • Markus Prim
  • Markus Stoye
  • Martin Kohn
  • Martin Sarnovsky
  • Martin Spoo
  • Martin Uffinger
  • Massimo Torquati
  • Matthias Reuter
  • Max Fischer
  • Melanie Ernst
  • Michael Bontenackels
  • Michael Bredel
  • Michael Göttsche
  • Michal Olszewski
  • Mirko Kämpf
  • Moritz Gelb
  • Muhammad Waqar
  • Nico Struckmann
  • Niklas Griessbaum
  • Nils Braun
  • Nils Faltermann
  • Oleg Dulov
  • Oleg TSIGENOV
  • Oliver Frost
  • Oliver Oberst
  • Oliver Schäfer
  • Parinaz Ameri
  • Pascal Nagel
  • Paul Millar
  • Pavel Komardin
  • Peer Hasselmeyer
  • Peter Wittenburg
  • Philippe Bertrand Kouotou Moluh
  • Pierre-Emmanuel BRINETTE
  • Pooja Saxena
  • Preslav Konstantinov
  • Quyen Nguyen
  • Raphael Friese
  • Riccardo Bucchi
  • Roland Laifer
  • Romain Rougny
  • Samuel Ambroj Perez
  • Shabnam Jabeen
  • Shyam Sharan Wagle
  • Simon Armstrong
  • Simon Knutzen
  • Simon Raffeiner
  • Stefan Groh
  • Stephan Westphal
  • Stephane GERARD
  • Sven Sternberger
  • Sören Fleischer
  • Thomas Latzko
  • Tim Roes
  • Tomasz Wolak
  • Ugur Cayoglu
  • Uros Stevanovic
  • Valentin Kozlov
  • Vladimir Kriventsev
  • William Breaden Madden
  • WooJin Park
    • 10:00 AM 12:00 PM
      Travelling Gaede Hall (Building 30.22)

      See the travel directions for how to reach KIT Campus South in Karlsruhe.
      On Campus South, head for the building in front of the physics department's high-rise building; just ask for the "Physikhochhaus" if you get lost on campus.

    • 11:00 AM 5:00 PM
      Unicore Summit: UNICORE summit co-located with the GridKa School SR 229.3 (Building 30.22)

      • 11:00 AM
        UNICORE Summit 2015 6h
        UNICORE Summit 2015 homepage

        The UNICORE Summit is the annual meeting of the UNICORE community. It provides a unique opportunity for UNICORE users, developers, administrators, researchers, service providers, and managers to meet.

        Participate to share your experience, present recent and planned developments, learn about the latest UNICORE features, and get new ideas for interesting and prosperous collaborations.

        Please register at http://unicore.eu/summit/2015/registration.php
        Program overview
    • 12:00 PM 2:00 PM
      Registration
    • 2:00 PM 6:50 PM
      Plenary talks Gaede HS (Building 30.22)

      Convener: Dr Thomas Hartmann (SCC)
      • 2:00 PM
        Welcome to Karlsruhe Institute of Technology 30m
        Speaker: Prof. Achim Streit (KIT-SCC)
        Slides
      • 2:30 PM
        GridKa School - Event Overview 15m
        Speaker: Dr Thomas Hartmann (SCC)
        Slides
      • 2:45 PM
        Storage Technologies 45m
        Software-Defined Storage
        Speaker: Dr Paul Millar (DESY)
        Slides
      • 3:30 PM
        Coffee Break 30m
      • 4:00 PM
        IT security in an IPv6 world 45m
        Unused IPv4 network addresses are a scarce resource. The deployment of IPv6 networking across the world is well underway. Some large IT distributed infrastructures, such as the Worldwide Large Hadron Collider Computing Grid, are starting to deploy dual-stack IPv6/IPv4 services to support IPv6-only clients. New networking protocols, such as IPv6, always bring new challenges for operational IT security. We have spent many decades understanding and fixing security problems and concerns in the IPv4 world. We have only just started with IPv6! Its lack of maturity together with all the additional complexities, particularly in a dual-stack environment, bring many challenges. This talk will consider some of the security concerns in an IPv6 world and consider best practices for system administrators who manage (or will manage) IT services on distributed infrastructures and also for their related security teams.
        Speaker: Dr David Kelsey (Rutherford Appleton Laboratory)
        Slides
      • 4:45 PM
        From Mars to Earth through Cloud Computing 45m
        Our society has benefited from space exploration in many ways. Many of the inventions we use nowadays have their origin in, or have been improved by, space research. Computer science is no exception. This talk will introduce the speaker's application of Cloud Computing in the context of different Mars missions: Mars MetNet (Spain-Russia-Finland), MSL Curiosity (NASA) and ExoMars2016 (ESA). The know-how gained allowed the optimization of other areas on Earth, such as weather forecasting and the processing of data from agricultural wireless sensor networks.
        Speaker: Dr Jose Luis Vazquez-Poletti (Universidad Complutense de Madrid (Spain))
        Slides
    • 9:00 AM 12:00 PM
      Plenary talks Gaede HS (Building 30.22)

      Convener: Mr Max Fischer (Karlsruhe Institute of Technology)
      • 9:00 AM
        Data Preservation 40m
        The long-term preservation of information is a crucial requirement for scientific progress in every research community. In former times hammer and chisel were the tools of choice to preserve cultural heritage; nowadays the digital world introduces additional and novel challenges. Obsolete formats and technologies, the quick decay of storage media and power outages are just a few examples of threats that need to be faced in order to ensure sustainable access to valuable data.

        This talk will give an introduction to the broad field of data preservation and its basic principles. Various examples from the arts and humanities community will illustrate how heterogeneous preservation demands are and how they can be supported by a research infrastructure.
        Speaker: Ms Danah Tonne (KIT)
        Slides
      • 9:40 AM
        Software-Defined Networking for the Data Center 40m
        Software Defined Networking (SDN) is a paradigm shift in the networking domain. It has recently become a hot topic and it is expected to change the way we think about networks and how we architect them. In this talk we will look at what SDN is, how it can be realized, and what the impact on networking, mainly in the data center, is. The SDN architecture will be explained, the abstractions used by OpenFlow will be introduced, and some use cases as well as some SDN implementations will be described.
        Speaker: Dr Peer Hasselmeyer (NEC)
        Slides
      • 10:20 AM
        Coffee Break 20m
      • 10:40 AM
        Digital Transformation, Big Data … and the changing role of IT 40m
        How does the Digital Transformation change business models, and which new business models arise? How do processes and business segments have to change, and what is the role of IT within this development? To answer these questions we will look at the new technologies in a comprehensive way with a special focus on Big Data. Get to know SAP as the global market leader for business software and see how SAP seizes new opportunities by leveraging technologies like cloud, in-memory, and mobile computing. Selected SAP customers of different sizes and from various industries show how to manage the digital change and remain competitive.
        See also what possibilities SAP offers for graduates and professionals to work within those new technological fields.

        Speaker: Dr Engelbert Quack (SAP SE, Head of Consulting Area Data & Technology)
      • 11:20 AM
        Shinkansen or why trains can arrive on time 40m
        Big Data is one of the driving topics of our time. But what does it look like in practice? In its talk, HDS attempts to bridge the gap between theory and real-world application.
        Speaker: Jürgen Krebs (Hitachi Data Systems)
        Slides
    • 12:00 PM 1:00 PM
      Lunch 1h

      Please let us know how you rate yesterday's school day at

      https://surveys.scc.kit.edu/limesurvey/index.php/759825/lang-en

    • 1:00 PM 6:00 PM
      Application development with relational and non-relational databases Gaede HS (Building 30.22)

      Convener: Dr Mario Lassnig (CERN)
      • 1:00 PM
        Application development with relational and non-relational databases 5h
        In this workshop, the students will learn how to use relational and non-relational databases to build multi-threaded applications. The focus of the workshop is to teach efficient, safe, and fault-tolerant principles when dealing with high-volume and high-throughput database scenarios.

        A basic understanding of the following things is required:
        - A programming language (preferably Python or any C-like)
        - Basic SQL (CREATE, DROP, SELECT, UPDATE, DELETE)
        - Linux shell scripting (bash or zsh)

        The course will cover the following three topics:

        - When to use relational databases, and when not
          * Relational primer
          * Non-relational primer
          * How to design the data model

        - Using SQL for fun and profit
          * Query plans and performance analysis
          * Transactional safety in multi-threaded environments
          * How to deal with large amounts of sparse metadata
          * Competitive locking and selection strategies

        - Building a fault-tolerant database application
          * Distributed transactions across relational and non-relational databases
          * SQL injection and forceful breakage
          * Application-level mitigation for unexpected database issues
        Speaker: Dr Mario Lassnig (CERN)
        Slides
    • 1:00 PM 6:00 PM
      CUDA GPU Programming Workshop Building 30.23

      Convener: Felice Pantaleo (CERN)
      • 1:00 PM
        CUDA GPU Programming Workshop 5h
        While the computing community is racing to build tools and libraries to ease the use of heterogeneous parallel computing systems, effective and confident use of these systems will always require knowledge about their low-level programming interfaces.

        This workshop is designed to introduce the CUDA programming language through examples and hands-on exercises, so as to enable the user to recognize CUDA-friendly algorithms and fully exploit the computing potential of a heterogeneous parallel system.
        Speaker: Felice Pantaleo (CERN)
        Slides
    • 1:00 PM 6:00 PM
      HPC Cloud 101: HPC on Cloud Computing for newcomers: Cloud and HPC on Cloud for beginners 229.3 (Building 30.22)

      Convener: Dr Jose Luis Vazquez-Poletti (Universidad Complutense de Madrid (Spain))
      • 1:00 PM
        HPCCloud 101: HPC on Cloud Computing for newcomers 5h
        Never tried Cloud Computing before? Do you think that extra computing power is crucial for your research? Do you have some neat parallel codes that your institution doesn’t allow you to execute because the cluster is full? Maybe this tutorial is for you!

        The tutorial will cover the following topics:

        As Virtual Clusters deployed by StarCluster have Sun Grid Engine and OpenMPI installed, you are more than welcome to bring your own codes and give them a try!
        Speaker: Dr Jose Luis Vazquez-Poletti (Universidad Complutense de Madrid (Spain))
        Slides
    • 1:00 PM 6:00 PM
      Programming Templates 6.01 (Building 30.23)

      Convener: Dr Martin Heck (KIT)
      • 1:00 PM
        Programming Templates Tutorial 5h
        Programming Templates
        Speaker: Dr Martin Heck (KIT/EKP)
        Slides
    • 1:00 PM 6:00 PM
      Puppet 229.4 (Building 30.22)

      Convener: Mr Ben Jones (CERN)
      • 1:00 PM
        Puppet Workshop 5h
        Puppet is a configuration management tool adopted by many institutions of different sizes in academia and industry. Puppet can be used to configure many different operating systems and applications, and it integrates well with other tools such as Foreman and MCollective.
        The workshop will feature a hands-on tutorial on Puppet, allowing users to write simple manifests themselves and manage them using Git. A selection of useful tools around Puppet will be presented.

        Basic knowledge of the Linux operating system is required. The detailed agenda for the course is as follows:

        1st day:
        • Introduction to Git
        • Setup & technical infrastructure
        • Explanation for the setup of the infrastructure, login to the machines
        • Write manifests
        • Puppet language, resource types, modules, etc.

        2nd day:
        • Leftovers from previous day, and/or some more advanced configuration
        • Series of small presentations and walk-throughs: Hiera, Facter, Foreman, MCollective, GitLab, ...

        Prerequisites:
        • Attendees should familiarize themselves with a Linux terminal and the peculiarities of a Linux text editor (vi, emacs etc.).
        • No knowledge of Puppet or Git is required.

        Speakers: Mr Ben Jones (CERN (CH)), Mr Sven Sternberger (DESY), Mr Yves Kemp (DESY)
        Slides
    • 6:00 PM 10:00 PM
      School Social Event Atrium (Building 30.22)

      • 6:30 PM
        Tarte Flambee 3h 30m Atrium

        Social evening with Tarte Flambee, beer and drinks in the courtyard of building 30.22
      • 6:30 PM
        Visit to the SCC CS computing centre 1h Building 20.21

        For interested participants, we are organizing a short excursion to the SCC computing centre at Campus South.
    • 9:00 AM 12:00 PM
      Plenary talks Gaede HS (Building 30.22)

      Convener: Mr Peter Kraus (KIT/SCC)
      • 9:00 AM
        Climate simulations 40m Gaede HS

        Climate change as a result of increased emissions of greenhouse gases is a global scale challenge for today's society and future generations. Climate model simulations are important tools to test our scientific knowledge of the processes involved, and to provide projections of future changes and their impacts. After a general introduction this talk will focus on recent advances in atmospheric chemistry-climate modelling, including a discussion of the technical challenges of climate model simulations.
        Speaker: Dr Björn-Martin Sinnhuber (KIT/IMK-ASF)
        Slides
      • 9:40 AM
        Linux containers and Docker 40m Gaede Hall

        Linux containers (LXC) is a technology that provides operating system-level virtualisation not via virtual machines but rather by using a single kernel to run multiple instances on the same OS. Linux namespaces and control groups (cgroups) represent the foundation on which LXC is built. Containers are fast to deploy, they introduce none of the overhead or indirection of traditional virtual machines, and they have the added design benefit of ensuring complete isolation between processes. Containers are great for running multiple instances of the same service in parallel, either as part of a scaling-out strategy or just for testing purposes. Docker is built around Linux containers and offers an intuitive way of managing them by abstracting and automating some of the configuration details. Besides being an open-source project, Docker has enabled the development of an entire "ecosystem" of tools and products targeting container technology.
        Speaker: Elvin Sindrilaru (CERN)
        Slides
      • 10:20 AM
        Coffee Break 20m Gaede HS (Building 30.22)

      • 10:40 AM
        ElasticSearch and the ELK stack for monitoring and data analysis 40m Gaede HS

        Elasticsearch, Logstash and Kibana, known as the ELK stack, are three open source projects designed to ship, parse, search, analyze and visualize data, from Apache logs to Twitter streams. The Web-based Information Systems (WebIS) group of the Institute for Applied Computer Science (IAI) of the Karlsruhe Institute of Technology (KIT) uses the ELK stack in different large-scale web information system projects as a central component for data aggregation, analysis and search-driven data access. Therefore, besides giving a rough overview of ELK features, this talk will explore possibilities and scenarios for using the ELK stack in web applications such as community web portals, environmental information systems and smart energy management systems, or for log data analysis. Common data storage and analysis capabilities of Elasticsearch will be explained, and examples will show how the ELK stack can be integrated programmatically into one's own software solutions.
        Speaker: Dr Clemens Düpmeier (KIT/IAI)
        Slides
      • 11:20 AM
        School Photo 40m Gaede HS (Building 30.22)

        If the weather permits, we will take a picture of all participants in front of Karlsruhe Castle.

    • 12:00 PM 1:00 PM
      Lunch 1h

      Please let us know how you rate yesterday's sessions at

      https://surveys.scc.kit.edu/limesurvey/index.php/478776/lang-en

    • 1:00 PM 6:00 PM
      Apache Spark in Scientific Applications 3.01 (Building 30.23)

      Convener: Mirko Kämpf (Cloudera)
      • 1:00 PM
        Apache Spark in Scientific Applications [A] 5h
        This tutorial is limited to 12 participants. Another session of this tutorial is also available.

        The workshop Spark in Scientific Applications covers fundamental development and data analysis techniques using Apache Hadoop and Apache Spark. Besides an introduction to the theoretical background of MapReduce and Bulk Synchronous Parallel processing, the machine learning library MLlib and the graph processing framework GraphX are also used.

        We work on sample data sets from Wikipedia, financial market data, and from a generic data generator. During the tutorial sessions we illustrate the Data Science Workflow and present the right tools for the right task.

        All practical exercises are prepared in a pre-configured virtual machine. Participants get access to the required data sets on a "one-node pseudo-distributed" cluster with all tools inside. This VM is also a starting point for further experiments after the workshop.
        Speaker: Mr Mirko Kämpf (Cloudera)
        Slides
    • 1:00 PM 6:00 PM
      CEPH 6.01 (Building 30.23)

      Convener: Ms Diana Gudu (KIT)
      • 1:00 PM
        CEPH Tutorial 5h
        Ceph is an open-source, software-defined distributed storage system that strives to achieve scalability and reliability through an innovative decentralised design.

        Distributed file systems nowadays face multiple challenges: scaling to petabyte capacity and providing high performance, while protecting against failures. Moreover, file systems should be able to adapt to dynamic distributed workloads to provide the best performance. Ceph tries to tackle these challenges with a completely decentralised architecture that has no single point of failure. Reliability is achieved through distributed data placement and replication. Ceph's dynamic metadata partitioning feature helps deal with dynamic workloads.

        This session is an introduction to Ceph and it will cover theoretical background on Ceph's architecture, as well as hands-on exercises, such as installation and configuration of a Ceph cluster, simple usage and monitoring. After completing this session, you should be able to understand and discuss Ceph concepts, and deploy and manage a Ceph Storage Cluster.

        Basic knowledge of Linux and storage concepts is required.
        Speaker: Ms Diana Gudu (KIT/SCC)
        Slides
    • 1:00 PM 5:00 PM
      Concurrent Programming in C++ Gaede HS (Building 30.22)

      Convener: Dr Graeme Stewart (CERN)
      • 1:00 PM
        Concurrent Programming in C++ 4h
        In this course we will introduce how to program for concurrency in C++, taking advantage of modern CPUs' ability to run multi-threaded programs on different CPU cores. Firstly, we will explore the new concurrency features of C++11 itself, which will also serve as a general introduction to multi-threaded programming. Students will learn the basics of asynchronous execution, thread spawning, management and synchronisation. Some elementary considerations about deadlocks and data races will be introduced, which will illustrate the common problems that can arise when programming with multiple threads. After this, the Threading Building Blocks (TBB) template library will be introduced. We shall see how the features of this library allow programmers to exploit multi-threading at a higher level, without needing to worry about so many of the details of thread management.

        Students should be familiar with C++ and the standard template library. Some familiarity with makefiles would be useful.
        Speaker: Mr Graeme Stewart (CERN (CH))
        Slides
    • 1:00 PM 6:00 PM
      Docker for a ROOT based data analysis flow 2.01 (Building 30.23)

      Convener: Mr Elvin Sindrilaru (CERN (CH))
      • 1:00 PM
        Docker for a ROOT based data analysis flow 5h
        Linux containers (LXC) is a technology that provides operating system-level virtualisation not via virtual machines but rather by using a single kernel to run multiple instances on the same OS. Linux namespaces and control groups (cgroups) represent the foundation on which LXC is built. Containers are fast to deploy, they introduce none of the overhead or indirection of traditional virtual machines, and they have the added design benefit of ensuring complete isolation between processes. Containers are great for running multiple instances of the same service in parallel, either as part of a scaling-out strategy or just for testing purposes. Docker is built around Linux containers and offers an intuitive way of managing them by abstracting and automating some of the configuration details. Besides being an open-source project, Docker has enabled the development of an entire "ecosystem" of tools and products targeting container technology.
        Speaker: Mr Elvin Sindrilaru (CERN (CH))
        Slides
    • 1:00 PM 6:00 PM
      Elastic Search, Logstash, Kibana: ELK stack 2.1 (Building 30.23)

      Conveners: Dr Samuel Ambroj (KIT), Mr Kajorn Pathomkeerati (KIT/IAI)
      • 1:00 PM
        Elastic Search, Logstash and Kibana 5h
        Elasticsearch, Logstash and Kibana, known as the ELK stack, are three open source projects designed to ship, parse, search, analyse and visualize your data, from Apache logs to Twitter streams. A short description of the components is the following:
        • Logstash allows you to ship and parse your data using a great variety of plugins. It is highly scalable.
        • Elasticsearch is a search server based on Apache Lucene. It is distributed and highly scalable.
        • Kibana is the visualization platform, available through a web browser, with a nice interface that is easy to customize directly from the browser.

        In this course we will explain these three components and guide you through their installation and configuration. Several different data logs will be analyzed so that you can finally create your own Kibana dashboards.

        Basic Linux knowledge and familiarity with vim are required. Some knowledge of regular expressions would be a plus.

        Speakers: Mr Kajorn Pathomkeerati (KIT/IAI), Mr Samuel Ambroj Perez (KIT/SCC)
        Slides
    • 1:00 PM 6:00 PM
      Puppet 229.4 (Building 30.23)

      Convener: Mr Ben Jones (CERN)
    • 1:00 PM 6:00 PM
      R for Large Scale Data Analysis: R 229.3 (Building 30.22)

      Convener: Ms Eileen Kuehn (KIT)
      • 1:00 PM
        R for Large Scale Data Analysis 5h
        The R programming language is a free software environment for statistical computing and graphics. It is widely used among statisticians and data miners.
        However, the huge variety of available packages makes introducing R into daily work far from easy. This tutorial focuses on using R for large amounts of data. It deals with three different topics: managing data, analysing data, and basic and intermediate plotting. Each topic is accompanied by a short introduction, an overview of experiences, and recommended packages. The tutorial itself is hands-on. We will look at different possibilities and solutions.
        The tutorial targets participants who already have some basic programming experience but do not necessarily know much about R.
        Speaker: Ms Eileen Kühn (KIT/SCC)
        Slides
    • 9:00 AM 12:00 PM
      Plenary talks Gaede HS (30.22)

      Convener: Parinaz Ameri (KIT)
      • 9:00 AM
        Apache Spark: The next Generation of Hadoop Processing 40m Gaede HS

        Apache Spark is known as the "Next Generation Framework" of Hadoop-based data processing. This talk explains why, and what Apache Spark offers to the scientific community. The convergence of different analysis techniques into one flexible and highly efficient processing engine allows completely new interdisciplinary analysis methods as well as cheap analysis prototypes. In this presentation I show examples in Scala and Python. Besides fundamental techniques and the core features of Apache Spark, we look into development practices and data analysis techniques. To that end, we recap the theoretical background of MapReduce and Bulk Synchronous Parallel processing before I introduce the machine learning library MLlib and the graph processing framework GraphX. Apache Spark uses the concept of data frames and allows SQL operations on data sets; after this presentation you will know how this works and how it can save you a lot of time. Finally, you will see how data can be collected and analyzed on the fly using Spark Streaming.
        Speaker: Mr Mirko Kaempf (Cloudera)
      • 9:40 AM
        Technical Computing on OpenPower 40m Gaede HS

        Since 2013 the OpenPower Foundation has grown steadily to over 130 members. The goal of the OpenPower Foundation is to enable joint development and integration of different technologies around the IBM Power CPU architecture to speed up innovation. Within the foundation, technical computing (HPC, HTC) is a focus topic for several members such as NVIDIA, Mellanox and IBM, with the aim of enabling accelerated computing based on OpenPower, e.g. with GPGPUs or FPGAs. This talk will outline the future of technical computing with OpenPower in data-centric environments.
        Speaker: Dr Oliver Oberst (IBM)
        Slides
      • 10:20 AM
        Coffee Break 20m Gaede Hall (Building 30.22)

      • 10:40 AM
        Big data in critical infrastructure: Production and failover infrastructure in DWD's central data management 40m Gaede HS

        The German Weather Service (DWD) provides a wide variety of services for the protection of life and property in the form of weather and climate information. One core task is safeguarding aviation, marine safety and terrestrial traffic. Another is issuing warnings ahead of meteorological events that could endanger public safety and order. Additionally, we monitor the climate and are active in multiple research fields, from ensemble numerical weather forecasting to applications of weather data in new areas. Data is recorded, processed and transformed into time-critical products and securely archived 24 hours per day, 365 days per year.

        The DWD maintains a high productivity, redundant infrastructure in order to provide these services reliably and on demand. We ensure deliverability with multiple tiers of failover strategies, enabling us to manage and monitor production even when faced with major hardware or software failures.

        Specialized systems allow rapid access to large, cross-sectional binary files in file system caches for near-real-time applications, while an automated tape archive provides short-term access to long-term archival data. Simultaneously, observational data is processed and stored in relational databases in order to allow comfortable processing of long time series data. Various application layers are used to post-process products in order to refine them for domain-specific queries.

        Demand for weather- and climate-based data and services, as well as the associated needs for processing power, network transfer capabilities and storage capacity, is constantly increasing. It is the DWD's goal not only to maintain a production infrastructure with high quality and availability, but also to continue to evolve to meet this demand. Doing so while maintaining our tradition of quality, speed and reliability is one of the major challenges facing the DWD. Some current projects designed to meet these goals are introduced in the outlook.
        Speaker: Mr Daniel Lee (DWD)
        Slides
      • 11:20 AM
        Research Data Alliance - Research Data Sharing without barriers 40m Gaede Hall (Building 30.22)

        Even examples from psycholinguistics, a humanities discipline, show that data-intensive science is changing all scientific disciplines dramatically, posing unprecedented challenges in data management and processing. A recent survey in Europe showed clearly that most research departments are not prepared for this step and that the methods used to manage, curate and process data are inefficient and too costly. Despite wide agreement on some obvious trends with respect to data and on principles of data sharing such as those formulated by the G8 ministers, we lack clear guidelines and strategies for how to move ahead. Therefore, the Research Data Alliance (RDA) has been established as a bottom-up, global and cross-disciplinary initiative to accelerate the process of changing data practice. After only two years RDA has produced its first concrete results, which now have to demonstrate their practicality. In particular, infrastructure builders are requested to act as early adopters. RDA, as an initiative to specify interfaces, protocols, guidelines, etc., needs to be seen as a chance for us to discuss how we can move ahead. Infrastructure builders need to put the results in place in order to test them. All three groups (researchers, infrastructure builders and RDA experts) need to remain in close discussion to achieve the fast progress we are waiting for. The talk will address all of these aspects.
        Speaker: Mr Peter Wittenburg (RDA)
        Slides
    • 12:00 PM 1:00 PM
      Lunch 1h

      Please let us know how you rate yesterday's sessions at

      https://surveys.scc.kit.edu/limesurvey/index.php/215574/lang-en

    • 1:00 PM 6:00 PM
      AngularJS 2.01 (Building 30.23)

      Convener: Tim Roes (Inovex)
      • 1:00 PM
        AngularJS workshop 5h
        This workshop focuses on creating modern web applications and storing their data in the cloud. AngularJS is a framework that has grown popular in recent years thanks to its flexibility, its power to build rich web applications and its ease of use.

        During this workshop we will build a web application from scratch and connect it to the cloud to easily sync your users' data over the web.

        Participants of this workshop are required to bring their own notebook. If possible, install Node.js from https://nodejs.org in preparation. You will also need a text editor of your choice (we recommend SublimeText from http://www.sublimetext.com).

        Since the workshop targets AngularJS beginners, you don't need any experience with AngularJS (or even to have heard of it).
        However, to follow the workshop you will need at least beginner-level knowledge of JavaScript (or, optionally, TypeScript).
        Speakers: Matthias Reuter (Inovex), Mr Tim Roes (Inovex)
        Slides
    • 1:00 PM 6:00 PM
      Apache Spark in Scientific Applications 3.01 (Building 30.23)

      Convener: Mirko Kaempf (Cloudera)
      • 1:00 PM
        Apache Spark in Scientific Applications [B] 5h
        This tutorial is limited to 12 participants. Another session of this tutorial is also available.

        The workshop "Spark in Scientific Applications" covers fundamental development and data analysis techniques using Apache Hadoop and Apache Spark. Besides an introduction to the theoretical background of MapReduce and Bulk Synchronous Parallel processing, the machine learning library MLlib and the graph processing framework GraphX are also used.
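
        The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce steps on a single machine; it does not use Hadoop or Spark, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce one after another (no parallelism here)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not one of the workshop data sets.
    print(word_count(["big data big compute", "data everywhere"]))
```

        In a real cluster framework the map and reduce calls run in parallel on different nodes and the shuffle moves data across the network; the sketch keeps only the data flow.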

        We work on sample data sets from Wikipedia, financial market data, and from a generic data generator. During the tutorial sessions we illustrate the Data Science Workflow and present the right tools for the right task.

        All practical exercises are prepared in a pre-configured virtual machine. Participants get access to the required data sets on a "one-node pseudo-distributed" cluster with all tools inside. This VM is also a starting point for further experiments after the workshop.
        Speaker: Mr Mirko Kaempf (Cloudera)
        Slides
    • 1:00 PM 6:00 PM
      FastFlow: Parallel Programming using parallel patterns and the FastFlow frameworks 229.3 (Building 30.22)

      Convener: Dr Massimo Torquati (Universita di Pisa)
      • 1:00 PM
        FastFlow: Parallel Programming using parallel patterns and the FastFlow frameworks 5h
        FastFlow is an open-source C++ research framework to support the development of multi-threaded applications on modern multi/many-core heterogeneous platforms. The framework provides well-known stream-based algorithmic skeleton constructs such as pipeline, task-farm and loop that are used to build more complex and powerful patterns: parallel_for, map, reduce, macro-data-flow interpreter, genetic-computation, etc.

        During this tutorial session, the participants will learn how to build applications structured as combinations of stream-based parallel patterns such as pipeline, task-farm and loop. Then higher-level patterns such as parallel_for, map, reduce and stencil-reduce will be introduced, and we will see how to mix stream and data-parallel patterns to build parallel applications and algorithms. Different possible implementations will be discussed and tested. Participants will have the opportunity to implement multi-threaded algorithms and simple benchmarks to evaluate performance (also considering energy consumption).
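
        The combination of a pipeline with a task-farm stage can be illustrated with a minimal sketch. It is written in Python with the standard `threading` and `queue` modules, not in C++ and not with the FastFlow API; the names `source`, `farm` and `run_pipeline` are hypothetical. Because of Python's global interpreter lock the sketch shows the structure of the pattern rather than a real speed-up for CPU-bound work.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def source(items, out_q):
    """First pipeline stage: emit every item, then the end-of-stream marker."""
    for item in items:
        out_q.put(item)
    out_q.put(SENTINEL)


def farm(worker_fn, in_q, out_q, n_workers):
    """Task-farm stage: n_workers threads process stream items independently."""

    def work():
        while True:
            item = in_q.get()
            if item is SENTINEL:
                in_q.put(SENTINEL)  # hand the marker on so the other workers stop too
                break
            out_q.put(worker_fn(item))

    workers = [threading.Thread(target=work) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    out_q.put(SENTINEL)  # all workers are done, close the output stream


def run_pipeline(items, worker_fn, n_workers=4):
    """Connect source -> farm -> collector with queues and return the results."""
    q1, q2 = queue.Queue(), queue.Queue()
    stages = [
        threading.Thread(target=source, args=(items, q1)),
        threading.Thread(target=farm, args=(worker_fn, q1, q2, n_workers)),
    ]
    for s in stages:
        s.start()
    results = []
    while True:  # the calling thread acts as the last stage (collector)
        item = q2.get()
        if item is SENTINEL:
            break
        results.append(item)
    for s in stages:
        s.join()
    return results


if __name__ == "__main__":
    # Results arrive in completion order, so sort them for display.
    print(sorted(run_pipeline(range(10), lambda x: x * x)))
```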

        Desirable prerequisites:
        • Good knowledge of C programming.
        • Knowledge of multi-threaded programming and concurrency problems.
        • Knowledge of C++ templates. Features of the C++11 standard will also be used.
        • Basic knowledge of OpenCL.

        Expected participants: 10/15
        Speaker: Mr Massimo Torquati (Universita di Pisa)
        Slides
    • 1:00 PM 6:00 PM
      MongoDB 6.01 (Building 30.23)

      Conveners: Dr Marek Szuba (Karlsruhe Institute of Technology), Ms Parinaz Ameri (KIT)
      • 1:00 PM
        MongoDB Tutorial 5h
        This session is an introduction to a particular NoSQL database, MongoDB.

        MongoDB is an open-source database with a document-oriented storage approach. Since it doesn’t enforce any schema on data and because of its good performance, Mongo is nowadays widely used, especially where unstructured data storage is needed. In addition, Mongo scales well and even provides partitioning over a cluster of nodes, which makes it well suited to Big Data use cases.

        This session will provide basic theoretical knowledge about Mongo and support it with hands-on activities to get to know Mongo in practice.

        The agenda will cover the following:

        • Getting familiar with Mongo terminology
        • Executing CRUD operations
        • Indexing
        • Schema design
        • Use of Mongo to make a small web application
        • Authentication and authorization possibilities
        • Getting to know replication and sharding mechanisms
        • (optional) Analyzing data stored in Mongo using R
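
        The idea of schema-free documents and CRUD operations from the agenda above can be illustrated with a toy in-memory collection. This sketch is plain Python and deliberately does not use MongoDB or any MongoDB driver; the class `ToyCollection` and its methods `create`, `read`, `update` and `delete` are hypothetical names chosen for this example only.

```python
import itertools


class ToyCollection:
    """In-memory stand-in for a document collection (not a MongoDB API)."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def create(self, doc):
        """Store a copy of the document under a fresh id and return that id."""
        doc_id = next(self._ids)
        self._docs[doc_id] = {**doc, "_id": doc_id}
        return doc_id

    def read(self, query=None):
        """Return copies of all documents whose fields equal the query values."""
        query = query or {}
        return [
            dict(doc)
            for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in query.items())
        ]

    def update(self, query, changes):
        """Apply the field changes to every matching document; return the count."""
        matched = [doc["_id"] for doc in self.read(query)]
        for doc_id in matched:
            self._docs[doc_id].update(changes)
        return len(matched)

    def delete(self, query):
        """Remove every matching document; return the count."""
        matched = [doc["_id"] for doc in self.read(query)]
        for doc_id in matched:
            del self._docs[doc_id]
        return len(matched)


if __name__ == "__main__":
    people = ToyCollection()
    # Documents in one collection may have different fields (no fixed schema).
    people.create({"name": "Ada", "lang": "R"})
    people.create({"name": "Lin", "lang": "Python", "room": "6.01"})
    people.update({"name": "Ada"}, {"lang": "Python"})
    print(people.read({"lang": "Python"}))
```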

        Basic Linux knowledge and some background knowledge about relational databases will be helpful in this session, but are not mandatory.

        Speakers: Mr Marek Szuba (KIT/SCC), Ms Parinaz Ameri (KIT/SCC)
    • 1:00 PM 6:00 PM
      Python: Pandas Gaede HS (Building 30.22)

      Convener: Dr Manuel Giffels (KIT)
      • 1:00 PM
        Scientific Python 5h
        Python is a high-level, dynamic, object-oriented programming language. It is easy to learn, intuitive, well documented, very readable and extremely powerful. Python is packaged with an impressive standard library following the so-called "batteries included" philosophy. Together with the large number of additional scientific packages available, such as NumPy, SciPy, pandas, matplotlib and scikit-learn, Python becomes a very well suited programming language for data analysis. This hands-on session is aimed at advanced Python beginners who have already gained some knowledge of Python (scripting experience and familiarity with the term "list comprehension" should be sufficient). The course gives an introduction to, and demonstrates the power of, Python in data analysis using NumPy and pandas.
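
        As a rough yardstick for the expected prior knowledge, the following short sketch shows a list comprehension next to the equivalent loop. It uses only the Python standard library (no NumPy or pandas), and the sample readings are made-up values for illustration.

```python
from statistics import mean

# Hypothetical sample data: (station, temperature in degrees Celsius).
readings = [("A", 21.5), ("B", 19.0), ("A", 23.1), ("C", 18.4), ("B", 20.2)]

# List comprehension: build a new list from an iterable, with a filter.
station_a = [temp for station, temp in readings if station == "A"]

# The same result written as an explicit loop, for comparison.
station_a_loop = []
for station, temp in readings:
    if station == "A":
        station_a_loop.append(temp)

# Set and dictionary comprehensions follow the same idea:
# here, the mean temperature per station.
stations = sorted({station for station, _ in readings})
mean_by_station = {
    s: mean(temp for name, temp in readings if name == s) for s in stations
}

if __name__ == "__main__":
    print(station_a)
    print(mean_by_station)
```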
        Speaker: Dr Manuel Giffels (KIT/EKP)
        Slides
    • 1:00 PM 6:00 PM
      Software Defined Data Center 229.4 (Building 30.22)

      Convener: Johannes Scheuermann (Inovex/KIT)
      • 1:00 PM
        Software Defined Data Center 5h
        Engineers running traditional data centers face many challenges, such as running multiple different workloads. In this workshop we will look at such a data center and identify the challenges that arise when using these architectures. We then take a basic use case and implement it for a traditional data center. In the next step we will look at new technologies for Software Defined Data Centers (SDDC). This enables us to compare how these new technologies cope with the problems found earlier and how they help make data centers more flexible.

        A big benefit of SDDCs is running dynamic and flexible workloads while achieving high resource utilization.

        An SDDC contains these (and more) technologies:
        • Software-Defined Networking (SDN)
        • Software-Defined Storage (SDS)
        • Data Center Operating System (DCOS)

        The goal of this workshop is to build your own mini SDDC with a reference software stack based on:
        • CentOS / CoreOS
        • Docker
        • Mesos and Mesosphere
        • (Quobyte)
        • (Open vSwitch)

        Requirement for participation:
        Basic knowledge of data centers.

        An additional tutorial that goes deeper into the topic is available on Friday: http://indico.scc.kit.edu/indico/event/89/session/35/contribution/53

        Speaker: Mr Johannes Scheuermann (Inovex/KIT)
        Slides
    • 1:00 PM 6:00 PM
      Software Defined Networks 2.1 (Building 30.23)

      Convener: Dr Michael Bredel
      • 1:00 PM
        SDN: Software-Defined Networks 5h
        Today’s communication networks are designed around the original mechanisms of Ethernet and TCP/IP. Because of the success of these early technologies, networks grew bigger and more complex, which led to a need for more complex control options, such as VLANs and ACLs. A variety of heterogeneous network appliances such as firewalls, load balancers, IDS, optimizers, and so on, each implement their own proprietary control stack. Reciprocal communication is handled by other complex protocols such as Spanning Tree, Shortest Path Bridging, Border Gateway, or similar. Each additional component thus increases the complexity and complicates integrated network management. The consequences are often low network utilization, poor manageability, lack of control options in cross-network configurations, and vendor lock-in.

        One way out of this dilemma is Software Defined Networks (SDNs) and OpenFlow. OpenFlow is an Open Networking Foundation (ONF) standard protocol that abstracts the complex details of a fast and efficient switching architecture. Today, OpenFlow offers an open control interface that is now implemented in hardware by all major network component manufacturers. Several vendors even offer software switches that support virtualized datacenters. OpenFlow also supports the concept of separating the data and control paths, which lets a central control point oversee a variety of OpenFlow-enabled network components. The SDN controller could even be a distributed application to provide additional security, fault-tolerance, or load balancing.
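
        The separation of data path and control path described above can be sketched as a toy model: a switch forwards packets from a local flow table and asks a central controller only when it has no matching entry. This is plain Python, not the OpenFlow protocol or any controller API; the classes `Switch` and `Controller`, the policy format and the addresses are assumptions made for illustration.

```python
class Controller:
    """Control path: one central place that holds the forwarding policy."""

    def __init__(self, policy):
        self.policy = policy  # {(switch name, destination): output port}
        self.requests = 0     # how often switches had to ask

    def decide(self, switch_name, dst):
        """Return the output port for a destination, or "drop" if unknown."""
        self.requests += 1
        return self.policy.get((switch_name, dst), "drop")


class Switch:
    """Data path: forwards packets using a local table of match -> action."""

    def __init__(self, name, controller):
        self.name = name
        self.controller = controller
        self.flow_table = {}  # destination address -> action

    def handle_packet(self, dst):
        if dst not in self.flow_table:
            # Table miss: ask the controller once and cache its decision.
            self.flow_table[dst] = self.controller.decide(self.name, dst)
        return self.flow_table[dst]


if __name__ == "__main__":
    ctrl = Controller({("s1", "10.0.0.2"): 2})
    s1 = Switch("s1", ctrl)
    print(s1.handle_packet("10.0.0.2"))  # first packet goes via the controller
    print(s1.handle_packet("10.0.0.2"))  # later packets use the cached entry
    print(ctrl.requests)
```

        Changing the policy in the controller changes the behaviour of every attached switch, which is the manageability argument made for centralized control.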

        This presentation focuses on a general introduction to Software Defined Networking and OpenFlow. We shed light on various aspects of today's network management and its challenges and elaborate on possible solutions offered by SDN. Moreover, the hands-on tutorial addresses the OpenDaylight SDN controller. To this end, we install, configure, and run OpenDaylight. We emulate a small network using the Mininet network emulator and have OpenDaylight manage the data flows in that network. We will experience the beauty of such a centralized solution and discuss further areas of application, such as cloud computing and OpenStack.

        Speaker: Mr Michael Bredel (FH Kufstein)
        Slides
    • 8:00 PM 11:00 PM
      School dinner Dinner Room (Leonardo Hotel)

    • 9:00 AM 11:30 AM
      Plenary talks Gaede HS (Building 30.22)

      Convener: Mr Ugur Cayoglu (SDM)
      • 9:00 AM
        EUDAT - The European Data Infrastructure 45m
        This talk will provide an overview of the EUDAT initiative, which has laid the foundation of a new Collaborative Data Infrastructure (CDI) providing solutions for finding, sharing, preserving and performing computations with primary and secondary research data on a pan-European level. By addressing the accelerated proliferation of data and the resulting challenges faced by the research communities through a cross-disciplinary approach, and by identifying and proposing solutions to barriers to the development of an efficient pan-European e-infrastructure ecosystem, research e-infrastructures like EUDAT make concrete contributions to eliminating barriers to cross-national and cross-disciplinary collaboration and to reinforcing the level playing field for European researchers and data managers.
        Speaker: Dr Damien Lecarpentier (EUDAT)
        Slides
      • 9:45 AM
        Coffee Break 30m
      • 10:15 AM
        The Icosahedral Nonhydrostatic (ICON) model: Scalability on Massively Parallel Computer Architectures 45m
        Simulation in numerical weather prediction and climate forecasting has a fast-growing demand for memory capacity and processing speed. For the last decade, however, computer technology has shifted towards multi-core chip designs while at the same time on-chip clock rates have increased only moderately. The parallel implementation of DWD's operational forecast model ICON therefore follows a hybrid distributed/shared memory approach, based on the Message Passing Interface (MPI) and the OpenMP API. The ICON code couples the different components of the earth system model, e.g. dynamics, soil and radiation, with high-level language constructs. Its communication characteristics and programming patterns take the unstructured triangular grid into account and are designed to meet the main challenges in high performance computing, i.e. load balancing, cache efficiency, and low-latency networking. The implementation employs special domain decomposition heuristics and parallel range-searching algorithms, and makes use of asynchronous I/O servers to deal with the potentially prohibitive amount of data generated by earth system models. This enables the ICON code to achieve an adequate level of performance on a wide range of HPC platforms, targeting large scalar cluster systems with thousands of cores as well as vector computers.
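
        The load-balancing goal behind domain decomposition can be illustrated with a deliberately simple sketch: splitting a one-dimensional list of grid cells into contiguous blocks whose sizes differ by at most one. This is not ICON's decomposition heuristic, which works on an unstructured triangular grid; the function name `block_decompose` and the cell numbering are assumptions for this example.

```python
def block_decompose(n_cells, n_ranks):
    """Split cell indices 0..n_cells-1 into n_ranks contiguous, balanced blocks."""
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    base, extra = divmod(n_cells, n_ranks)
    blocks = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional cell each.
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    for rank, block in enumerate(block_decompose(10, 3)):
        print(rank, list(block))
```

        In a hybrid setup, each such block would typically belong to one MPI process, with threads sharing the work inside the block.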
        Speaker: Florian Prill (DWD)
        Slides
      • 11:00 AM
        Conclusions 15m
        Speaker: Dr Thomas Hartmann (SCC)
        Slides
    • 11:30 AM 1:00 PM
      Lunch 1h 30m

      Please let us know how you rate yesterday's sessions at

      https://surveys.scc.kit.edu/limesurvey/index.php/185572/lang-en

      Please also rate today's school sessions afterwards at
      https://surveys.scc.kit.edu/limesurvey/index.php/438643/lang-en

      and your overall impression of the school

      https://surveys.scc.kit.edu/limesurvey/index.php/339598/lang-en

    • 1:00 PM 6:00 PM
      IBM BootCamp 229.4 (Building 30.22)

      Convener: Mr Hendrik Leddin (IBM)
      • 1:00 PM
        IBM SPSS Data Mining Workshop 5h
        This hands-on IBM SPSS Data Mining Workshop is an instructor-led session using IBM’s data mining and predictive modeling software and is designed for those who are familiar with predictive analytics. Through this workshop you will experience first-hand how IBM SPSS Modeler works and how easy it is to implement predictive analytics.

        Introduction to Predictive Analytics

        Exercise: IBM SPSS Modeler

        • Predictive in 20 min.
        • Association Modeling
        • Segmentation Modeling
        • Classification Modeling
        • Deployment
        Speaker: Henrik Leddin (IBM)
        Slides
    • 1:00 PM 6:00 PM
      Software Defined Data Center: Deeper Dive into SDDC Gaede HS (Building 30.22)

      Convener: Johannes Scheuermann (Inovex)
      • 1:00 PM
        Software Defined Data Center in detail - Addon 5h
        This is an additional session that goes deeper into the tutorial on SDDC (http://indico.scc.kit.edu/indico/event/89/session/35/contribution/45). Basic knowledge of SDDCs, as presented in the general tutorial, is required.
        Speaker: Mr Johannes Scheuermann (Inovex)
        Slides