The International GridKa School "Big Data, Clouds and Grids" is one of the leading summer schools for advanced computing techniques in Europe. The school provides a forum where scientists and technology leaders, experts and novices can share knowledge and exchange information. The target audience includes grid and cloud newcomers, advanced users, administrators, and graduate and PhD students. GridKa School is hosted by the Steinbuch Centre for Computing (SCC) of the Karlsruhe Institute of Technology (KIT). It is organized by KIT and the HGF Alliance "Physics at the Terascale". This year we will continue the program tracks that were introduced last year and were received with great interest. In addition, we will pay more attention to Big Data management and related technologies. Main topics of the school this year are:
Half of the school consists of expert talks, which cover the fundamental and theoretical aspects of the topics. The other half consists of hands-on sessions and workshops, which give participants an excellent chance to gain practical experience with techniques and tools.
plenary talks evaluation: http://surveys.gridka.de/limesurvey/index.php?sid=22841&lang=en
evaluation page: http://surveys.gridka.de/limesurvey/index.php?sid=43143&lang=en
In the last couple of years cloud computing has achieved an important status in the IT scene. Renting computing power, storage and applications on demand is regarded as a business of the future.
This tutorial course gives an introduction to the basic concepts of the Infrastructure-as-a-Service (IaaS) model, based on the cloud offerings provided by Amazon, currently one of the leading commercial cloud computing providers.
The language C++ supports multiple programming paradigms and is often the first choice for applications where performance matters. It is widely used by scientific communities, including high energy physics. The course covers basic software design patterns, simple best practice rules, examples from the Standard Template Library, and selected topics from object-oriented and generic programming. The goal is to help scientists use C++ efficiently in order to improve the quality and ease the maintenance of their software. Participants are required to have basic knowledge of C++ and the concepts of object-oriented programming.
Session 1: Set up a CDH cluster and collect data with Flume and Sqoop(2)*
Session 2: Query large data sets with Hive / Pig / MapReduce
Session 3: Benchmark your cluster
Session 4: Hadoop development with Eclipse / Maven / Hadoop Development Tools (HDT) / Cloudera Development Kit (CDK)
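To give a feeling for the MapReduce model named in Session 2, here is a minimal sketch in plain Python. It runs in memory on a single machine and does not use Hadoop or any Hadoop API; the function names and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data in the cloud"]  # hypothetical input
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run in parallel on many nodes and the shuffle would move data across the network; the sketch only shows the division of work between the three steps.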
This session will be an introduction to relational and non-relational database management systems, with a hands-on approach.
1) Theory Session
Introduction to relational databases including terminology, relations, constraints, and operations.
2) Practice Session
Development of a simple application with a relational database backend using Python and SQLite.
3) Discussion
Typical pitfalls when building applications with a database backend.
4) Theory Session
Introduction to non-relational databases including characteristics, scalability, consistency, MapReduce, and operations.
5) Practice Session
Development of a simple application with a non-relational database backend using Python and MongoDB.
6) Discussion
Comparison of the developed applications with both types of backends.
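As an illustration of the kind of application built in the first practice session, the following sketch uses Python's standard sqlite3 module with an in-memory database. The schema, table names and sample data are assumptions made for this example and are not taken from the course material.

```python
import sqlite3


def create_schema(conn):
    """Create two related tables; the foreign key acts as a constraint."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute(
        "CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE participant ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " course_id INTEGER NOT NULL REFERENCES course(id))"
    )


def add_course(conn, title):
    """Insert a course and return its generated primary key."""
    cur = conn.execute("INSERT INTO course (title) VALUES (?)", (title,))
    return cur.lastrowid


def add_participant(conn, name, course_id):
    """Insert a participant; placeholders keep user input out of the SQL text."""
    conn.execute(
        "INSERT INTO participant (name, course_id) VALUES (?, ?)", (name, course_id)
    )


def participants_per_course(conn):
    """Join both tables and count the participants of every course."""
    rows = conn.execute(
        "SELECT c.title, COUNT(p.id) FROM course c"
        " LEFT JOIN participant p ON p.course_id = c.id"
        " GROUP BY c.id ORDER BY c.title"
    )
    return rows.fetchall()


def build_demo():
    """Build a small in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    db_id = add_course(conn, "Databases")
    add_course(conn, "Security")
    add_participant(conn, "Alice", db_id)
    add_participant(conn, "Bob", db_id)
    conn.commit()
    return conn


if __name__ == "__main__":
    print(participants_per_course(build_demo()))
```

Two of the typical pitfalls are visible here: values are passed as `?` parameters instead of being pasted into the SQL string, and foreign key enforcement has to be switched on explicitly in SQLite.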
Sessions
Tue, Aug 27, 13:30-18:30
Wed, Aug 28, 10:50-18:30
Thu, Aug 29, 10:50-18:30
Abstract
In this workshop the participants will take on the role of security teams responsible for the operational security of simulated grid sites running in a virtualized environment.
The sites will face attacks very similar to those seen in real life. The teams' task is to respond to these attacks and keep their services up and running as far as possible.
A running score will be kept, and at the end of the workshop the winners will receive fabulous prizes.
Maximum number of participants is 18.
Be prepared
A security incident always puts you in a challenging situation. You have to do many things correctly, quickly and in the correct order. What to do and when during incident response is usually formulated in an incident response procedure. We will start from the general Grid Incident Response Procedure available from EGI-CSIRT and discuss how to adapt it to local regulations.
Have a view of your site
Usually the information you initially get in a security incident will be relatively sparse and the amount of logs quite large. Therefore it is crucial to quickly get an initial overview of the problem, i.e. which systems are affected and which systems are at risk. Here we will discuss and use tools you can easily set up at your site as well, like a central syslog facility, a grid systems log analyzer and EGI CSIRT's vulnerability scanner Pakiti.
The heat is on
Now it is time to put your newly acquired knowledge to the test. As administrator of a simulated cluster, you will have to defend yourself against a determined attacker.
Hands-on forensics
Investigating a compromised system is a delicate situation. It is easy to lose crucial information if you are not careful enough. We will discuss several levels of volatile information and the dos and don'ts of collecting it. The creation and analysis of memory and disk images will be discussed.
Wrap-up, lessons learned
At the end of the workshop we will discuss the findings. The crucial point here is to find out how the site was attacked and which steps could be taken to prevent this from happening again. This should result in some best practices on how to reduce the attack surface of your site.
This gLite middleware administration workshop gives students a chance to
perform installation and configuration of some of the EMI compute components. The
goal of the workshop is to install a minimal Grid site using the EMI CREAM
Computing Element (CE) and a Worker Node (WN) using the PBS-Torque batch
system. Students will be shown how to install and configure these
services using YAIM, and how to troubleshoot problems that may occur.
After the manual installation, an example of a Puppet-automated
installation will be provided.
evaluation page: http://surveys.gridka.de/limesurvey/index.php?sid=56978&lang=en
It is well known that the development environments used in Grid, Volunteer Computing (VC) and Cloud are very different. The key differences between these three platforms lie in their theoretical concepts as well as their implementations.
The aim of this tutorial is to propose a set of concepts and tools used to bridge these three large-scale distributed systems: Grid, Cloud and Volunteer Computing.
Concretely speaking, we propose a common library used to develop high performance applications that could be deployed on Grid, VC and Cloud without any re-writing. The following platforms/middlewares will be used during the practical part:
1. the Advanced Resource Connector (ARC) middleware
2. the XtremWeb-CH volunteer computing platform (XWCH: www.xtremwebch.net)
3. the cloud platforms: Amazon, Azure and Venus-C
The tutorial is composed of theoretical and practical parts. The theoretical part will deal with the following aspects:
- Grid and Volunteer computing vs. Cloud computing
- Overview of ARC, XWCH, Amazon, Venus-C and Azure platforms
- How to develop applications for ARC, XWCH, Amazon, Venus-C and Azure platforms
- A common high-level API for large scale distributed systems
During the practical part, the students will be able to:
- Write their own application
- Deploy their application using one or several of these bridges: ARC/XWCH, XWCH/Amazon and XWCH/Azure.
During this session, the participants will learn the basic concepts of multi-threaded programming. In particular, they will apply this paradigm to well-known and widely used data-processing algorithms. Available software solutions will be introduced and the specific functionality they offer will be discussed. The second, hands-on part of this session will give the participants the opportunity to implement multi-threaded algorithms and benchmark the performance they gain.
Desirable Prerequisite:
Basic knowledge of C++
C++ templates will be used
OpenStack is currently one of the most rapidly evolving open IaaS solutions available. Every new release comes with a huge set of new features, and it can be hard to keep pace with such changes. Starting from scratch also proves difficult because of the complexity of the several components interacting with each other.
The proposed training targets system administrators with little or no knowledge of cloud infrastructure who are interested in learning how to deploy and operate OpenStack. The training is organised in two full days. Main topics of the training will be:
If time permits, a short introduction to OpenStack-compatible storage systems may be added (Swift/Ceph).
While the computing community is racing to build tools and libraries to ease the use of these heterogeneous parallel computing systems, effective and confident use of these systems will always require knowledge about the low-level programming interfaces in these systems.
This lecture uses examples and hands-on exercises, based on the CUDA programming language, to introduce the three abstractions that form the foundations of GPU programming:
Python for Scientific programming
Python is a high-level, dynamic, general-purpose programming language. It is remarkable for the clarity and expressive power it offers in exchange for a relatively low learning investment.
Python is designed to be extensible with low-level languages. SciPy is a collection of efficient tools for scientific programming, exposed as Python modules. Cython is a compiler for (an extended version of) Python which makes it possible to turn Python code into highly efficient low-level extension modules, or to link Python code to existing low-level libraries.
Combining Python with packages such as SciPy and Cython provides the programmer with the best of both worlds: the high productivity and ease of use of the Python language combined with the efficiency of low-level components.
This session introduces the Python language, highlighting its flexibility and expressivity and contrasting it with more static and low-level languages such as C++. It goes on to explore how highly performant programs can be developed in Python with the help of SciPy and Cython.
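As a small taste of the expressiveness mentioned above, the following sketch summarizes a list of measurements in a few lines. It deliberately uses only the Python standard library, not SciPy or Cython; the function name and the binning rule are assumptions made for this example.

```python
from collections import Counter
from statistics import mean


def summarize(measurements):
    """Return the mean and the per-bin counts of a list of numeric values."""
    # Bin each value by its integer part; int() truncates toward zero,
    # so this simple rule is only meant for non-negative values.
    bins = Counter(int(value) for value in measurements)
    return mean(measurements), dict(sorted(bins.items()))


if __name__ == "__main__":
    average, histogram = summarize([0.5, 1.25, 1.75, 2.5])  # hypothetical data
    print(average, histogram)
```

An equivalent program in a more static language would typically need explicit type declarations and container management; for large data sets the same idea would be expressed with array-based tools rather than plain lists.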
The ROOT software framework provides all the functionality needed to store and analyze large amounts of data in an efficient way.
We will provide an introduction to the ROOT system and its tools for data analysis and visualisation.
The main features of ROOT such as histogramming, data visualization, object I/O and advanced statistical analysis techniques will be presented. We will also introduce RooFit and RooStats, dedicated tools for advanced fitting, which are currently used by the LHC experiments. The participants will also have the opportunity to practise with the ROOT tools directly via hands-on exercises in C++, covering some of the main functionality of the ROOT framework.
OpenStack is currently one of the most rapidly evolving open IaaS solutions available. Every new release comes with a huge set of new features, and it can be hard to keep pace with such changes. Starting from scratch also proves difficult because of the complexity of the several components interacting with each other.
The proposed training targets system administrators with little or no knowledge of cloud infrastructure who are interested in learning how to deploy and operate OpenStack. The training lasts 6 hours and is held on a single day. Main topics of the training will be:
If time permits, a short introduction to OpenStack-compatible storage systems may be added (Swift/Ceph).
dCache is one of the most widely used storage solutions in the WLCG, with 94 PB of storage distributed across 77 sites worldwide. Depending on the Persistency Model, dCache provides methods for exchanging data with backend (tertiary) storage systems as well as space management, pool attraction, dataset replication, hot spot determination and recovery from disk or node failures. Besides HEP-specific protocols, data in dCache can be accessed via NFSv4.1 (pNFS) as well as through WebDAV. The workshop includes theoretical sessions and practical hands-on sessions covering installation, configuration of its components, simple usage and monitoring. Basic knowledge of Unix systems is required. Please familiarise yourself with a Linux terminal and the peculiarities of a Linux text editor (vi, emacs etc.).
Presented by:
Christian Bernardt (DESY)
Christoph Anton Mitterer (Ludwig Maximilian University of Munich)
Oleg Tsigenov (RWTH Aachen)
Cesare Delle Fratte (Rechenzentrum Garching)