The International GridKa School 2018 is one of the leading summer schools for advanced computing techniques in Europe. The school provides a forum for scientists and technology leaders, experts and novices, to facilitate knowledge sharing and information exchange. The target audience includes graduate and PhD students, advanced users, and IT administrators. GridKa School is hosted by the Steinbuch Centre for Computing (SCC) of the Karlsruhe Institute of Technology (KIT). It is organized by KIT and the HGF Alliance "Physics at the Terascale".
Hands-on sessions and workshops give participants an excellent and unique chance to gain practical experience on cutting edge technologies and tools.
Plenary talks presented by experts cover theoretical aspects of school topics and focus on innovative features of data science on modern architectures.
Two social events are an important part of the school. They give participants a chance to expand their networks and to meet interesting people in a relaxed atmosphere.
Container technologies are rapidly becoming the preferred way for developers and system administrators to distribute, deploy, and run services. They provide the means to start a lightweight virtualization environment, i.e., a container, based on Linux kernel namespaces and control groups (cgroups). Such a virtualization environment is cheap to create, manage, and destroy, requires negligible set-up time, and delivers performance comparable to that of the host. Docker offers an intuitive way to manage containers by abstracting and automating the low-level configuration of namespaces and cgroups, which has ultimately enabled an entire ecosystem of tools and products to develop around containers.
This workshop covers aspects ranging from the basic concepts of Docker (e.g., setting up a Docker environment on your machine, running a container interactively, building, tagging, and publishing images) to the deployment of complex service stacks using container clusters and orchestration software (e.g., Docker Compose and Kubernetes). The workshop will discuss the concepts of network, volume, and resource management in detail, demonstrating that containers are suitable for a variety of applications and have real advantages over traditional virtual machines.
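As a first taste of the hands-on part, the sketch below starts a throwaway container from Python using the Docker SDK (pip install docker); the image name is only an example, and a running Docker engine is assumed:

    import docker  # the Docker SDK for Python

    client = docker.from_env()  # connect to the local Docker daemon

    # run a short-lived container; remove=True cleans it up automatically
    # once the command exits
    output = client.containers.run("alpine:latest",
                                   "echo hello from a container",
                                   remove=True)
    print(output.decode())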
Note: The workshop includes hands-on exercises. To get the most out of the tutorial part, you should bring your own laptop and have an Internet connection. You should also be comfortable working with the Linux terminal, editing files with common editors (e.g., vi, nano, emacs, etc.), and installing packages over the command line.
In this workshop, we will introduce the basics of programming in Go and then work our way up to concurrent programming with this relatively new language.
We'll start with the usual "Hello World" program, introduce functions, variables, packages and then interfaces.
Then, we will tackle the two main tools at the disposal of the Go programmer (colloquially known as a gopher): channels and goroutines.
This will be done by implementing a small peer-to-peer application transmitting text messages over the network.
The workshop wraps up with a whirlwind tour of readily available scientific and non-scientific libraries, and prospects/news about the next Go version.
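For a first flavour of those two tools, here is a minimal, self-contained sketch; the peer-to-peer messenger built in the workshop rests on exactly these primitives:

    package main

    import "fmt"

    func main() {
        msgs := make(chan string) // a channel carrying strings

        // a goroutine: a lightweight thread of execution
        go func() {
            msgs <- "hello from a goroutine"
            close(msgs)
        }()

        // receive values until the channel is closed
        for m := range msgs {
            fmt.Println(m)
        }
    }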
Participants will have to install the Go compiler on their laptops.
The instructions to do so for their favorite operating system are detailed at:
To get a taste of what Go looks like and to get their feet wet, participants can also follow the interactive, browser-based, installation-free tour from:
The course targets beginning Python developers and people familiar with scripting. The basics required to complete the course are covered, but ideally you already feel comfortable writing small scripts in any language. We highly recommend using your own laptop (Linux, macOS, Cygwin) for the exercises.
Besides offering fresh ideas and new programming concepts, Julia was mainly created to solve the two-language problem.
The two-language problem describes the common pattern of prototyping an algorithm in an easy-to-use high-level language and then reimplementing it in a fast language like C, doubling development costs and making updates and further development more complicated.
This has also led to a split in scientific computing: you work mainly in a scripting language, while all the performance-critical libraries are unapproachable black boxes written in a more difficult language. For most users this is fine, but for developers it makes growing the ecosystem more difficult, and it is harder to engage users in contributing back to the core libraries.
Julia solves this with a sophisticated compiler model which manages to combine the usability of dynamic scripting languages with the performance of low level languages.
In this workshop I will introduce the basic mechanisms of how Julia works and teach some fun programming examples showing how to use Julia's type system and metaprogramming, and how to make any Julia program run as fast as highly optimized C, all while staying at least as readable as Python code!
I will also show some more advanced examples explaining how Julia offers completely new possibilities for library developers by allowing high-performance libraries to be written in a dynamic language.
One of those examples shows how to seamlessly move your code to the GPU and, e.g., do automatic differentiation on the GPU and CPU alike without losing any performance.
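As a small appetizer, the sketch below shows the kind of generic code this builds on: one definition, compiled and specialized natively for every type it is called with (@code_native from the standard InteractiveUtils library lets you inspect the result):

    square(x) = x * x      # one generic definition for every type

    square(3)              # 9    (specialized for Int)
    square(2.5)            # 6.25 (specialized for Float64)
    square([1 2; 3 4])     # matrix product, same source code

    using InteractiveUtils # standard library, loaded by default in the REPL
    @code_native square(3.0)  # inspect the generated machine code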
Data scientists must manage analyses that consist of multiple stages, large datasets, and a great number of tools, all while maintaining reproducibility of results. Amongst the variety of tools available for parallel computation, Pachyderm is an open-source workflow engine and distributed data processing tool that fulfils these needs by creating a data pipelining and data versioning layer on top of projects from the container ecosystem. In this workshop you will learn how to:
This workshop is an introduction to profiling a system or individual programs. We will concentrate on profiling to understand aspects of program performance. All discussion and examples are presented with Linux in mind, and some Intel specific information will be given.
Topics include:
Overview of
Profiling & Performance: types and typical metrics
Hands on exercises using profiling tools
Participants are assumed to be familiar with Linux already. The application-level portion of the profiling exercises will be done with C or C++ programs.
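To illustrate the kind of exercise involved, the small C program below (a made-up example, not course material) is a convenient profiling target: compiled with -pg, each run writes a gmon.out file that gprof can turn into a call profile:

    /* a tiny profiling target: compile with  gcc -pg -O2 hotspot.c,
     * run ./a.out, then inspect with:  gprof ./a.out gmon.out */
    #include <stdio.h>

    static double work(int n)
    {
        double s = 0.0;
        for (int i = 1; i <= n; i++)
            s += 1.0 / i;          /* deliberately CPU-bound */
        return s;
    }

    int main(void)
    {
        double total = 0.0;
        for (int i = 0; i < 2000; i++)
            total += work(100000);
        printf("%f\n", total);     /* keeps the work from being optimized away */
        return 0;
    }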
Writing maintainable software is a prerequisite in many fields. Especially when working in projects with many members it is essential to
However, the goals of maintainable software are not only relevant when working in teams, but also in private projects. This makes the topic relevant for anybody who needs to write and maintain software.
Based on experiences from projects in academia and industry, this tutorial introduces tools and concepts that enable maintainable software projects in collaborative environments. While we try to give a broad overview of different topics, we also flexibly provide in-depth information depending on your feedback during the course. We cover topics such as version control and organisation of software with git, concepts of unit testing and test-driven development, tools supporting continuous integration, as well as the integration into wikis and ticket systems.
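To give a flavour of the unit-testing part, here is a minimal pytest-style sketch (the function and test names are invented for illustration); the tests pin down the expected behaviour and can be run on every change, locally or by a continuous integration service:

    import pytest

    # the function under test
    def moving_average(values, window):
        if window <= 0:
            raise ValueError("window must be positive")
        return [sum(values[i:i + window]) / window
                for i in range(len(values) - window + 1)]

    # each test states one expected behaviour; run them all with: pytest
    def test_moving_average():
        assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]

    def test_rejects_bad_window():
        with pytest.raises(ValueError):
            moving_average([1, 2], 0)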
Throughout this tutorial you will learn how to efficiently integrate different tools and concepts to enable maintainable software. After the course, you will have a basic setup that can be adapted to your specific needs.
This course is a hands-on tutorial and requires basic knowledge of Python programming. For the best learning experience and an overview of the encompassing software development process, we suggest combined participation in the workshop Introduction to Python and Collaborative Software Development.
Python provides a rich ecosystem of open-source software for mathematics, science, and engineering. This tutorial will introduce you to the fundamental packages of the SciPy stack.
You will learn how to perform fast numerical calculations in N dimensions using NumPy, analyze your data using Pandas, and visualize the results using Matplotlib. The exercises will be performed in the Jupyter Notebook environment, which you can access through your web browser.
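A minimal sketch of how these packages interlock (the data is made up for the example):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # NumPy: fast, vectorized numerics -- no Python-level loop
    x = np.linspace(0, 2 * np.pi, 100)
    df = pd.DataFrame({"x": x, "sin": np.sin(x), "cos": np.cos(x)})

    print(df.describe())   # Pandas: quick statistical summary

    df.plot(x="x")         # Matplotlib, via the Pandas plotting interface
    plt.show()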
You will need a tablet or a laptop and basic knowledge of the Python programming language.
Machine learning, and especially deep learning, is one of the current hot topics in computer science and engineering. It has not only experienced tremendous advancements in its theoretical foundations during the last few years, but is now also the state-of-the-art method in a broad range of applications. In this course, you will learn the
Using small to mid-sized application use cases from science and computer vision, you are going to experience how to put the acquired knowledge into practice.
As the machine learning framework of choice, we are going to use the TensorFlow library as the computational back-end to the deep learning library Keras in the Python programming language (some prior knowledge of Python is necessary). Using modern GPU computing resources in a cluster computing system, we are going to have a look at typical machine learning applications, such as classification problems and numerical regression analysis.
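To indicate the level we will work at, here is a minimal, hedged sketch of a Keras classifier; the toy data and layer sizes are arbitrary choices for illustration:

    import numpy as np
    from tensorflow import keras   # Keras with TensorFlow as back-end

    # toy data: 1000 samples with 20 features and a binary label
    x = np.random.rand(1000, 20).astype("float32")
    y = (x.sum(axis=1) > 10).astype("float32")

    # a small fully-connected network for binary classification
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=5, batch_size=32)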
Please make sure to bring your own laptop and refresh your basic knowledge of vectors and matrices. We are looking forward to having you!
A single plain C file is sufficient to express an embedded program.
As the Arm Cortex-M architecture is designed with C code in mind, no assembly-level system bring-up code is required. This workshop will teach you how to program C code on top of a bare-metal CPU without an operating system or support libraries like libc.
It will give you insight into how linkers can be configured to run your program at the right location and to place data correctly. We will use the free arm-gcc toolchain and related tools from the toolchain to analyze the program at the assembly level, to better understand how C code is mapped to machine code depending on the chosen compiler optimization level and linker settings.
The workshop will further introduce you to how low-level features like stacks and interrupts are used and how they map onto Arm assembly code. One of the purposes of this course is to lay out the programming methods for talking to hardware in a minimal configuration. Our broader target is a better understanding of interaction with low-level hardware and toolchains for embedded systems.
Last, but not least, we will present debugging techniques for low-level and OS development and may also talk about security features of the microcontroller platform used.
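To make the single-plain-C-file claim concrete, here is a heavily hedged sketch of such a bare-metal program; the section name .vectors and the symbol _estack are assumptions that must match the linker script in use:

    /* a minimal bare-metal Cortex-M program: no OS, no libc */

    extern unsigned int _estack;   /* top-of-RAM stack address, from the linker script */

    void Reset_Handler(void);
    int  main(void);

    /* The core fetches the initial stack pointer and the reset vector
     * from the start of flash, so the linker must place this table first. */
    __attribute__((section(".vectors")))
    void (* const vector_table[])(void) = {
        (void (*)(void)) &_estack,   /* initial stack pointer */
        Reset_Handler,               /* entry point after reset */
    };

    void Reset_Handler(void)
    {
        /* a real bring-up would copy .data and zero .bss here */
        main();
        for (;;) ;                   /* main() must never return */
    }

    int main(void)
    {
        volatile unsigned int counter = 0;
        for (;;)
            counter++;               /* observable "work" for the debugger */
    }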
In case you’re interested in reading material on the topic, we recommend “The Definitive Guide to ARM® Cortex®-M3 and Cortex®-M4 Processors, 3rd Edition” – but it is by no means required for participating in this course.
An Internet-connected laptop is required for participating in this workshop. Please install the latest version of Docker on your system, verify that it is running, and make sure your system is up to date. We will provide a Docker-based Linux environment for you with a pre-installed arm-gcc toolchain.
Basic knowledge of the C programming language is required.
In this workshop, the students will (a) learn how to efficiently use relational and non-relational databases for modern large-scale scientific experiments, and (b) how to create database workflows suitable for analytics and machine learning.
First, the focus of the workshop is to teach efficient, safe, and fault-tolerant principles when dealing with high-volume and high-throughput database scenarios. This includes, but is not limited to, systems such as PostgreSQL, Redis, or ElasticSearch. Topics include query planning and performance analysis, transactional safety, SQL injection, and competitive locking.
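As a flavour of the injection-safety topic, the sketch below uses Python's built-in sqlite3 module as a stand-in for a real database server; the parameterized-query principle is the same with the PostgreSQL drivers used in the course:

    import sqlite3

    conn = sqlite3.connect(":memory:")   # stand-in for a real database server
    conn.execute("CREATE TABLE runs (id INTEGER, detector TEXT)")
    conn.execute("INSERT INTO runs VALUES (1, 'alice')")

    user_input = "alice' OR '1'='1"      # a classic injection payload

    # Building the query by string concatenation would let the payload
    # rewrite the WHERE clause. A parameterized query instead treats the
    # input strictly as data.
    rows = conn.execute(
        "SELECT * FROM runs WHERE detector = ?", (user_input,)
    ).fetchall()
    print(rows)   # [] -- the payload matches nothing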
Second, we focus on how to prepare the data from these databases so that it is usable for analytics and machine learning frameworks such as Keras.
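A minimal sketch of this hand-over; the file, table, and column names are invented for the example:

    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("experiment.db")   # hypothetical database file

    # pull a feature table straight into a DataFrame ...
    df = pd.read_sql("SELECT energy, angle, label FROM events", conn)

    # ... and hand it to an ML framework as plain NumPy arrays
    features = df[["energy", "angle"]].values
    labels = df["label"].values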
An intermediate understanding of Python, SQL, and Linux shell scripting is recommended to follow this course. An understanding of machine learning principles is not required.
In this IT security workshop, participants will switch sides and take on the role of a hacker attacking servers and services within a prepared environment.
During the workshop we will play with different web applications waiting to be hacked. Many web apps have striking bugs that threaten the data of millions of users. You will learn about SQL injection, scripting issues, request forgery, and more. We will also explore and use the Metasploit Framework, a tool that aids attackers in choosing and running exploits against one or many targets.
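As a preview of the SQL injection unit, consider how a naively constructed query reacts to a crafted input (a deliberately simplified illustration, not code from the workshop):

    # a deliberately naive query built by string formatting
    template = "SELECT * FROM users WHERE name = '%s' AND pw = '%s'"

    print(template % ("admin", "secret"))      # the intended query
    # the payload closes the string and comments out the password check:
    print(template % ("admin' --", "anything"))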
Every part of the workshop starts with a condensed introduction to the basics of the topic. After that, it's your turn! You will have the opportunity to replay the demos and explore further techniques and possibilities of the exploit tools. Finally, you can attack and try to "pwn" servers of varying levels of difficulty in the lab environment. At the end of every unit we will discuss your findings and experiences together.
You should be familiar with the Unix command line and the concept of manpages. A basic understanding of common web technologies and the ability to read scripting languages is helpful. Knowledge of TCP/IP and network services is also recommended.
OpenMP (Open Multi-Processing) is a programming interface for shared memory parallelization on multiprocessor computers.
The Message Passing Interface (MPI) is a communication standard describing the exchange of messages for distributed memory parallelization on parallel computers.
Both programming concepts will be introduced with simple examples.
In this course, you will learn to write simple parallel programs using both interfaces.
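For a first impression, here is a minimal OpenMP example in C; the numeric loop is just an arbitrary stand-in for real work:

    /* a minimal OpenMP sketch: a parallel loop with a reduction
     * compile with: gcc -fopenmp harmonic.c */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        const int n = 1000000;
        double sum = 0.0;

        /* iterations are distributed over all threads; the reduction
         * clause safely combines the per-thread partial sums */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= n; i++)
            sum += 1.0 / i;

        printf("harmonic sum = %f (max threads: %d)\n",
               sum, omp_get_max_threads());
        return 0;
    }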
Since this course is conducted on Linux systems, you should be able to use the command line and have some basic programming skills in C/C++.
OpenACC is a directive-based programming model for highly parallel systems, which allows for automated generation of portable GPU code. In this tutorial, we will get to know the programming model with examples, learn how to use the associated tools environment, and incorporate first strategies for performance optimization into our programs. Finally, we will integrate OpenACC with other GPU programming strategies.
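As a preview, here is a minimal OpenACC sketch in C; compiler invocations differ between toolchains (e.g., pgcc -acc or gcc -fopenacc):

    /* a minimal OpenACC sketch: a saxpy-style loop offloaded to the GPU */
    #include <stdio.h>

    #define N (1 << 20)
    static float x[N], y[N];

    int main(void)
    {
        for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        /* the directive asks the compiler to generate device code for the
         * loop and to manage host/device data movement */
        #pragma acc parallel loop copyin(x) copy(y)
        for (int i = 0; i < N; i++)
            y[i] = 2.0f * x[i] + y[i];

        printf("y[0] = %f\n", y[0]);   /* expected: 4.0 */
        return 0;
    }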
This Quantum Computing tutorial will enable the participants to access and run calculations on real quantum computers from IBM. The course gives an introduction to the IBM Q Experience as well as to the Quantum Information Science Kit (Qiskit), an open-source quantum computing framework for leveraging today's quantum processors and conducting research. Basic knowledge of quantum mechanics, linear algebra, and Python is helpful (but not mandatory).
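As a preview, here is a hedged sketch of a two-qubit Bell-state experiment on a local simulator; Qiskit's API has evolved across versions, so treat the exact imports as indicative:

    from qiskit import QuantumCircuit, Aer, execute

    qc = QuantumCircuit(2, 2)     # two qubits, two classical bits
    qc.h(0)                       # put qubit 0 into superposition
    qc.cx(0, 1)                   # entangle qubits 0 and 1
    qc.measure([0, 1], [0, 1])

    backend = Aer.get_backend("qasm_simulator")   # local simulator
    counts = execute(qc, backend, shots=1024).result().get_counts()
    print(counts)   # roughly half '00' and half '11'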
Pandas is a Python package that provides data structures to work with heterogeneous, relational/tabular data. It provides fundamental building blocks for powerful and flexible data analysis. Pandas offers functionality to load a wide set of data formats, manipulate the resulting data, and visualize it using various plotting frameworks. In the workshop we will show how to clean and reshape data in Pandas and use the concept of split-apply-combine for exploratory analysis. Pandas provides powerful tooling for data analysis on a single machine but is mostly constrained to a single CPU. To parallelize and distribute these tasks, one can use Dask.
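A minimal sketch of split-apply-combine on invented data:

    import pandas as pd

    # invented tabular data: one row per measurement
    df = pd.DataFrame({
        "detector": ["A", "A", "B", "B", "B"],
        "energy":   [1.2, 3.4, 0.7, 2.1, 5.5],
    })

    # split by detector, aggregate each group, combine into a new table
    print(df.groupby("detector")["energy"].agg(["mean", "count"]))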
Dask is a flexible tool for parallelizing Python code on a single machine or across a cluster. We can think of Dask at a high and a low level. At the high level, Dask provides Array, Bag, and DataFrame collections that mimic NumPy, lists, and Pandas but can operate in parallel on datasets that do not fit into main memory; these collections are alternatives to NumPy and Pandas for large datasets. At the low level, Dask provides dynamic task schedulers that execute task graphs in parallel. These execution engines power the high-level collections mentioned above but can also power custom, user-defined workloads. In the tutorial, we will cover the high-level use of dask.array and dask.dataframe.
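A minimal dask.array sketch showing the lazy execution model:

    import dask.array as da

    # a 10000 x 10000 array in 1000 x 1000 chunks; nothing is computed
    # yet -- Dask only records a task graph
    x = da.random.random((10000, 10000), chunks=(1000, 1000))
    result = (x + x.T).mean(axis=0)

    # compute() executes the graph in parallel, chunk by chunk
    print(result.compute()[:5])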