29 August 2016 to 2 September 2016
Europe/Berlin timezone

Apache Spark in Scientific Applications

1 Sep 2016, 13:00
Room 155 (FTU)


Mirko Kämpf (Cloudera)


The workshop Spark in Scientific Applications covers fundamental development and data analysis techniques using Apache Hadoop and Apache Spark. Besides an introduction to the theoretical background of MapReduce and Bulk Synchronous Parallel (BSP) processing, we also use the machine learning library MLlib and the graph processing framework GraphX. We work on sample data sets from Wikipedia, on financial market data, and on data from a generic data generator. During the tutorial sessions we illustrate the data science workflow and present the right tool for each task. All practical exercises are prepared in a pre-configured virtual machine. Participants get access to the required data sets on a "one-node pseudo-distributed" cluster with all tools installed. This VM also serves as a starting point for further experiments after the workshop.
