GridKa School 2015: Big Data, Virtualization and Modern Programming

Name: GridKa School 2015: Big Data, Virtualization and Modern Programming
Start: 2015-09-07T08:00:00+02:00
End: 2015-09-11T18:30:00+02:00
Sep 7 – 11, 2015

Europe/Berlin timezone

Apache Spark: The next Generation of Hadoop Processing

Sep 10, 2015, 9:00 AM

40m

Gaede HS (30.22)

Plenary Talks

Mr Mirko Kaempf (Cloudera)

Apache Spark is known as the "next-generation framework" for Hadoop-based data processing. This talk explains why, and what Apache Spark offers to the scientific community. The convergence of different analysis techniques into one flexible and highly efficient processing engine enables completely new interdisciplinary analysis methods as well as inexpensive analysis prototypes. In this presentation I show examples in Scala and Python. Besides fundamental techniques and the core features of Apache Spark, we look into development practices and data analysis techniques. To that end, we recap the theoretical background of MapReduce and Bulk Synchronous Parallel processing before I introduce the machine learning library MLlib and the graph processing framework GraphX. Apache Spark uses the concept of data frames and allows SQL operations on data sets; after this presentation you will know how this works and how it can save you a lot of time. Finally, you will see how data can be collected and analyzed on the fly using Spark Streaming.
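
As a rough illustration of the Map-Reduce model that the abstract says the talk recaps, the following is a minimal sketch in plain Python. It does not use Spark, and it is not taken from the speaker's slides. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["spark builds on hadoop", "spark streaming and spark sql"]
    print(word_count(sample))
```

In Spark itself, the same three steps are commonly written as a chain of `flatMap`, `map` and `reduceByKey` calls on an RDD, with the engine distributing the shuffle across the cluster.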

Slides

GridKA-2015-MK-Apache_Spark_in_Scientific_Applciations-WorkingCopy-FINAL_Kopie.key

GridKA-2015-MK-Apache_Spark_in_Scientific_Applciations-WorkingCopy-FINAL_Kopie.pdf

GridKA-2015-MK-Apache_Spark_in_Scientific_Applciations-WorkingCopy-FINAL_Kopie.pptx
