Aug 27 – 31, 2018
KIT, Campus North, FTU
Europe/Berlin timezone

Scalable and reproducible workflows with Pachyderm

Aug 28, 2018, 1:00 PM
5h
157 (FTU)

Workshop Tutorials

Speaker

Jon Ander Novella

Description

Data scientists must manage analyses that consist of multiple stages, large datasets, and many tools, all while keeping results reproducible. Among the tools available for parallel computation, Pachyderm is an open-source workflow engine and distributed data processing tool that meets these needs by adding a data pipelining and data versioning layer on top of container-ecosystem projects such as Docker and Kubernetes. In this workshop you will learn how to:

  • create a simple local Kubernetes infrastructure,
  • install and interact with Pachyderm, and
  • implement a scalable and reproducible workflow using containers.
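
To give a rough idea of what the last point involves, the sketch below builds a Pachyderm-style pipeline specification as JSON from Python. It is only an illustration, not the workshop's own material: the pipeline name `wordcount`, the input repository `texts`, the image `ubuntu:18.04`, and the helper `make_pipeline_spec` are all hypothetical. The field layout follows the general shape of Pachyderm pipeline specs (a containerised `transform`, a versioned input repository mounted under `/pfs/<repo>`, results written to `/pfs/out`), but the key used for the input type has changed between releases, so check the documentation for the version you install.

```python
import json


def make_pipeline_spec(name, input_repo, image, cmd, workers=1, glob="/*"):
    """Build a Pachyderm-style pipeline specification as a plain dict.

    All argument values are illustrative; adapt them to your own setup.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return {
        # Name under which the pipeline (and its output repository) is known.
        "pipeline": {"name": name},
        # The container image and the command run for each unit of input data.
        "transform": {"image": image, "cmd": list(cmd)},
        # Number of parallel workers; this is where scaling is requested.
        "parallelism_spec": {"constant": workers},
        # Versioned input repository; the glob pattern controls how the data
        # is split into independently processed pieces. The "pfs" key is an
        # assumption that depends on the Pachyderm version in use.
        "input": {"pfs": {"repo": input_repo, "glob": glob}},
    }


# Hypothetical example: count the words of every file in the "texts" repo.
spec = make_pipeline_spec(
    name="wordcount",
    input_repo="texts",
    image="ubuntu:18.04",
    cmd=[
        "sh",
        "-c",
        'for f in /pfs/texts/*; do wc -w < "$f" > "/pfs/out/$(basename "$f").count"; done',
    ],
    workers=4,
)

print(json.dumps(spec, indent=2))
```

The printed JSON can be saved to a file and handed to Pachyderm's command-line client to create the pipeline. Because both the input data and the container image are versioned, re-running the same specification on the same commit should reproduce the same output.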
