An Analysis Framework for KCDC

Frank Polgart DLC 08.06.2020

Motivation

KCDC already provides public datasets
analysis frameworks/tools may be specialised and cumbersome to get running
for large datasets: bring the user to the data, instead of the data to the user

Requirements

accessibility: we want people to actually use it, not another prestige project
usability: provide analysis framework
administration: the less effort, the better
don't reinvent the wheel, most of the work has already been done

Solution

accessibility: JupyterHub / notebooks
almost anyone who has done some data analysis has used a Jupyter (formerly IPython) notebook before
usability: Python + ROOT
modern go-to for data analysis; ROOT is needed to read the files
administration: Docker
lightweight, robust, fits our service needs
integration requires some custom code

this is by no means a novel approach; low expected maintenance and reasonably "future-proof"

Features

from the user's point of view

authenticates against KCDC, no extra account necessary
transparent access to datashop download area
interactive kernels for Python and C++
loads of extensions available, for example this presentation module

from the provider's point of view

standard components with active development and documentation
if need be, scales to many users & hosts with minimal effort
setup with Docker Compose is pretty much automatic
infrastructure description under version control for free
it's easy to add more datashops

Example

get some data!

In [3]:

import os
from zipfile import ZipFile
ZipFile('KASCADE_SmallDataSample_wA_runs_0877-7417_ROOT.zip').extractall()
os.listdir()

Out[3]:

['.ipynb_checkpoints',
 'KCDC_analyze_example.C',
 'slides.ipynb',
 'KASCADE_SmallDataSample_wA_runs_0877-7417_ROOT.zip',
 'info.txt',
 'events.root',
 'EULA.pdf']
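
Before extracting, it can help to check what an archive from the download area contains. The following is a minimal sketch using only the Python standard library; the helper names `list_archive` and `extract_archive` are hypothetical and not part of KCDC.

```python
from zipfile import ZipFile


def list_archive(path):
    """Return (name, uncompressed size in bytes) for every member of a zip archive."""
    with ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def extract_archive(path, target_dir="."):
    """Extract all members into target_dir and return the member names."""
    with ZipFile(path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

In the notebook above, one could call `list_archive('KASCADE_SmallDataSample_wA_runs_0877-7417_ROOT.zip')` first and then `extract_archive(...)` with the same file name.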

Example

switch kernels and run some C++

In [1]:

.L KCDC_analyze_example.C

In [2]:

run()

Input file:events.root
KCDC-Entries read from files: 1080295
KCDCM-Entries:     1080295
Array Entries:     986577
Calor Entries:     250981
Grande Entries:    88259
General Entries:   1080295
KCDCN-Entries to be evaluated: 1080295
 processing event No: 0  of 1080295
 processing event No: 100000  of 1080295
 processing event No: 200000  of 1080295
 processing event No: 300000  of 1080295
 processing event No: 400000  of 1080295
 processing event No: 500000  of 1080295
 processing event No: 600000  of 1080295
 processing event No: 700000  of 1080295
 processing event No: 800000  of 1080295
 processing event No: 900000  of 1080295
 processing event No: 1000000  of 1080295
Entries survived:: 1080295 out of 1080295
general_id >0   :: 1080295
array_id >0     :: 986577
calorimter_id >0:: 250981
grande_id >0    :: 88259
(int) 0
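
The entry counts printed by run() can be turned into per-component fractions with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the output above, while the function name `component_fractions` and the dictionary layout are assumptions made for this example.

```python
# Counts copied from the output of run() above.
total_entries = 1080295
component_entries = {"Array": 986577, "Calor": 250981, "Grande": 88259}


def component_fractions(entries, total):
    """Return the share of all events that carry data from each component."""
    return {name: count / total for name, count in entries.items()}


fractions = component_fractions(component_entries, total_entries)
for name, fraction in fractions.items():
    # Print each component name with its share as a percentage.
    print(f"{name:<7}{fraction:6.1%}")
```

For this sample, roughly 91% of the events have array entries, about 23% have calorimeter entries and about 8% have Grande entries.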

Example

switch back to Python and look at the result

In [1]:

import ROOT

f = ROOT.TFile('KCDC_Test.root')
keys = [key.GetName() for key in f.GetListOfKeys()]
rows = (len(keys) + 1) // 2  # two pads per row, rounded up for an odd number of keys
c = ROOT.TCanvas("foo", "bar", 1920, 540 * rows)
c.Divide(2, rows)
logspectra = ['h6202', 'h6302', 'h7202']  # histograms drawn with a logarithmic y-axis
for pad, key in enumerate(keys, start=1):
    c.cd(pad)  # make this sub-pad the current pad
    if key in logspectra:
        ROOT.gPad.SetLogy()
    f.Get(key).Draw()

Welcome to JupyROOT 6.20/04

Example

In [2]:

c.Draw()

What's next

this is a tech-demo
still needs a little work before it can be made accessible to the public
maybe have more than one datashop (AstroDS?)
explore viability of built-in IPython clusters for analysis
improve with user feedback

The End