



# A Fast Track Reconstruction via GNN on FPGA for PANDA

### by Greta Heine on Oct 24, 2022



KIT – The Research University in the Helmholtz Association



### www.kit.edu




### **Motivation**

accurate tracking of particle trajectories in HEP collider experiments is crucial to ensure the quality of novel particle detection

state-of-the-art track reconstruction algorithms based on Kalman filters scale computationally worse than quadratically in the number of detector hits

Graph Neural Networks (GNNs) have already been successfully applied to this task, e.g. by DeZoort et al., 2021 and W. Esmail, 2022

reduce computational cost by implementation on field-programmable gate arrays (FPGAs)





# **PANDA** (anti-Proton ANnihilation at DArmstadt)

one of four experiments at FAIR (GSI Darmstadt), which is currently under construction

- 1.5 to 15 GeV antiproton beam on a fixed target
- investigation of hadron physics in the charm and multi-strange hadron sectors





### **PANDA Physics Program**









# **PANDA Forward Tracking System (FTS)**

three pairs of planar straw tube tracking stations FT1 to FT6

FT1 and FT2 in front of, FT3 and FT4 inside and FT5 and FT6 behind the dipole magnet

each straw tube has a diameter of 10 mm

each tracking station consists of 4 double layers, of which the two intermediate double layers are inclined at $+5^{\circ}$ and $-5^{\circ}$



Greta Heine – A Fast Track Reconstruction via GNN on FPGA for PANDA





# Generation of MC training Data via PandaRoot

PandaRoot: the PANDA experiment's simulation and reconstruction software package

full PANDA simulation: MC event generator and transport through the detector via Geant

n = 30,000

generation of 3 muons (few secondary particles) and 3 anti-muons (crossing tracks)

particle gun box event generator: 3 muons and 3 anti-muons in forward direction with uniform distributions of $p \in [1, 10]$ GeV, $\theta \in [0.5^{\circ}, 10^{\circ}]$ and $\phi \in [0, 2\pi]$

TString inputGenerator = "box:type(13,3):p(1,10):tht(0.5,10):phi(0,360)";
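The sampling done by the box generator can be pictured with a small stand-alone sketch. This is not PandaRoot code: `generate_event` and its tuple layout are hypothetical names chosen for illustration, and only the ranges and the PDG code 13 come from the slide (the anti-muon code -13 is the standard PDG convention).

```python
import random

MUON_PDG = 13  # PDG code of the muon; the anti-muon is -13

def generate_event(rng, n_per_charge=3):
    """Return one toy event as a list of (pdg, p, theta_deg, phi_deg) tuples."""
    particles = []
    for pdg in (MUON_PDG, -MUON_PDG):
        for _ in range(n_per_charge):
            p = rng.uniform(1.0, 10.0)      # momentum, same range as p(1,10)
            theta = rng.uniform(0.5, 10.0)  # polar angle in degrees, tht(0.5,10)
            phi = rng.uniform(0.0, 360.0)   # azimuthal angle in degrees, phi(0,360)
            particles.append((pdg, p, theta, phi))
    return particles

rng = random.Random(42)  # fixed seed so the toy event is reproducible
event = generate_event(rng)
```

Each call yields six particles, three per charge, mirroring the event composition described above.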







### Generation of MC training Data via PANDARoot




![](_page_6_Picture_5.jpeg)

![](_page_6_Picture_7.jpeg)

# Generation of MC training Data via PANDARoot

![](_page_7_Picture_1.jpeg)

![](_page_7_Figure_2.jpeg)


![](_page_7_Picture_6.jpeg)

![](_page_7_Picture_8.jpeg)

### **Raw ROOT Data Input**

Conversion from ROOT to pandas DataFrame using uproot

features: event ID, hit ID, x, z, isochrone radius, momentum pz, MC particle ID, layer ID

![](_page_8_Figure_2.jpeg)

![](_page_8_Picture_5.jpeg)

![](_page_8_Figure_8.jpeg)

![](_page_8_Picture_11.jpeg)

![](_page_8_Picture_12.jpeg)

# **Preprocessing of Training Data**

omit skewed layers and apply layer mapping for new adjacent layer labels

in order to avoid looping tracks (curlers) and tracks leaving the detector early, only consider particles with $p_z > 0.001$ GeV/c (85.4% of curlers removed and only 1.7% of non-curlers lost)

additional same-layer filter and hit index reordering
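A minimal sketch of the first two preprocessing steps, assuming each hit is a dict carrying the features listed for the raw input. The function name `preprocess`, the dict keys and the example layer numbering are hypothetical; only the $p_z$ threshold is taken from the slide. The same-layer filter and hit reordering are not covered here.

```python
PZ_MIN = 0.001  # GeV/c, threshold quoted on the slide

def preprocess(hits, all_layers, skewed_layers):
    """Drop skewed-layer and low-pz hits, then relabel the remaining layers."""
    # Straight layers get consecutive labels 0..N-1 so that neighbouring
    # straight layers end up with adjacent layer IDs.
    straight = sorted(set(all_layers) - set(skewed_layers))
    layer_map = {old: new for new, old in enumerate(straight)}
    cleaned = []
    for hit in hits:
        if hit["layer_id"] in layer_map and hit["pz"] > PZ_MIN:
            cleaned.append(dict(hit, layer_id=layer_map[hit["layer_id"]]))
    return cleaned

# Toy example: layers 2 and 3 are assumed to be the skewed ones.
hits = [
    {"hit_id": 0, "layer_id": 0, "pz": 2.0},
    {"hit_id": 1, "layer_id": 2, "pz": 2.0},     # skewed layer, dropped
    {"hit_id": 2, "layer_id": 4, "pz": 1.9},     # relabelled to layer 2
    {"hit_id": 3, "layer_id": 1, "pz": 0.0005},  # fails the pz cut
]
clean = preprocess(hits, all_layers=range(6), skewed_layers={2, 3})
```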

![](_page_9_Figure_5.jpeg)

![](_page_9_Picture_8.jpeg)

# **Graph Building**

GNNs are a central method of geometric deep learning based on a graph representation of data
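As a sketch of what such a graph representation can look like for tracking, the snippet below treats hits as nodes and connects hits in adjacent layers with directed edges. The adjacent-layer rule is an assumption inferred from the outlook (which lists next-to-next-layer and same-layer edges as future work); `build_graph` and the dict keys are hypothetical names.

```python
def build_graph(hits):
    """Connect every hit in layer L to every hit in layer L + 1.

    Returns (edges, labels): edges are (source index, target index) pairs,
    labels are 1 if both hits share an MC particle ID and 0 otherwise.
    """
    by_layer = {}
    for idx, hit in enumerate(hits):
        by_layer.setdefault(hit["layer_id"], []).append(idx)

    edges, labels = [], []
    for layer in sorted(by_layer):
        for i in by_layer[layer]:
            for j in by_layer.get(layer + 1, []):
                edges.append((i, j))
                same = hits[i]["particle_id"] == hits[j]["particle_id"]
                labels.append(int(same))
    return edges, labels

# Two particles crossing two layers: 4 candidate edges, 2 of them true.
toy_hits = [
    {"layer_id": 0, "particle_id": 1},
    {"layer_id": 0, "particle_id": 2},
    {"layer_id": 1, "particle_id": 1},
    {"layer_id": 1, "particle_id": 2},
]
edges, labels = build_graph(toy_hits)
```

The truth labels are what an edge classifier is trained to reproduce.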

![](_page_10_Figure_3.jpeg)


![](_page_10_Picture_6.jpeg)

![](_page_10_Picture_8.jpeg)

# **Graph Building**

![](_page_11_Figure_1.jpeg)

![](_page_11_Picture_4.jpeg)

![](_page_11_Figure_5.jpeg)

![](_page_11_Picture_8.jpeg)

# **GNN** Architecture

Interaction network model by <u>Battaglia et al.</u> with PyTorch Geometric

**Edge block R1** (linear feed-forward network with ReLU activation) for edge features

**Node block O** (linear feed-forward network with ReLU activation) for hit features

**final Edge block R2** with sigmoid activation, 1-dim output

![](_page_12_Figure_5.jpeg)

![](_page_12_Picture_8.jpeg)

InteractionNetwork(node\_dim: 2, edge\_dim: 2, hidden\_size: 8)

| Modules                | Parameters |
|------------------------|------------|
| R1.layers.0.weight     | 48         |
| R1.layers.0.bias       | 8          |
| R1.layers.2.weight     | 64         |
| R1.layers.2.bias       | 8          |
| R1.layers.4.weight     | 16         |
| R1.layers.4.bias       | 2          |
| O.layers.0.weight      | 32         |
| O.layers.0.bias        | 8          |
| O.layers.2.weight      | 64         |
| O.layers.2.bias        | 8          |
| O.layers.4.weight      | 16         |
| O.layers.4.bias        | 2          |
| R2.layers.0.weight     | 48         |
| R2.layers.0.bias       | 8          |
| R2.layers.2.weight     | 64         |
| R2.layers.2.bias       | 8          |
| R2.layers.4.weight     | 8          |
| R2.layers.4.bias       | 1          |
| Total trainable params | 413        |

![](_page_12_Picture_17.jpeg)

### **Training and Validation**

divide data into 80% training, 10% validation, 10% test data

**Optimizer:** Adam optimizer, **loss**: Binary Cross Entropy

**Epochs:** max. 40 (early stopping after 10 epochs without improvement)

learning rate lr = $6.5 \cdot 10^{-4}$, learning rate decay $\gamma = 0.86$, step size = 3 (via Optuna hyperparameter search)
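The schedule and stopping rule can be written out in a few lines. This sketch assumes the decay is a step schedule (the learning rate is multiplied by $\gamma$ every `step_size` epochs) and that "improvement" refers to the validation loss; the slides do not state either explicitly, and the function names are hypothetical.

```python
def step_lr(epoch, lr0=6.5e-4, gamma=0.86, step_size=3):
    """Learning rate at a given epoch under a step decay schedule."""
    return lr0 * gamma ** (epoch // step_size)

def should_stop(val_losses, patience=10):
    """True once the best validation loss is at least `patience` epochs old."""
    if not val_losses:
        return False
    best_epoch = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience

# Learning rate for each of the (at most) 40 epochs.
schedule = [step_lr(epoch) for epoch in range(40)]
```

With these values the learning rate stays at its initial value for epochs 0 to 2 and drops by a factor 0.86 at epoch 3, 6, 9 and so on.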

![](_page_13_Figure_5.jpeg)


![](_page_13_Picture_8.jpeg)

![](_page_13_Picture_14.jpeg)

![](_page_13_Picture_15.jpeg)

## **GNN Edge Classification**

### classification results

![](_page_14_Figure_2.jpeg)

![](_page_14_Figure_4.jpeg)


![](_page_14_Picture_7.jpeg)

![](_page_14_Picture_9.jpeg)

![](_page_14_Picture_10.jpeg)

# **Tracklet Finding**

**tracklets**: track candidates for each of the 3 segments (max. length = 8 hits)

tracklets computed via Python package NetworkX

![](_page_15_Figure_2.jpeg)

![](_page_15_Picture_5.jpeg)

$$\begin{aligned}
\text{tracklet purity} &= \frac{1}{N_{\text{tracklets}}} \sum_{\text{tracklets}} \frac{N_{\text{majority particle}}}{N_{\text{tracklet hits}}} \\
\text{MC coverage} &= \frac{1}{N_{\text{tracklets}}} \sum_{\text{tracklets}} \frac{N_{\text{majority particle}}}{N_{\text{MC}}} \\
\text{fully found rate} &= \frac{N_{\text{full}}}{N_{\text{tracklets}}} \\
\text{ghost rate} &= \frac{N_{\text{ghost}}}{N_{\text{tracklets}}}
\end{aligned}$$
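The two averaged metrics can be computed as in the sketch below. It assumes that the majority particle of a tracklet is the MC particle contributing most of its hits and that $N_{\text{MC}}$ is the number of true hits of that particle; `tracklet_metrics` and its input layout are hypothetical. The fully found and ghost rates are left out because the slides do not spell out how $N_{\text{full}}$ and $N_{\text{ghost}}$ are counted.

```python
from collections import Counter

def tracklet_metrics(tracklets, mc_hits_per_particle):
    """Return (tracklet purity, MC coverage) averaged over all tracklets.

    tracklets: list of lists holding the MC particle ID of each tracklet hit.
    mc_hits_per_particle: dict mapping particle ID to its true hit count.
    """
    purities, coverages = [], []
    for particle_ids in tracklets:
        majority_id, n_majority = Counter(particle_ids).most_common(1)[0]
        purities.append(n_majority / len(particle_ids))
        coverages.append(n_majority / mc_hits_per_particle[majority_id])
    n_tracklets = len(tracklets)
    return sum(purities) / n_tracklets, sum(coverages) / n_tracklets

# Toy input: one tracklet with a wrong hit, one short but clean tracklet.
purity, coverage = tracklet_metrics(
    tracklets=[[1, 1, 1, 2], [2, 2]],
    mc_hits_per_particle={1: 4, 2: 4},
)
```

In the toy input the per-tracklet purities are 3/4 and 1, and the coverages are 3/4 and 1/2.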

![](_page_15_Picture_11.jpeg)

![](_page_15_Picture_12.jpeg)

# FPGA Technology

programmable digital integrated circuits

- FPGAs support high data throughput and low latency data processing at high flexibility
- common in HEP DAQ systems
- general trade-off between resource usage and latency/throughput requirements
- design optimisations needed to meet performance requirements
- building blocks: lookup tables (LUTs), flip-flops (FFs), block random access memories (BRAMs) and digital signal processors (DSPs)

![](_page_16_Picture_9.jpeg)

![](_page_16_Figure_10.jpeg)

![](_page_16_Picture_13.jpeg)

# High-Level Synthesis for Machine Learning (hls4ml)

![](_page_17_Figure_1.jpeg)

![](_page_17_Picture_4.jpeg)

- automatic translation of ML models to hardware level
- based on Vivado HLS
- fast prototyping by automated workflow
- several configuration parameters

![](_page_17_Picture_11.jpeg)

![](_page_17_Picture_12.jpeg)

# **Working Environment and Benchmark Model**

Xilinx Zynq® UltraScale+ MPSoC FPGA (part number xczu11eg-ffvc1760-2-e)

benchmark model: node and edge dimension $D_{node} = D_{edge} = 2$, number of nodes $N_{\text{nodes}} = 28$, number of edges $N_{\text{edges}} = 56$, reuse factor RF = 8

![](_page_18_Picture_6.jpeg)

- no variable-size input data possible $\rightarrow$ truncation and zero-padding using the 95% rule (corresponds to graphs with 49 nodes and 98 edges)
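A sketch of the truncation and zero-padding step, reading the 95% rule as "choose the fixed size that covers 95% of the observed graphs" (an interpretation, not stated on the slide). Both helper names are hypothetical.

```python
def size_for_coverage(sizes, percent=95):
    """Smallest size that is >= at least `percent` % of the observed sizes."""
    ordered = sorted(sizes)
    # Integer ceiling of percent * n / 100, converted to a 0-based index.
    k = (percent * len(ordered) + 99) // 100 - 1
    return ordered[k]

def pad_or_truncate(rows, target_len, width):
    """Cut `rows` to target_len, or append all-zero rows of length `width`."""
    fixed = [list(row) for row in rows[:target_len]]
    fixed += [[0.0] * width for _ in range(target_len - len(fixed))]
    return fixed

# Toy example: node feature rows of width 2 forced to a fixed length of 3.
padded = pad_or_truncate([[1.0, 2.0]], target_len=3, width=2)
cut = pad_or_truncate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]], 3, 2)
```

When nodes are truncated, edges pointing at removed nodes have to be dropped as well; that bookkeeping is omitted here.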

![](_page_18_Picture_9.jpeg)

# **Design Optimization: Quantization**

bit widths directly affect resource usage
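To make the effect of a bit width concrete, the sketch below emulates in plain Python what a fixed-point type such as `ap_fixed<16,8>` (the precision quoted elsewhere in these slides) does to a value: 16 bits in total, 8 of them for the integer part including the sign. It assumes the default quantization and overflow modes (truncation toward minus infinity, wrap-around); `quantize_ap_fixed` is a hypothetical helper, not part of any HLS library.

```python
import math

def quantize_ap_fixed(x, width=16, int_bits=8):
    """Emulate ap_fixed<width,int_bits> with truncation and wrap-around."""
    frac_bits = width - int_bits
    raw = math.floor(x * (1 << frac_bits))  # truncate toward minus infinity
    half = 1 << (width - 1)
    raw = (raw + half) % (1 << width) - half  # wrap into the signed range
    return raw / (1 << frac_bits)

q_small = quantize_ap_fixed(0.3)      # resolution is 2**-8
q_neg = quantize_ap_fixed(-0.3)
q_overflow = quantize_ap_fixed(128.0)  # just outside the representable range
```

Fewer fractional bits coarsen the weights, fewer integer bits shrink the range; both reduce the logic needed per multiplication.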

![](_page_19_Figure_4.jpeg)

![](_page_19_Picture_7.jpeg)

# **Design Optimization: Compression**

reduction of the number of model parameters

the number of model parameters strongly depends on the number of hidden nodes (neurons)

![](_page_20_Figure_3.jpeg)

![](_page_20_Picture_6.jpeg)

![](_page_20_Picture_10.jpeg)

# **Design Optimization: Pipelining**

key advantage of FPGAs: throughput acceleration by parallelisation and pipelining

total latency = iteration latency + II $\cdot$ (number of functions $- 1$)

pipelining includes task parallelism as well as pipelining within a run, across runs, or within a task
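The latency relation above as a one-line helper, with II denoting the initiation interval. The numbers in the example are made up for illustration and are not measurements from the talk.

```python
def total_latency(iteration_latency, initiation_interval, n_functions):
    """Total latency in cycles of a pipeline, following the formula above."""
    return iteration_latency + initiation_interval * (n_functions - 1)

# Illustrative values only: 5 pipelined functions, 10 cycles each.
pipelined = total_latency(iteration_latency=10, initiation_interval=2, n_functions=5)
sequential = total_latency(iteration_latency=10, initiation_interval=10, n_functions=5)
```

With II equal to the iteration latency nothing overlaps and the functions run back to back; a smaller II lets successive functions start before the previous one has finished.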

![](_page_21_Figure_4.jpeg)

![](_page_21_Picture_7.jpeg)

![](_page_21_Picture_13.jpeg)

# **Design Optimization: Pipelining**

key advantage of FPGAs: throughput acceleration by parallelisation

initiation interval II equals the hls4ml internal reuse factor RF

![](_page_22_Figure_2.jpeg)

![](_page_22_Picture_5.jpeg)

![](_page_22_Picture_9.jpeg)

![](_page_22_Picture_10.jpeg)

### **Design Optimization: Graph Dimensions**

ratio of  $N_{edges} = N_{nodes}$ 

![](_page_23_Figure_2.jpeg)

![](_page_23_Picture_6.jpeg)

### **Classification Performance**

graph size  $N_{nodes} = 49$ ,  $N_{edges} = 98$ , number of neurons  $N_{neurons} = 6$ 

precision = ap\_fixed<16,8>

**problem**: this configuration exceeds resources

![](_page_24_Figure_4.jpeg)


![](_page_24_Picture_7.jpeg)

![](_page_24_Picture_10.jpeg)

![](_page_24_Picture_11.jpeg)

# **Design Optimization**

goal: find a design that satisfies latency, resource and classification requirements

hands-on adjustment using Vivado HLS design directives (pragmas) including ARRAY\_PARTITION, UNROLL, PIPELINE, DATAFLOW

| # | RF | Description                                         | Latency [cycles] | II [cycles] | DSP [%] | LUT [%] | FF [%] | BRAM [%] |
|---|----|-----------------------------------------------------|------------------|-------------|---------|---------|--------|----------|
| 1 | 1  | throughput-optimized design                         | 103              | 103         | 546     | 386     | 88     |          |
| 2 | 8  | throughput-optimized design                         | 124              | 124         | 63      | 193     | 46     | Z        |
| 3 | 16 | throughput-optimized design                         | 139              | 139         | 34      | 175     | 42     | Z        |
| 4 | 32 | throughput-optimized design                         | 173              | 173         | 19      | 169     | 40     | Z        |
| 5 | 16 | edge-aggregation without array partitioning         | 384              | 247         | 34      | 85      | 17     | Z        |
| 6 | 16 | edge-aggregation with array partitioning factor = 2 | 311              | 247         | 34      | 85      | 17     | Z        |
| 7 | 16 | # 5 with dataflow design                            | 290              | 247         | 34      | 104     | 18     | 11       |
| 8 | 16 | loop unrolling with parallelization factor = 16     | 85               | 85          | 34      | 169     | 20     | 4        |
| 9 | 16 | # 6 with 2 ns clock period                          | 345              | 247         | 34      | 87      | 23     | 4        |


![](_page_25_Picture_6.jpeg)

IPE - Institute for Data Processing and Electronics


# Conclusion

successful application of a GNN approach on charged particle tracking

edge classification with true edge efficiency of 98.9% at true edge purity of 63.0%

Interaction network architecture with only 275 parameters used

tracklet finding with up to 77% of tracklets fully found

application on FPGA using 34% of DSPs and 85% of LUTs at a total latency of 0.99 µs, enabling real-time analysis

![](_page_26_Picture_8.jpeg)

![](_page_26_Picture_15.jpeg)

### Outlook

include skewed detector layer hits for 3D information

include next-to-next layer and same-layer edges

extend to full track finding and reconstruction

investigate more sophisticated FPGA design optimization methods

implement on a Versal board with much more resources using Vitis AI

integration in PandaRoot

application on a real FPGA

![](_page_27_Picture_10.jpeg)

![](_page_27_Picture_13.jpeg)

# THE END

![](_page_28_Picture_3.jpeg)

![](_page_28_Picture_6.jpeg)

![](_page_28_Picture_7.jpeg)

# BACKUP


![](_page_29_Picture_3.jpeg)

![](_page_29_Picture_5.jpeg)

![](_page_29_Picture_6.jpeg)

# **Definition of metrics**

![](_page_30_Figure_1.jpeg)

![](_page_30_Picture_6.jpeg)

![](_page_30_Picture_9.jpeg)

![](_page_30_Figure_10.jpeg)

### PandaRoot

![](_page_31_Figure_1.jpeg)

![](_page_31_Picture_4.jpeg)

![](_page_31_Figure_5.jpeg)

Particle Identification

Analysis

![](_page_31_Figure_8.jpeg)

ETP – Institute of Experimental Particle Physics

![](_page_31_Figure_11.jpeg)

![](_page_31_Figure_12.jpeg)

### Vivado/ Vivado HLS

Vivado is an integrated design environment for Xilinx devices

Vivado HLS is a compiler enabling conversion of high-level languages such as C, C++ and SystemC to HDLs

Vivado HLS design workflow consists of C synthesis, C simulation, C/RTL co-simulation, and packaging and export of the RTL design as a Vivado IP block to Vivado

![](_page_32_Picture_6.jpeg)

![](_page_32_Figure_9.jpeg)

### **Number of GNN Parameters**

number of model parameters for each block

$$N_B = N_w + N_b = (D_{\text{in}} + D_{\text{hidden}} + D_{\text{out}}) \cdot D_{\text{hidden}} + (2D_{\text{hidden}} + D_{\text{out}})$$

relational model $R_1(2 \cdot D_{\text{node}} + D_{\text{edge}},\ D_{\text{edge}},\ D_{\text{hidden}})$

object model $O(D_{\text{node}} + D_{\text{edge}},\ D_{\text{node}},\ D_{\text{hidden}})$

relational model output block $R_2(2 \cdot D_{\text{node}} + D_{\text{edge}},\ 1,\ D_{\text{hidden}})$

total number of parameters $N_{\text{IN}} = N_w + N_b$ with

$$\begin{aligned}
N_w &= D_{\text{hidden}} \cdot (6D_{\text{node}} + 4D_{\text{edge}} + 3D_{\text{hidden}} + 1) \\
N_b &= 6D_{\text{hidden}} + D_{\text{edge}} + D_{\text{node}} + 1
\end{aligned}$$
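The per-block formula can be checked numerically against the numbers quoted in the slides. The helper names below are hypothetical; the block signatures (input, output, hidden dimension) follow the definitions of $R_1$, $O$ and $R_2$ given above.

```python
def block_params(d_in, d_out, d_hidden):
    """Weights plus biases of one block with two hidden layers."""
    weights = (d_in + d_hidden + d_out) * d_hidden
    biases = 2 * d_hidden + d_out
    return weights + biases

def interaction_network_params(d_node, d_edge, d_hidden):
    """Total parameter count of the three blocks R1, O and R2."""
    r1 = block_params(2 * d_node + d_edge, d_edge, d_hidden)
    o = block_params(d_node + d_edge, d_node, d_hidden)
    r2 = block_params(2 * d_node + d_edge, 1, d_hidden)
    return r1 + o + r2

n_hidden_8 = interaction_network_params(d_node=2, d_edge=2, d_hidden=8)
n_hidden_6 = interaction_network_params(d_node=2, d_edge=2, d_hidden=6)
```

For hidden size 8 this gives the 413 parameters of the module table, and for 6 neurons the 275 parameters quoted in the conclusion.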

![](_page_33_Picture_4.jpeg)

| Modules            | Parameters |
|--------------------|------------|
| R1.layers.0.weight | 48         |
| R1.layers.0.bias   | 8          |
| R1.layers.2.weight | 64         |
| R1.layers.2.bias   | 8          |
| R1.layers.4.weight | 16         |
| R1.layers.4.bias   | 2          |
| O.layers.0.weight  | 32         |
| O.layers.0.bias    | 8          |
| O.layers.2.weight  | 64         |
| O.layers.2.bias    | 8          |
| O.layers.4.weight  | 16         |
| O.layers.4.bias    | 2          |
| R2.layers.0.weight | 48         |
| R2.layers.0.bias   | 8          |
| R2.layers.2.weight | 64         |
| R2.layers.2.bias   | 8          |
| R2.layers.4.weight | 8          |
| R2.layers.4.bias   | 1          |


![](_page_33_Picture_10.jpeg)

InteractionNetwork(node\_dim: 2, edge\_dim: 2, hidden\_size: 8)

![](_page_33_Picture_12.jpeg)

![](_page_33_Picture_13.jpeg)

### hls4ml Compilation Times

### challenge: long compilation times

![](_page_34_Figure_2.jpeg)

![](_page_34_Picture_5.jpeg)

![](_page_34_Picture_7.jpeg)

# **Code available in Git Repo**

### git@github.com:grheine/IN\_hls4ml.git

![](_page_35_Figure_2.jpeg)

![](_page_35_Picture_7.jpeg)

![](_page_35_Picture_8.jpeg)

![](_page_35_Picture_11.jpeg)

![](_page_35_Picture_12.jpeg)