Reinforcement learning methods typically require a large number of interactions with the environment before they learn anything useful. This makes training against sophisticated accelerator simulations difficult because of the total wall-clock time required. On the other hand, environments built on these accelerator codes are potentially very valuable for learning because they encode substantial knowledge of accelerator systems. To mitigate the long wall-clock run times of these codes, we use an Advantage Actor Critic (A2C) method to train an agent with the 2D accelerator code spiffe. Our end goal is to train an agent to control the FLEX accelerator, a candidate machine for FLASH radiation treatment. Herein we describe our progress, beginning with learning at scale on simulations of the accelerator, where we train an agent using one hundred simulations running in parallel.
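To make the approach concrete, the following is a minimal, self-contained sketch of synchronous A2C with a batch of vectorized environments, in the spirit of training against many simulations at once. It is not the authors' implementation: the toy environment, feature map, and all function names here are illustrative stand-ins (the real setup would wrap spiffe runs), and the actor and critic are simple linear models rather than neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters; the abstract's setup uses ~100 parallel simulations.
N_ENVS, N_STEPS, N_UPDATES = 8, 5, 300
GAMMA, LR = 0.99, 0.05

# Toy stand-in environment: the agent sits at position x in [-1, 1],
# actions nudge it left/right, and reward is -|x| (best policy: stay at 0).
def reset(n):
    return rng.uniform(-1.0, 1.0, size=n)

def env_step(x, a):
    x = np.clip(x + np.where(a == 1, 0.1, -0.1), -1.0, 1.0)
    return x, -np.abs(x)

def features(x):
    # Simple hand-built features [x, |x|, 1] for the linear actor and critic.
    return np.stack([x, np.abs(x), np.ones_like(x)], axis=1)

W_pi = np.zeros((3, 2))  # actor: linear softmax policy over 2 actions
w_v = np.zeros(3)        # critic: linear state-value baseline

def probs(f):
    logits = f @ W_pi
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

x = reset(N_ENVS)
for _ in range(N_UPDATES):
    # Roll out N_STEPS in all environments simultaneously (one NumPy batch).
    traj = []
    for _ in range(N_STEPS):
        f = features(x)
        p = probs(f)
        a = (rng.uniform(size=N_ENVS) < p[:, 1]).astype(int)
        x, r = env_step(x, a)
        traj.append((f, a, r))
    # n-step returns, bootstrapped from the critic's value of the final state.
    R = features(x) @ w_v
    for f, a, r in reversed(traj):
        R = r + GAMMA * R
        adv = R - f @ w_v  # advantage = return minus value baseline
        # Critic: gradient step on squared error between return and value.
        w_v += LR * (adv @ f) / N_ENVS
        # Actor: policy gradient; for softmax-linear, grad log pi = f x (onehot - p).
        p = probs(f)
        grad = f.T @ ((np.eye(2)[a] - p) * adv[:, None]) / N_ENVS
        W_pi += LR * grad
```

The key scaling idea is that the rollout loop advances all environment copies as one batched array operation, so in the real system each row of the batch would correspond to one spiffe simulation running in parallel.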