We are pleased to announce RL4AA'25, the third instalment of the workshop series organised by the Reinforcement Learning for Autonomous Accelerators (RL4AA) Collaboration. After two very successful workshops in Karlsruhe (2023) and Salzburg (2024), RL4AA'25 will be hosted by DESY in the beautiful port city of Hamburg, Germany.
The workshop will bring together experts from the fields of machine learning, accelerator physics, and high-performance computing to discuss the latest developments in the field of reinforcement learning for autonomous accelerators. The workshop will feature invited talks, contributed talks, and poster sessions, as well as a panel discussion on the future of autonomous accelerators.
RL4AA welcomes seasoned RL practitioners as well as newcomers. We are doing our best to make sure there is something for everyone, from introductory tutorials to advanced research talks.
Note also that this year's workshop is organised in coordination with the 5th ICFA Beam Dynamics Mini-Workshop on Machine Learning for Particle Accelerators (MaLAPA) at CERN. The two workshops are scheduled so that you can attend them one after the other.
We are looking forward to welcoming you to Hamburg in 2025!
For more than half a decade, RadiaSoft has developed machine learning (ML) solutions to problems of immediate, practical interest in particle accelerator operations. These solutions include machine vision through convolutional neural networks for automating neutron scattering experiments and several classes of autoencoder networks for de-noising signals from beam position monitors and low-level RF systems in the interest of improving and automating controls. As active deployments of our ML products have taken shape, one area which has become increasingly promising for future development is the use of agentic ML through reinforcement learning (RL). Leveraging our substantial suite of ML tools as a foundation, we have now begun to develop an RL framework for achieving higher degrees of automation for accelerator operations. Here we discuss our RL approaches for two areas of ongoing interest at RadiaSoft: total automation of sample alignment at neutron and x-ray beamlines, and automated targeting and dose delivery optimization for FLASH radiotherapy. We will provide an overview of both the ML and RL methods employed, as well as some of our early results and intended next steps.
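As an illustration of the de-noising approach named above, the sketch below shows a minimal denoising autoencoder for 1-D diagnostic signals such as BPM waveforms. It is a toy under stated assumptions, not RadiaSoft's implementation; the layer sizes, signal length, and training data are placeholders.

```python
# Minimal denoising autoencoder for 1-D signals (illustrative only; layer
# sizes, signal length, and training data are placeholders).
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, n_samples=256, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_samples, 128), nn.ReLU(),
                                     nn.Linear(128, latent), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                     nn.Linear(128, n_samples))

    def forward(self, noisy):
        return self.decoder(self.encoder(noisy))

# Training minimises the reconstruction error against clean reference signals.
model = DenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
noisy = torch.randn(16, 256)    # stand-in batch of noisy BPM-like signals
clean = torch.zeros(16, 256)    # stand-in clean references
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
optimizer.step()
```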
We present design considerations and challenges for the fast machine learning component of a third-order resonant beam extraction regulation system being commissioned to deliver steady beam rates to the mu2e experiment at Fermilab. Dedicated quadrupoles drive the tune toward the 29/3 resonance each spill, extracting beam at kV multiwire septa. The overall Spill Regulation System consists of (1) a “slow” process using ~100-spill averages to adjust the base quad ramp infrequently, (2) a feedforward harmonic content compensator, and (3) the “fast” ML agent reacting during each ongoing spill with on-the-fly additive corrections to the sum of (1) and (2).
We have demonstrated improved beam-rate steadying for a fast ML agent compared to a PID controller using a quasi-physical spill simulation, and demonstrated distillation of that simulation into a predictive surrogate model. Current work includes a data-and-training pipeline to generate data-aware surrogates with real-world dynamics, even as the dynamics shift unpredictably. The surrogates are to act as RL environments against which to train our fast ML control agents before deploying them on FPGA in the live system. Further current efforts focus on modeling and controlling beam loss around the storage ring, understanding additional available hardware inputs to the model, and the interplay of these with beam-steadying performance.
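To make the three-component structure concrete, here is a minimal sketch of how the additive correction described above could be assembled. The function and signal names are hypothetical placeholders, not the Fermilab implementation.

```python
# Sketch of the additive Spill Regulation System correction (hypothetical
# names): slow base ramp + feedforward harmonic compensation + fast ML output.
import numpy as np

def quad_setpoint(t, base_ramp, harmonic_comp, fast_agent, spill_rate_obs):
    slow = base_ramp(t)                # adjusted infrequently from ~100-spill averages
    feedforward = harmonic_comp(t)     # compensates known harmonic content
    fast = fast_agent(spill_rate_obs)  # on-the-fly additive correction within the spill
    return slow + feedforward + fast

# Example with toy components standing in for the real system:
setpoint = quad_setpoint(
    t=0.5,
    base_ramp=lambda t: 100.0 + 2.0 * t,
    harmonic_comp=lambda t: 0.1 * np.sin(2 * np.pi * 60 * t),
    fast_agent=lambda obs: -0.05 * obs,   # e.g. a small learned correction
    spill_rate_obs=0.3,
)
```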
The slow extracted beams at the CERN Super Proton Synchrotron (SPS) are transported over transfer lines several hundred metres long to three targets in the North Area Experimental Hall. The experiments require intensity fluctuations over the roughly 5 s particle spill to be eliminated, and hence the extracted beams must be debunched. In this environment, secondary emission monitors (SEMs) have to replace the conventional beam position monitoring systems that rely on RF structure. Such monitors can be used to infer the intensity difference between two split foils, but do not readily provide position readings. Moreover, when the beam ends up on one of the foils, determining the appropriate corrector magnet settings remains a challenging task, as the beam deviation cannot be inferred directly from the SEMs. In such scenarios, traditional trajectory control algorithms fail.
This paper summarises the application of reinforcement learning (RL) to successfully correct the beam trajectory using SEM readings. The RL policy is learnt offline in simulation and can be successfully transferred to the real environment. Moreover, the RL policy is also tested in scenarios where the beam is lost in the line and threading actions are needed. Results of applying the RL policies in the real transfer line, together with the different tests carried out in simulation, are presented.
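A minimal gymnasium-style sketch of such a control problem is given below. The linear response model, the SEM-like observation, and the dimensions are assumptions chosen for illustration, not the CERN simulation.

```python
# Illustrative environment: observations are SEM-like asymmetry signals,
# actions are corrector kicks, reward penalises the residual beam offset.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class SemTrajectoryEnv(gym.Env):
    def __init__(self, n_monitors=6, n_correctors=6):
        self.n_monitors, self.n_correctors = n_monitors, n_correctors
        self.observation_space = spaces.Box(-1.0, 1.0, (n_monitors,), np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, (n_correctors,), np.float32)
        # Placeholder linear corrector-to-monitor response matrix.
        self.response = np.random.default_rng(0).normal(size=(n_monitors, n_correctors))

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.offsets = self.np_random.uniform(-0.5, 0.5, self.n_monitors)
        return self._observe(), {}

    def step(self, action):
        self.offsets = self.offsets + self.response @ np.asarray(action)
        reward = -float(np.abs(self.offsets).sum())   # centred beam => reward near 0
        return self._observe(), reward, False, False, {}

    def _observe(self):
        # SEMs report a foil-asymmetry-like signal, not the offset directly.
        return np.tanh(self.offsets).astype(np.float32)
```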
In BNL’s Booster, the beam bunches can be split into two or three smaller bunches to reduce their space-charge forces. They are then merged back after acceleration in the Alternating Gradient Synchrotron (AGS). This acceleration with decreased space-charge forces can reduce the final emittance, increasing the luminosity in RHIC and improving proton polarization. Parts of this procedure have already been tested and are proposed for the Electron-Ion Collider (EIC). The success of this procedure relies on a series of RF gymnastics to merge individual source pulses into bunches of suitable intensity. In this work, we explore an RF control scheme using reinforcement learning (RL) to merge bunches, aiming to dynamically adjust RF parameters to achieve minimal longitudinal emittance growth and stable bunch profiles. Initial experimental results and ongoing system developments are presented and discussed.
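One possible way to express the stated objectives as an RL reward is sketched below; the weights and the profile metric are illustrative assumptions, not the actual reward used in this work.

```python
# Hedged reward sketch for the bunch-merge task: penalise longitudinal
# emittance growth and deviation of the merged profile from a target shape.
import numpy as np

def merge_reward(emittance_before, emittance_after, profile, target_profile,
                 w_emit=1.0, w_shape=0.1):
    emit_growth = max(emittance_after - emittance_before, 0.0)
    shape_error = np.mean((np.asarray(profile) - np.asarray(target_profile)) ** 2)
    return -(w_emit * emit_growth + w_shape * shape_error)
```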
Aging of the stripper foil and unexpected machine shutdowns are the primary causes of reduced injected intensity from CERN's Linac3 into the Low Energy Ion Ring (LEIR). As a result, the set of optimal control parameters that maximizes beam intensity in the ring tends to drift, requiring daily adjustments to the machine control settings. This paper explores the design of a Reinforcement Learning (RL) based auto-pilot that compensates for the drift of parameters and maintains an optimal beam intensity in the LEIR ring. It observes Time of Flight (ToF) measurements of the ion beam in the linac and Schottky signals in the ring to act on the relevant control knobs.
This autonomous agent is pre-trained on a data-driven surrogate model built from historical exploration of the high-dimensional parameter space. The performance of several RL algorithms will be compared across different surrogate model designs to inform the future design of the operational autopilot. This work holds promise for pre-training offline RL agents through data-driven surrogate models on tasks that are too complex or computationally expensive to simulate.
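The pre-training idea could be sketched as follows: a forward model learnt from historical data is wrapped as an environment in which the agent trains before ever touching the real machine. All interfaces here (`predict`, `diagnostics`) are hypothetical placeholders, not the actual LEIR surrogate.

```python
# Conceptual sketch: a learned surrogate acts as the RL environment.
import numpy as np

class SurrogateEnv:
    """Wraps a learned forward model intensity = f(settings, drift)."""
    def __init__(self, surrogate_model, n_knobs=8):
        self.model, self.n_knobs = surrogate_model, n_knobs

    def reset(self):
        self.settings = np.zeros(self.n_knobs)
        # Random drift emulates the day-to-day parameter shift described above.
        self.drift = np.random.normal(scale=0.1, size=self.n_knobs)
        return self._observe()

    def step(self, action):
        self.settings = self.settings + np.asarray(action)   # act on control knobs
        intensity = self.model.predict(self.settings, self.drift)   # hypothetical call
        return self._observe(), float(intensity), False, {}

    def _observe(self):
        # Stand-ins for the ToF and Schottky observables (hypothetical call).
        return self.model.diagnostics(self.settings, self.drift)
```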
The complexity of the GSI/FAIR accelerator facility demands a high level of automation in order to maximize time for physics experiments. Accelerator laboratories world-wide are exploring a variety of techniques to achieve this, from classical optimization to reinforcement learning.
Geoff, the Generic Optimization Framework & Frontend, is an open-source framework that harmonizes access to automation techniques and simplifies the transition towards them. It is maintained as part of the EURO-LABS project in cooperation between CERN and GSI.
We report on results that have been achieved with Geoff at GSI in 2024. The multi-turn injection of the SIS18 synchrotron has been optimized via multi-objective Bayesian optimization for the first time and a Pareto front has been built from real data. The existing optimization has also been analyzed in more detail. In addition, the use of a data-driven Gaussian Process Model Predictive Control (GP-MPC) framework has been studied in simulation.
We have also successfully used Geoff for beam centering and focusing at the GSI Fragment Separator (FRS). This task involved communication with multiple control systems and between the different networks of the accelerator and the experiment complex. Algorithms as varied as track classification, distribution fitting and black-box optimization were used in tandem, demonstrating the flexibility of Geoff in the face of non-trivial user requirements.
In addition, Geoff has undergone a major update in 2024 that brings it in line with the latest developments of numerical and machine-learning software in the Python ecosystem.
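As a rough illustration of the "harmonized access" idea, an optimization task can be wrapped behind a small problem interface so that different optimizers (numerical, Bayesian, RL-based) can be swapped in. This is a generic sketch with toy names, not Geoff's actual API.

```python
# Toy problem wrapper plus a black-box optimizer call (not Geoff's API).
import numpy as np
from scipy.optimize import minimize

class InjectionProblem:
    """Toy stand-in for a machine optimization task."""
    bounds = [(-1.0, 1.0)] * 4                    # normalised corrector strengths
    _optimum = np.array([0.3, -0.2, 0.1, 0.0])    # hidden toy optimum

    def apply_settings(self, x):
        self._x = np.asarray(x)                   # would talk to the control system

    def measure_injected_intensity(self):
        return float(np.exp(-np.sum((self._x - self._optimum) ** 2)))

    def objective(self, x):
        self.apply_settings(x)
        return -self.measure_injected_intensity()

problem = InjectionProblem()
result = minimize(problem.objective, x0=np.zeros(4),
                  bounds=problem.bounds, method="Nelder-Mead")
```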
Manual alignment of optical systems can be time-consuming, and the achieved performance of the system varies depending on the operator doing the alignment. A reinforcement learning approach using the PPO algorithm was used to train agents to align simple two-mirror optical setups, as well as a full regenerative laser amplifier. The goal is to produce agents that can reproducibly align the setup faster than a human and can correct long-term drifts in laser energy (time scale of approx. one hour) during operation. The work is still ongoing. Agents have been successfully implemented on hardware in the two-mirror setup, showing “super-human” performance in alignment time. The agents successfully “learn” to handle a significant amount of mechanical backlash in the stepper motors and mirror mounts used. Currently, the necessary hardware is being installed on a regenerative amplifier, and agents are being further developed for this use case.
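A minimal sketch of such a PPO training setup with stable-baselines3 is shown below. It assumes a gymnasium-compatible environment `TwoMirrorAlignmentEnv` (a hypothetical name, not defined here) whose actions are motor steps and whose observation and reward reflect the beam position or energy signal.

```python
# PPO training and closed-loop use, assuming a hypothetical alignment env.
from stable_baselines3 import PPO

env = TwoMirrorAlignmentEnv()          # hypothetical hardware/simulation wrapper
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200_000)   # training budget depends on the setup

obs, _ = env.reset()
for _ in range(100):                   # closed-loop alignment with the trained agent
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        break
```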
Recent advances in fine-tuning large language models (LLMs) with reinforcement learning (RL) techniques have demonstrated their ability to generalize, unlike the often-used Supervised Fine-Tuning (SFT).
Many aspects of particle accelerators, such as beam parameters, have well-defined objectives, making them ideal candidates for RL-driven optimization.
In this work, we explore the capabilities of current open-source LLMs fine-tuned to understand the peculiarities of the ALS architecture.
We identify several optimization objectives that can be beneficial and create a small dataset to benchmark our hypothesis.
Our goal is to demonstrate how RL can enhance an LLM’s ability to interpret accelerator states, optimize performance, and provide intelligent insights into beam dynamics.
Deep reinforcement learning (DRL) has demonstrated great potential for controlling and regulating complex real-world systems such as tokamak fusion reactors and particle accelerators. Another promising application is the DRL-based control of liquid-propellant rocket engines (LPREs), which have been a focus of research at the German Aerospace Center (DLR) for the past six years. LPREs are safety-critical systems, where reliability and robustness are of utmost importance. A key difficulty in these systems is the discrepancy between simulation models and real-world behavior, combined with limited availability of real data. An ideal DRL-based controller should be capable of adapting to such errors to ensure robustness and reliability.
To address these challenges, we present a benchmark for LPRE control designed to evaluate DRL-based control strategies. This benchmark includes simulation software calibrated with experimental data, a dataset for fine-tuning, and the ability to simulate representative errors in both sensors and the system itself. The findings from this benchmark are expected to be transferable to particle accelerators and similar systems.
The benchmark will be made freely accessible to the RL community, fostering further research in robust control applications. Our poster outlines the essential steps for deploying DRL controllers on real rocket engines and provides an overview of the benchmark’s components and capabilities.
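The kind of error injection such a benchmark could expose might be sketched as follows; the interface and parameters are illustrative assumptions, not the DLR benchmark's API. The nominal simulation is assumed to provide a `step()` method.

```python
# Sketch of sensor and actuator fault injection on top of a nominal simulator.
import numpy as np

class FaultyEngineEnv:
    def __init__(self, nominal_sim, sensor_bias=0.0, sensor_noise=0.01,
                 actuator_gain=1.0):
        self.sim = nominal_sim                       # assumed to expose .step(action)
        self.sensor_bias, self.sensor_noise = sensor_bias, sensor_noise
        self.actuator_gain = actuator_gain

    def step(self, action):
        # Emulate model/plant mismatch by degrading the commanded action.
        state = self.sim.step(self.actuator_gain * np.asarray(action))
        # Corrupt sensor readings before the controller ever sees them.
        obs = state + self.sensor_bias + np.random.normal(
            scale=self.sensor_noise, size=np.shape(state))
        return obs
```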
Reinforcement learning (RL) is becoming increasingly important within machine learning (ML). One subfield of RL is multi-agent RL (MARL), in which several agents, rather than a single one, learn to solve a problem simultaneously. This makes the approach suitable for many real-world problems.
Since learning in a multi-agent scenario is highly complex, further difficulties arise in addition to those of single-agent RL, for example scalability problems, non-stationarity, or ambiguous learning goals.
To explore the difficulties of MARL, we have implemented an environment for a wireless network communication problem. It addresses the assignment of frequency resources to guarantee reliable communication: different devices in a given area should learn on their own which frequency they can use without disturbing the communication of other devices. As overlapping frequencies can lead to communication failures, it is important that the devices select distinct frequencies.
To accomplish this task, all devices that need to communicate become agents. The given area in which their communication takes place, the so-called communication cell, is the environment of the MARL problem. At each time step, every device chooses a communication channel (a frequency band) that it wants to use for its communication.
To ensure that the problem can be solved reliably, each agent receives the following information in its state: the communication channel used in the previous step, its own Quality of Service (QoS) achieved by the last action, a vector of all neighbouring devices, and the communication channels those neighbours used in their last action.
After all agents have selected a communication channel, they receive their next state and a reward. We choose the reward to be the sum of the achieved QoS of all agents, since a shared reward avoids adversarial behavior and leads to cooperation between the agents.
We train agents on this MARL task using a Q-learning algorithm, as well as with a NashQ algorithm adapted from game theory for multi-agent learning. The results show that the agents learn to communicate reliably. However, the number of agents influences the training, since the number of possible state combinations increases exponentially with the number of agents. Comparing the two algorithms, NashQ needs slightly fewer episodes than Q-learning to converge to an optimal policy.
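A compact sketch of independent per-agent Q-learning with the shared reward described above is shown below. The environment interface (`n_states`, `horizon`, `reset`, `step`) is hypothetical, and the NashQ variant is not sketched.

```python
# Independent tabular Q-learning with a shared reward (sum of all agents' QoS).
import numpy as np

def train(env, n_agents, n_channels, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    # One Q-table per agent, indexed by (discretised state, channel choice).
    Q = [np.zeros((env.n_states, n_channels)) for _ in range(n_agents)]
    for _ in range(episodes):
        states = env.reset()
        for _ in range(env.horizon):
            # Epsilon-greedy channel selection for each agent.
            actions = [np.random.randint(n_channels) if np.random.rand() < eps
                       else int(np.argmax(Q[i][states[i]])) for i in range(n_agents)]
            next_states, qos = env.step(actions)   # per-agent QoS values
            reward = float(sum(qos))               # shared reward encourages cooperation
            for i in range(n_agents):
                target = reward + gamma * Q[i][next_states[i]].max()
                Q[i][states[i], actions[i]] += alpha * (target - Q[i][states[i], actions[i]])
            states = next_states
    return Q
```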
Noisy intermediate-scale quantum (NISQ) computers promise a new paradigm for what is possible in information processing, with the ability to tackle complex and otherwise intractable computational challenges by harnessing the massive intrinsic parallelism of qubits. Central to realising the potential of quantum computing are perfect entangling (PE) two-qubit gates, which serve as a critical building block for universal quantum computation. Quantum optimal control, which involves shaping the electromagnetic pulses that drive quantum gates, aims at optimising the use of NISQ computers. In this work, Reinforcement Learning (RL) was leveraged to optimise pulses that generate PE gates. We train a range of RL agents within robust simulation environments to identify pulse shapes that push gate operation towards its theoretical limits. In particular, a trained RL agent can generate a 12 ns pulse, with a 50 ps sampling time, that yields near-maximal entanglement for the two-qubit gate, while maintaining a unitarity error on the order of 10⁻⁴. Selected agents are then validated on higher-fidelity simulations, demonstrating how RL-based methods can reduce overhead in hardware calibration and accelerate experimentation. Moreover, this approach is hardware agnostic, enabling broad applicability across different quantum platforms. Ultimately, this work shows how RL can be used to expedite optimisation and refine control processes in the presence of vast and intricate parameter spaces.
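Schematically, one episode of such a pulse-shaping loop could look as follows. The agent and simulator interfaces are placeholders for illustration, not the framework actually used in this work; only the pulse dimensions (12 ns at 50 ps sampling, i.e. 240 samples) come from the abstract.

```python
# Schematic episode: propose a piecewise-constant pulse, score the resulting
# two-qubit gate by entangling power and unitarity error, update the policy.
N_SAMPLES = 240   # 12 ns pulse with 50 ps sampling time

def episode(agent, simulator):
    pulse = agent.propose_pulse(N_SAMPLES)          # hypothetical: shape (N_SAMPLES,)
    gate = simulator.evolve(pulse)                   # hypothetical: resulting 2-qubit gate
    reward = (simulator.entangling_power(gate)       # reward near-maximal entanglement
              - simulator.unitarity_error(gate))     # penalise non-unitary evolution
    agent.update(pulse, reward)                      # e.g. a policy-gradient step
    return reward
```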
Beams at LCLS require precise shaping in position-momentum phase space to meet the needs of different users. In particular, the shape of the longitudinal phase space needs to be customized, while ensuring the transverse phase space meets the requirements for Free Electron Laser (FEL) lasing. We present results of using RL for longitudinal phase space shaping, and compare these with approaches using Bayesian optimization. We also present results on combining longitudinal phase space shaping with transverse phase space control.