After the success of the first collaboration workshop, RL4AA'23, last year and the founding of the Reinforcement Learning for Autonomous Accelerators community, the second collaboration workshop will take place in Salzburg in February 2024. Please visit the workshop's website for more information: https://rl4aa.github.io/RL4AA24/
This talk covers the challenges and best practices for designing and running real-world reinforcement learning (RL) experiments.
The idea is to walk through the different steps of RL experimentation (task design, choosing the right algorithm, implementing safety layers) and also provide practical advice on how to run experiments and troubleshoot common problems.
Slides are also online: https://araffin.github.io/slides/design-real-rl-experiments/
Reinforcement Learning (RL) has demonstrated its effectiveness in solving control problems in particle accelerators. A challenging application is the control of the microbunching instability (MBI) in synchrotron light sources. Here the interaction of an electron bunch with its emitted coherent synchrotron radiation leads to complex non-linear dynamics and pronounced fluctuations.
Addressing the control of intricate dynamics necessitates meeting stringent microsecond-level real-time constraints. To achieve this, RL algorithms must be deployed on a high-performance electronics platform. The KINGFISHER system, utilizing the AMD-Xilinx Versal family of heterogeneous computing devices, has been specifically designed at KIT to tackle these demanding conditions. The system implements an experience accumulator architecture to perform online learning purely through interaction with the accelerator while still satisfying strong real-time constraints.
The preliminary results of this innovative control paradigm at the Karlsruhe Research Accelerator (KARA) will be presented. Notably, this represents the first experimental attempt to control the MBI with RL using online training only and running on hardware.
The complexity of the GSI/FAIR accelerator facility demands a high level of automation in order to maximize time for physics experiments. This talk will give an overview of different optimization problems at GSI, from transfer lines to synchrotrons to the fragment separator. Starting with a summary of previous successful automation, the talk will focus on the latest developments in recent months, such as the optimization of multi-turn injection in the SIS18 synchrotron. The introduction of a Python bridge to the settings management system LSA and the integration of GeOFF (Generic Optimization Framework & Frontend) enabled and facilitated beam-based optimization with numerical algorithms and machine learning. GeOFF is an open-source framework that harmonizes access to a number of automation techniques and simplifies the transition towards and between them.
DESY has many years of experience in the optimization and control of particle accelerators. Reinforcement learning has been explored over the last three years. In this talk, the results of this investigation are summarized and an outlook is given. Further control and optimization challenges for operation are also presented and discussed.
In order to improve BESSY's experimental environment, several ML-based applications are used at HZB. These efforts cover challenges arising at the accelerator, the beamlines and the detectors at the experiment. This talk provides an overview of these activities, focusing on RL and providing insights into the optimization of a beamline, the tuning of an e-gun, and electron beam positioning in BESSY's storage ring. The limitations of RL and the reasons to also use other ML techniques are discussed and illustrated with various examples.
CERN has a long tradition of model-based feedforward control with a high level of abstraction. With the recently approved project “Efficient Particle Accelerators”, the CERN management commits to going one step further and investing heavily in automation on all fronts. The initiative will therefore also further push data-driven surrogate models, sample-efficient optimisation and continuous control algorithms into the current control system. Reinforcement Learning became part of the CERN algorithm suite before many numerical optimisation algorithms. However, the decades-old CERN machines do not easily lend themselves to RL; black-box optimisation algorithms are more easily integrated. This contribution will summarise the RL controllers in the making at CERN and will mainly focus on CERN’s RL vision: offline RL, the importance of being able to deal with partially observable systems, and the necessity for continuously learning controllers.
The Quanser Aero2 system is an advanced laboratory experiment designed for exploring aerospace control systems, featuring two motor-driven fans on a pivot beam for precise control. Its capability to lock axes individually offers both single degree of freedom (DOF) and two DOF operation. The system’s non-linear characteristics and adaptability to multivariable configurations make it especially interesting for control theory research.
In this study, we use Reinforcement Learning (RL) to control the Aero2 system. To keep complexity low in a first step, this work focuses on the single-DOF setup. An RL agent is trained in simulation to develop a policy for orienting the beam to a specific tilt, using the target tilt deviation and velocity as the state space. To further reduce complexity, the second motor is driven with the reversed-polarity voltage of the first motor, resulting in a single action and enabling an in-depth analysis of the learning behaviour of the employed agents.
Even after reducing the action space to one dimension by exploiting the symmetry of the two rotors, the balancing task could not be solved with the default configuration of the Proximal Policy Optimization (PPO) agent used. We identified that reducing the number of units in each hidden fully connected layer of the agent's networks is necessary to solve the task. However, detailed visualisations of the development of the policy over time revealed a transition from stable to volatile action choices in the long term, which is unexpected given the current state of the literature. Future research will focus on the underlying causes of the observed volatility, giving insights into the dynamic nature of RL.
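A minimal sketch of such a setup is shown below, assuming a toy single-DOF environment and the Stable-Baselines3 PPO implementation; the environment name `Aero1DofEnv`, the dynamics constants, the reward shaping, and the reduced `net_arch=[16, 16]` layer sizes are illustrative choices, not the configuration used in the study.

```python
# Hypothetical sketch: 1-DOF Aero2-style balancing environment trained with PPO.
# Dynamics constants and reward shaping are placeholders, not the authors' model.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class Aero1DofEnv(gym.Env):
    """Observation: (tilt error, tilt velocity); action: one voltage command."""
    def __init__(self, target=0.3, dt=0.02):
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.target, self.dt = target, dt

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.theta, self.theta_dot, self.t = 0.0, 0.0, 0
        return self._obs(), {}

    def _obs(self):
        return np.array([self.target - self.theta, self.theta_dot], dtype=np.float32)

    def step(self, action):
        u = float(np.clip(action[0], -1.0, 1.0))
        # toy linear pitch dynamics; the second rotor is assumed to receive -u (reversed polarity)
        theta_ddot = 2.0 * u - 0.5 * self.theta_dot - 1.0 * self.theta
        self.theta_dot += theta_ddot * self.dt
        self.theta += self.theta_dot * self.dt
        self.t += 1
        reward = -(self.target - self.theta) ** 2 - 0.01 * self.theta_dot ** 2
        return self._obs(), reward, False, self.t >= 500, {}

# Smaller hidden layers than the SB3 default of (64, 64), as suggested by the study.
model = PPO("MlpPolicy", Aero1DofEnv(), policy_kwargs=dict(net_arch=[16, 16]), verbose=0)
model.learn(total_timesteps=50_000)
```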
The success and fast pace of Machine Learning (ML) in the past decade was also enabled by modern gradient descent optimizers embedded into ML frameworks such as TensorFlow. In the context of a doctoral research project, we investigate how these optimizers can be utilized directly, outside of the scope of neural networks. This approach holds the potential of optimizing explainable models with only a few model parameters, allowing properties such as velocity, acceleration or jerk to be derived for direct physical explanation and interpretation. This is highly beneficial for use in the field of mechatronics. However, while modern gradient descent optimizers shipped with ML frameworks perform well in neural nets, results show that most optimizers have limited capabilities when applied directly to piecewise polynomial (PP) models. Domain-specific model requirements like C^k-continuity, acceleration or jerk limitation, as well as spectral or energy optimization pose the need for developing appropriate loss functions, novel algorithms and regularization techniques in order to improve optimizer performance.
In this context, we investigate piecewise polynomial models as they occur (and are required) in 1D trajectory planning tasks in mechatronics. Utilizing TensorFlow optimizers, we optimize our PP model towards multi-target loss functions suitable for fitting C^k-continuous PP functions which can be deployed in an electronic cam approximation setting. We enhance the capabilities of our PP base model by utilizing an orthogonal Chebyshev basis along with a novel regularization method improving convergence of the approximation and continuity optimization targets. We see a possible application of this approach in Deep Reinforcement Learning applied to control theory: by exchanging the black box that is a neural network with an explainable PP model, we foster the utility of Reinforcement Learning in designing cyber-physical control systems.
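To illustrate the general idea (not the project's actual implementation), the following sketch fits two Chebyshev-basis polynomial segments with a TensorFlow optimizer, using a multi-target loss that combines a data term with C^0/C^1 continuity penalties at the joint; the degree, penalty weights, and data are made-up assumptions.

```python
# Minimal sketch: gradient-descent fitting of a two-segment Chebyshev PP model with
# continuity penalties. Segment widths, degree, and weights are illustrative only.
import numpy as np
import tensorflow as tf
from numpy.polynomial import chebyshev as C

DEG = 5
# Noisy target curve split into two segments over [-1, 0] and [0, 1].
x = np.linspace(-1.0, 1.0, 400)
y = np.sin(3.0 * x) + 0.02 * np.random.randn(x.size)
left, right = x < 0.0, x >= 0.0
# Map each segment to its local coordinate t in [-1, 1] and precompute the Chebyshev basis.
t_left, t_right = 2.0 * x[left] + 1.0, 2.0 * x[right] - 1.0
B_left = tf.constant(C.chebvander(t_left, DEG), tf.float64)
B_right = tf.constant(C.chebvander(t_right, DEG), tf.float64)

def basis_and_deriv(t0):
    """Chebyshev basis values and first derivatives at a single local coordinate t0."""
    vals = C.chebvander(np.array([t0]), DEG)[0]
    ders = np.array([C.chebval(t0, C.chebder(np.eye(DEG + 1)[k])) for k in range(DEG + 1)])
    return tf.constant(vals, tf.float64), tf.constant(ders, tf.float64)

b1, db1 = basis_and_deriv(1.0)    # right end of the left segment
b2, db2 = basis_and_deriv(-1.0)   # left end of the right segment

c_left = tf.Variable(tf.zeros(DEG + 1, tf.float64))
c_right = tf.Variable(tf.zeros(DEG + 1, tf.float64))
y_left, y_right = tf.constant(y[left]), tf.constant(y[right])
opt = tf.keras.optimizers.Adam(learning_rate=0.05)

for step in range(2000):
    with tf.GradientTape() as tape:
        data_loss = (tf.reduce_mean((tf.linalg.matvec(B_left, c_left) - y_left) ** 2)
                     + tf.reduce_mean((tf.linalg.matvec(B_right, c_right) - y_right) ** 2))
        c0_pen = (tf.tensordot(b1, c_left, 1) - tf.tensordot(b2, c_right, 1)) ** 2    # value match
        c1_pen = (tf.tensordot(db1, c_left, 1) - tf.tensordot(db2, c_right, 1)) ** 2  # slope match
        loss = data_loss + 10.0 * c0_pen + 10.0 * c1_pen
    grads = tape.gradient(loss, [c_left, c_right])
    opt.apply_gradients(zip(grads, [c_left, c_right]))
```

Because both segments here have equal width, matching the local-coordinate derivatives at the joint is equivalent to matching the global ones; unequal segments would need the chain-rule scale factors.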
Safety guarantees for Gaussian processes require the assumption that the true hyperparameters are known. However, this assumption usually does not hold in practice. In this talk, a method is introduced to overcome this issue by estimating confidence intervals of the hyperparameters from their posterior distributions. Finally, it can be shown that, via appropriate scaling, safety can be robustly guaranteed with high probability.
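As a rough illustration of the idea (not the method presented in the talk), the sketch below approximates the posterior of an RBF-kernel lengthscale on a grid via the log marginal likelihood, selects a conservative lengthscale from a credible interval, and widens the predictive bound by a factor beta; the grid, the flat prior, and the value of beta are assumptions.

```python
# Hedged sketch: robustify a GP confidence bound against lengthscale misspecification.
import numpy as np

def rbf(x1, x2, ell, sigma_f=1.0):
    d = x1[:, None] - x2[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / ell) ** 2)

def log_marginal_likelihood(x, y, ell, noise=0.05):
    K = rbf(x, x, ell) + noise**2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, 25)
y_train = np.sin(x_train) + 0.05 * rng.standard_normal(25)

# Posterior over the lengthscale on a grid (flat prior), then a conservative quantile.
grid = np.linspace(0.1, 3.0, 100)
logp = np.array([log_marginal_likelihood(x_train, y_train, ell) for ell in grid])
p = np.exp(logp - logp.max()); p /= p.sum()
ell_lo = grid[np.searchsorted(np.cumsum(p), 0.025)]   # short lengthscale = cautious model

def predict(x_star, ell, noise=0.05):
    K = rbf(x_train, x_train, ell) + noise**2 * np.eye(len(x_train))
    K_s = rbf(x_train, x_star, ell)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(rbf(x_star, x_star, ell)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

beta = 3.0  # wider than the nominal 2-sigma factor to absorb remaining hyperparameter uncertainty
x_star = np.linspace(-3, 3, 200)
mean, std = predict(x_star, ell_lo)
upper_safe_bound = mean + beta * std
```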
Reinforcement Learning (RL) is a rising subject of Machine Learning (ML). Especially Multi-Agent RL (MARL), where more than one agent interacts with an environment by learning to solve a task, can model many real-world problems. Unfortunately, the multi-agent case introduces additional difficulties to the already challenging field of Reinforcement Learning, such as scalability issues, non-stationarity and non-unique learning goals.
To better understand these problems, we compare Single-Agent RL with MARL in the simple board game Tic-Tac-Toe. This game is a two-player zero-sum game in which two adversarial players compete against each other by placing their marks (x or o) on a 3x3 board. If one player has three of their marks in one line (vertical, horizontal or diagonal), this player wins and ends the game. If neither of the players gets three marks in a line before all fields of the 3x3 board are filled, the game ends in a draw.
We study the learning of a Single- and a Multi-Agent system playing Tic-Tac-Toe, using a Q-learning algorithm that describes the learning of the agent in one formula. As is typical in RL, the agent interacts with an environment during learning, which is, in this case, the 3x3 board of the Tic-Tac-Toe game. The two playing agents place their marks one after another. At the end of each game, every agent receives a reward based on the game's outcome. During learning, the agent tries to maximize its reward, which leads to a well-playing strategy, namely the policy.
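A minimal sketch of the tabular Q-learning update in this setting is shown below; the board encoding, reward values, and hyperparameters are illustrative, not the study's settings (in the multi-agent case, each player would hold its own Q-table and be updated from its own rewards).

```python
# Minimal tabular Q-learning sketch for Tic-Tac-Toe; values are placeholders.
import random
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)] -> value; state = board as a tuple of 9 cells
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def choose_action(state, legal_actions):
    """Epsilon-greedy action selection over the empty cells."""
    if random.random() < EPS:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_legal_actions, done):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = reward
    if not done:
        target += GAMMA * max(Q[(next_state, a)] for a in next_legal_actions)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```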
We show that a Single-Agent RL agent only performs as well as the opponent against whom it is trained, while the agents in the MARL scenario learn an optimal strategy against every possible opponent. Additionally, the agents in the MARL setting learn more quickly than the ones in the Single-Agent case.
We will use these results to set up a MARL setting in network communications. In this scenario, all communicating electronic devices are different agents that should communicate in a reliable way, using as many resources as possible for quick communication without disturbing the communication of the other devices.
In the tutorial, we will look at meta reinforcement learning and model-based RL techniques for the AWAKE trajectory tuning task.
https://github.com/RL4AA/rl4aa24-tutorial
Synchrotron light source storage rings aim to maintain a continuous beam current without observable beam motion during injection. One element that paves the way to this target is the non-linear kicker (NLK). The field distribution it generates poses challenges for optimising the topping-up operation.
Within this study, a reinforcement learning agent was developed and trained to optimise the NLK operation parameters. We present the models employed, the optimisation process, and the achieved results.
The Sonobot Unmanned Surface Vehicle (USV), developed by EvoLogics, is a system platform tailored for hydrographic surveying in inland waters. Despite its integrated GPS and autopilot system for autonomous mission execution, the Sonobot lacks a collision avoidance system, necessitating constant operator monitoring and significantly limiting its autonomy.
Recognizing the untapped potential of USVs for integrating advancements in autonomous vehicles, machine learning, and control theory, we propose a two-layered system: a perception layer for object detection and an algorithmic layer for collision-free path selection. The novelty of our perception layer lies in the integration of a stereo camera, LiDAR, and front-looking sonar for robust obstacle detection.
For the algorithmic layer, we engineered a simple yet powerful cost function. Our preliminary results demonstrate the ability to calculate a collision-free trajectory for the Sonobot using this cost function in conjunction with a Model Predictive Controller (MPC).
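The sketch below illustrates the kind of receding-horizon cost this describes, assuming a point-mass model, Gaussian-shaped obstacle penalties, and a generic optimizer; the weights, horizon, and dynamics are placeholders rather than the actual Sonobot controller.

```python
# Hedged MPC-style sketch: penalize goal distance, obstacle proximity, and control effort.
import numpy as np
from scipy.optimize import minimize

DT, HORIZON = 0.5, 10
GOAL = np.array([10.0, 5.0])
OBSTACLES = [np.array([4.0, 2.0]), np.array([7.0, 4.0])]   # detected obstacle positions

def rollout(x0, controls):
    """Simple point-mass model: state (x, y), control (vx, vy) per step."""
    states, x = [], np.array(x0, dtype=float)
    for u in controls.reshape(HORIZON, 2):
        x = x + DT * u
        states.append(x.copy())
    return np.array(states)

def cost(controls, x0):
    states = rollout(x0, controls)
    goal_cost = np.sum(np.linalg.norm(states - GOAL, axis=1))            # drive toward the goal
    obstacle_cost = sum(np.sum(np.exp(-np.linalg.norm(states - o, axis=1) ** 2))
                        for o in OBSTACLES)                              # penalize proximity
    effort_cost = np.sum(controls ** 2)                                  # keep commands small
    return goal_cost + 50.0 * obstacle_cost + 0.1 * effort_cost

x0 = np.zeros(2)
res = minimize(cost, np.zeros(HORIZON * 2), args=(x0,), method="L-BFGS-B",
               bounds=[(-1.0, 1.0)] * (HORIZON * 2))
first_action = res.x[:2]   # receding horizon: apply only the first control, then re-plan
```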
We invite discussion on the potential of testing the MPC against Reinforcement Learning and the possibility of combining MPC and RL to further enhance the autonomy and efficiency of USVs.
As a critical radiological facility, the International Fusion Materials Irradiation Facility - DEMO Oriented Neutron Source (IFMIF-DONES) will implement effective measures to ensure the safety of its personnel and the environment. To enable the proper implementation of these measures, the ISO 17873 standard has been adopted throughout the design process of the facility. The proposed dynamic confinement measures outlined in this standard require a thorough design of the nuclear Heating, Ventilation and Air Conditioning (HVAC) system to ensure effective containment barriers, stable pressure levels and proper treatment of effluents. However, the design and control of such a critical system presents several challenges, as numerous factors influence pressure stability within the facility.
Despite these challenges, recent advances in Deep Reinforcement Learning (DRL) algorithms have demonstrated their effectiveness in solving complex continuous control problems in a variety of domains. In this work, we evaluate the performance of DRL algorithms in controlling the nuclear HVAC system of IFMIF-DONES. For this purpose, we use a MELCOR simulation model of the particle accelerator facility as a training environment and adapt the functionalities of this simulator to enable the continuous control of the air inlet flow rates.
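A hedged sketch of how such a simulator could be exposed as a continuous-control environment is given below; the simulator interface (`reset`, `advance`), the `DummySimulator` stand-in, and the quadratic pressure-deviation reward are hypothetical placeholders, not the actual MELCOR coupling.

```python
# Illustrative only: wrapping an external plant simulator as a Gymnasium environment
# for continuous air-flow control. All plant details here are made up.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class HvacEnv(gym.Env):
    """Observation: room pressures; action: normalized air inlet flow rates."""
    def __init__(self, simulator, setpoints, n_rooms=4, dt=1.0):
        self.sim, self.setpoints, self.dt = simulator, np.asarray(setpoints), dt
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(n_rooms,), dtype=np.float32)
        self.action_space = spaces.Box(0.0, 1.0, shape=(n_rooms,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return np.asarray(self.sim.reset(), dtype=np.float32), {}

    def step(self, action):
        pressures = self.sim.advance(flow_rates=action, dt=self.dt)   # hypothetical simulator call
        reward = -float(np.mean((np.asarray(pressures) - self.setpoints) ** 2))
        return np.asarray(pressures, dtype=np.float32), reward, False, False, {}

class DummySimulator:
    """Trivial stand-in plant: pressures relax toward a level set by the inlet flows."""
    def __init__(self, n_rooms=4):
        self.p = np.zeros(n_rooms)
    def reset(self):
        self.p = np.zeros_like(self.p)
        return self.p
    def advance(self, flow_rates, dt):
        self.p += dt * (200.0 * np.asarray(flow_rates) - 0.5 * self.p)
        return self.p

env = HvacEnv(DummySimulator(), setpoints=[100.0, 80.0, 60.0, 40.0])
obs, _ = env.reset()
```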
RadiaSoft is developing machine learning methods to improve the operation and control of industrial accelerators. Because industrial systems typically suffer from a lack of instrumentation and a noisier environment, advancements in control methods are critical for optimizing their performance. In particular, our recent work has focused on the development of pulse-to-pulse feedback algorithms for use in dose optimization for FLASH radiotherapy. The PHASER (pluridirectional high-energy agile scanning electronic radiotherapy) system is of particular interest due to the need to synchronize 16 different accelerators, each with its own noise characteristics. This presentation will provide an overview of the challenges associated with dose optimization for a PHASER-like system, a description of the toy model used to evaluate different control schemes, and our initial results using RL for dose delivery optimization.
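As a toy illustration of pulse-to-pulse feedback (not RadiaSoft's algorithm), the sketch below applies an integral correction to 16 channels with channel-specific noise so that each converges to a common per-pulse dose target; the gain, noise levels, and linear dose model are assumptions.

```python
# Toy pulse-to-pulse integral feedback over 16 accelerator channels; all values illustrative.
import numpy as np

N_CHANNELS, N_PULSES, TARGET_DOSE, GAIN = 16, 200, 1.0, 0.3
rng = np.random.default_rng(1)
noise_sigma = rng.uniform(0.01, 0.05, N_CHANNELS)   # each accelerator has its own noise level
settings = np.full(N_CHANNELS, 0.8)                 # initial per-channel amplitude settings

for pulse in range(N_PULSES):
    # Measured dose per channel: linear in the setting plus channel-specific jitter.
    dose = settings + noise_sigma * rng.standard_normal(N_CHANNELS)
    error = TARGET_DOSE - dose
    settings += GAIN * error                        # correction applied to the next shot
```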
Despite the spread of Reinforcement Learning (RL) applications for optimizing the performance of particle accelerators, this approach is not always the best choice. Indeed, not all problems are suitable to be solved via RL. Before diving into such techniques, a good knowledge of the problem, the available resources, and the existing solutions is recommended. An example of the complexities related to RL solutions is the automatic setup of controlled longitudinal emittance blow-up in the CERN SPS. Several critical issues, such as limited data availability and the growing dimensionality of the problem, limited the development of an operational tool based on RL. Therefore, the released software relies on generic optimizers only, even though promising results with Bayesian optimization were achieved.
Reinforcement Learning (RL) has been successfully applied to a wide range of problems. When the environment to control does not exhibit stringent real-time constraints, currently available techniques and computational infrastructures are sufficient. At particle accelerators, however, one often encounters stringent requirements on the time available for choosing an action, which in some extreme cases can fall in the microsecond range.
These challenging conditions also present some benefits. For instance, the data throughput of the real-world environment can be orders of magnitude greater compared to a simulation. This opens the possibility of online training without the issues linked to transferring a simulation-trained agent to the real world.
In this contribution, real-time constraints and how they affect RL algorithms will be introduced, followed by a description of FPGAs and heterogeneous hardware platforms. This is then used to motivate the architecture of the state-of-the-art KINGFISHER RL system. Finally, an in-depth discussion of the use-cases where this approach can be beneficial will be provided, together with basic guidelines for structuring RL problems in a more hardware-friendly way.
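A conceptual, software-only sketch of an experience-accumulator pattern is shown below: a fast actor loop gathers transitions into a buffer while a slower learner consumes batches and publishes updated parameters. The linear policy, toy update rule, and threading model are illustrative stand-ins for the FPGA/host split, not the KINGFISHER implementation.

```python
# Conceptual sketch of an experience-accumulator pattern (plain Python, not the FPGA design).
import queue
import threading
import numpy as np

experience_queue = queue.Queue(maxsize=10_000)
policy_weights = np.zeros(4)                      # placeholder linear policy parameters
weights_lock = threading.Lock()

def actor_loop(n_steps=5_000):
    """Tight interaction loop: act with the current weights, push transitions, never block on learning."""
    obs = np.random.randn(4)
    for _ in range(n_steps):
        with weights_lock:
            action = float(policy_weights @ obs)          # cheap inference, real-time friendly
        next_obs = np.random.randn(4)                     # stand-in for the accelerator response
        reward = -abs(action - next_obs[0])
        experience_queue.put((obs, action, reward, next_obs))
        obs = next_obs

def learner_loop(batch_size=64, n_updates=50):
    """Slower loop: accumulate a batch of experience, update the policy, publish new weights."""
    global policy_weights
    for _ in range(n_updates):
        batch = [experience_queue.get() for _ in range(batch_size)]
        grad = np.mean([r * o for o, a, r, o2 in batch], axis=0)   # toy update rule
        with weights_lock:
            policy_weights = policy_weights + 0.01 * grad

threading.Thread(target=actor_loop, daemon=True).start()
learner_loop()
```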
Reinforcement learning (RL), a subfield of machine learning, has gained recognition for its astonishing success in complex games; however, it has yet to show similar success in real-world scenarios. In principle, RL's ability to generalise past experience, act in real time, and remain resilient to new states makes it particularly attractive as robust decision-making support for real-world scenarios. However, such scenarios bring unique challenges that are not present in game-like domains, such as complex and contradictory reward functions and the necessity for explainability. In this presentation, we will discuss some of these challenges in the context of using RL for automotive powertrain control. We will discuss the problem setup, including the reward definition, as well as one approach to explainability: first learning a neural-network-based policy (which can learn effectively and efficiently) and then extracting a rule-based policy (which is easier to interpret and can be directly implemented in current control software). The results are benchmarked against an optimised MATLAB policy, using a Simulink simulation.
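A small sketch of the distillation step described here is given below, assuming a stand-in `nn_policy` over a two-dimensional powertrain state; a shallow decision tree is fitted to the network's actions and printed as human-readable rules. The state variables and thresholds are hypothetical.

```python
# Hedged sketch: distill a (stand-in) neural-network policy into an interpretable decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def nn_policy(states):
    """Stand-in for a trained NN policy: maps (speed, torque demand) to a discrete mode choice."""
    return (states[:, 0] * 0.5 + states[:, 1] > 1.0).astype(int)

# Sample the operating region, label it with the NN policy, and fit an interpretable tree.
rng = np.random.default_rng(0)
states = rng.uniform(0.0, 2.0, size=(5_000, 2))
actions = nn_policy(states)
tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
print(export_text(tree, feature_names=["speed", "torque_demand"]))
```

The resulting rules can then be reviewed by engineers and, if acceptable, transcribed into existing control software.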
Free energy-based reinforcement learning (FERL) using clamped quantum Boltzmann machines (QBM) has demonstrated remarkable improvements in learning efficiency, surpassing classical Q-learning algorithms by orders of magnitude. This work extends the FERL approach to multi-dimensional optimisation problems and eliminates the restriction to discrete action-space environments, opening doors for a broader range of real-world applications. We will discuss the results obtained with quantum annealing, employing both a simulator and D-Wave quantum annealing hardware, as well as a comparison to classical RL methods. We will cover how the algorithms are evaluated for control problems at CERN, such as the AWAKE electron beam line, and for classical RL benchmarks of varying degrees of complexity.