Description
The Quanser Aero2 system is an advanced laboratory platform for exploring aerospace control systems, featuring two motor-driven fans mounted on a pivoting beam. Because each axis can be locked individually, the system supports both single degree of freedom (DOF) and two-DOF operation. Its non-linear dynamics and adaptability to multivariable configurations make it especially interesting for control theory research.
In this study, we use Reinforcement Learning (RL) to control the Aero2 system. To keep complexity low in a first step, this work focuses on the single-DOF setup. An RL agent is trained in simulation to develop a policy that orients the beam to a specified tilt angle, using the deviation from the target tilt and the angular velocity as the state space. To reduce complexity further, the second motor is driven with the first motor's voltage at reversed polarity, which collapses the action space to a single action and enables an in-depth analysis of the learning behaviour of the employed agents.
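A minimal sketch of this single-DOF setup, assuming a Gymnasium-style interface; the class name, dynamics, and all constants (gains, damping, time step) below are hypothetical placeholders, not Quanser's model or the authors' implementation:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class Aero2PitchEnv(gym.Env):
    """1-DOF pitch balancing: one action drives both motors,
    the second with reversed polarity (hypothetical sketch)."""

    def __init__(self, target_angle=0.2, dt=0.01):
        self.target = target_angle  # desired tilt [rad]
        self.dt = dt
        # State: deviation from the target tilt and angular velocity.
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32)
        # Single action: voltage for motor 1; motor 2 gets the negated value.
        self.action_space = spaces.Box(
            low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.theta, self.omega = 0.0, 0.0
        return self._obs(), {}

    def step(self, action):
        v = float(action[0])
        torque = 1.0 * v  # both fans contribute; hypothetical gain
        # Toy dynamics: damped, pendulum-like beam (placeholder model).
        alpha = torque - 0.1 * self.omega - 0.5 * np.sin(self.theta)
        self.omega += alpha * self.dt
        self.theta += self.omega * self.dt
        err = self.theta - self.target
        reward = -(err**2 + 0.01 * self.omega**2)  # quadratic cost
        return self._obs(), reward, False, False, {}

    def _obs(self):
        return np.array([self.theta - self.target, self.omega],
                        dtype=np.float32)
```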
Even after reducing the action space to one dimension by exploiting the symmetry of the two rotors, the balancing task could not be solved with the default configuration of the employed Proximal Policy Optimization (PPO) agent. We found that reducing the number of units in each hidden fully connected layer of the agent's networks is necessary to solve the task. However, detailed visualisations of the policy's development over time revealed a long-term transition from stable to volatile action choices, which is unexpected given the current state of the literature. Future research will focus on the underlying causes of the observed volatility, providing insights into the dynamic nature of RL.
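A sketch of the network-size reduction described above, assuming Stable-Baselines3 (the abstract does not name a library); the layer width of 8 is illustrative, not the authors' exact value:

```python
from stable_baselines3 import PPO

env = Aero2PitchEnv()  # hypothetical environment sketched above
model = PPO(
    "MlpPolicy",
    env,
    # Default MLPs use two hidden layers of 64 units each; here both
    # the policy (pi) and value (vf) networks are shrunk, mirroring
    # the reduction the abstract reports as necessary.
    policy_kwargs=dict(net_arch=dict(pi=[8, 8], vf=[8, 8])),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```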
| Possible contributed talk | No  |
|---------------------------|-----|
| Are you a student?        | Yes |