Feb 5 – 7, 2024
Universität Salzburg (Paris-Lodron-Universität)
Europe/Berlin timezone
Registration and call for abstracts extended to 5 January

The Geometry of Reinforcement Learning: Insights from the Dual Linear Program

Not scheduled
10m
Blue lecture hall (Universität Salzburg (Paris-Lodron-Universität))

Hellbrunnerstrasse 34 5020 Salzburg
Student Talk Student Session

Speaker

Nikola Milosevic (Max Planck Institute for Human Cognitive and Brain Sciences)

Description

Reinforcement Learning (RL) has become a cornerstone of machine learning, with remarkable success on real-world control problems and as a source of insight into cognitive processes in the brain. However, modern RL is hard to navigate because of its many moving parts, growing agent complexity, and the use of deep learning in a non-i.i.d. setting. Part of the difficulty of reasoning intuitively about RL stems from its time-dependent and recursive nature. In this presentation, I explore the dual linear program and the intuitions it can offer. What traditionally serves as a theoretical construct for proving theorems turns out to be a valuable tool for building intuition and for approaching higher-level questions. I will focus on two practical demonstrations of this perspective: 1) designing policy optimization algorithms and 2) pretraining RL agents. In the first half, I will review the dual linear program and its geometry, with the aim of uncovering novel policy optimization strategies. In the second half, I will preview how the linear program can be generalized to convex MDPs, which yields pretraining objectives similar to representation learning with the Variational Autoencoder.
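As a rough illustration of the dual linear program mentioned in the abstract, the sketch below uses its standard textbook form, which is not necessarily the formulation used in the talk: the variables are state-action occupancy measures d(s, a), the objective is the linear function sum of d(s, a) r(s, a), and the constraints are d >= 0 together with the Bellman flow equations. The two-state MDP, its numbers, and the helper names `occupancy`, `flow_residual`, and `value` are assumptions made up for this example.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; every number here is made up for illustration.
gamma = 0.9
mu = np.array([0.5, 0.5])                    # initial state distribution
# P[s, a, t] = probability of moving to state t after taking action a in state s
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])                   # r[s, a]


def occupancy(pi):
    """Normalized discounted state-action occupancy d(s, a) of policy pi[s, a]."""
    P_pi = np.einsum("sa,sat->st", pi, P)    # state-to-state kernel under pi
    # The state occupancy rho solves rho = (1 - gamma) * mu + gamma * P_pi^T rho.
    rho = np.linalg.solve(np.eye(len(mu)) - gamma * P_pi.T, (1 - gamma) * mu)
    return rho[:, None] * pi


def flow_residual(d):
    """Violation of the dual LP's equality (Bellman flow) constraints."""
    inflow = (1 - gamma) * mu + gamma * np.einsum("sa,sat->t", d, P)
    return d.sum(axis=1) - inflow


def value(pi):
    """Normalized expected discounted return, computed by policy evaluation."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = (pi * r).sum(axis=1)
    v = np.linalg.solve(np.eye(len(mu)) - gamma * P_pi, r_pi)
    return (1 - gamma) * mu @ v


pi = np.array([[0.3, 0.7],
               [0.6, 0.4]])                  # an arbitrary stochastic policy
d = occupancy(pi)

# d is feasible for the dual LP, and the return is linear in d.
print("flow residual:", flow_residual(d))
print("return from d:", (d * r).sum(), "return from policy evaluation:", value(pi))
```

In this sketch every policy maps to a point of the feasible polytope, and the return becomes a linear function of that point, which is the geometric picture the dual view rests on.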

Possible contributed talk: Yes
Are you a student? Yes

Primary authors

Nikola Milosevic (Max Planck Institute for Human Cognitive and Brain Sciences)
Johannes Müller (RWTH Aachen)

Co-authors

Dr Nico Scherf (MPI CBS)
Semih Cayçı (RWTH Aachen)
