Feb 5 – 7, 2024
Universität Salzburg (Paris-Lodron-Universität)
Europe/Berlin timezone

Comparing Q-Learning of Single-Agent and Multi-Agent Reinforcement Learning

Feb 6, 2024, 10:10 AM
20m
Blue lecture hall (Universität Salzburg (Paris-Lodron-Universität))

Hellbrunnerstrasse 34, 5020 Salzburg
Student Talk
Student Session

Speaker

Sabrina Pochaba

Description

Reinforcement Learning (RL) is a growing subfield of Machine Learning (ML). In particular, Multi-Agent RL
(MARL), in which several agents learn to solve a task by interacting with a shared environment, can model
many real-world problems. Unfortunately, the Multi-Agent case adds further difficulties to the already
challenging field of Reinforcement Learning, such as scalability issues, non-stationarity, and non-unique learning goals.
To better understand these problems, we compare Single-Agent RL with MARL on the simple board
game Tic-Tac-Toe. Tic-Tac-Toe is a two-player zero-sum game: two adversarial players compete
against each other by placing their marks (x or o) on a 3x3 board. If one player gets three
of their marks in one line (vertical, horizontal, or diagonal), that player wins and the game ends. If neither
player gets three marks in one line before all fields of the 3x3 board are filled, the game ends in a
draw.
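As an illustration of this environment, a minimal sketch of a 3x3 board with its win and draw conditions (the encoding and function names below are our own illustrative choices, not taken from the talk):

```python
import numpy as np

# Illustrative board encoding: 0 = empty field, 1 = "x", -1 = "o".
EMPTY, X, O = 0, 1, -1

def winner(board: np.ndarray) -> int:
    """Return 1 if x has three marks in a line, -1 if o does, 0 otherwise."""
    lines = list(board) + list(board.T) + [board.diagonal(), np.fliplr(board).diagonal()]
    for line in lines:
        if line.sum() == 3:
            return X
        if line.sum() == -3:
            return O
    return 0

def is_draw(board: np.ndarray) -> bool:
    """The game is a draw if every field is filled and nobody has won."""
    return winner(board) == 0 and not (board == EMPTY).any()
```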
We study the learning of a Single- and a Multi-Agent system playing Tic-Tac-Toe, using a Q-Learning
algorithm, which describes the agent's learning in a single update formula. As is typical in RL, the agent interacts with
an environment during learning; here, the environment is the 3x3 board of the Tic-Tac-Toe game. The two
playing agents set their marks alternately. At the end of each game, every agent receives a reward based on
the game's outcome. During learning, each agent tries to maximize its reward, which leads to a well-playing
strategy, namely the policy.
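For reference, the update formula in question is the standard tabular Q-learning rule, Q(s,a) ← Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]. A minimal sketch of this update in code (the learning rate, discount factor, and state encoding are illustrative assumptions, not the speaker's settings):

```python
from collections import defaultdict

# Illustrative hyperparameters; not the values used in the talk.
ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)  # maps (state, action) pairs to estimated values

def q_update(state, action, reward, next_state, next_actions):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```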
We show that a Single-Agent RL agent only performs as well as the opponent against which it is trained,
while the agents in the MARL scenario learn an optimal strategy against every possible opponent. Additionally,
the agents in the MARL setting learn more quickly than those in the Single-Agent case.
We will use these results to set up a MARL setting in network communications. In this scenario, each
communicating electronic device is a separate agent that should communicate reliably, using as
many resources as possible for quick communication without disturbing the communication of the other
devices.

Possible contributed talk: Yes
Are you a student? Yes
