Regensburg 2025 – scientific programme
AKPIK: Arbeitskreis Physik, moderne Informationstechnologie und Künstliche Intelligenz
AKPIK 5: Poster
AKPIK 5.8: Poster
Thursday, March 20, 2025, 15:00–16:30, P2
Deterministic Model of Multi-Agent Boltzmann Q-Learning: Transient Dynamics, Feedback Loops, and Non-Convergence — •David Goll1, Jobst Heitzig2, and Wolfram Barfuss3 — 1Humboldt University of Berlin — 2Potsdam Institute for Climate Impact Research — 3University of Bonn
Multi-Agent Reinforcement Learning involves interacting agents whose learning processes are indirectly coupled through their shared environment, giving rise to emergent, collective dynamics that are sensitive to initial conditions and parameter variations. A Complex Systems approach, which examines dynamic interactions in multi-component systems, can uncover the underlying dynamics by constructing deterministic, approximate models of stochastic algorithms. In this work, we show that even in the simplest case of independent Q-learning with a Boltzmann exploration policy, previous models fail to capture actual learning behaviour. Specifically, the dynamics of the Q-space, which represents agents' state-action value estimates, cannot be directly reduced to the lower-dimensional policy space representing their strategies, as assumed in earlier models. By explicitly incorporating agents' update frequencies, we propose a new discrete-time model that captures the observed behaviours and uncovers a fundamentally more complex dynamical landscape. We demonstrate the utility of this approach by applying it to the Prisoner's Dilemma, where our model distinguishes transient states, which might be mistaken for equilibria, from true equilibria. Furthermore, we show that varying hyperparameters, such as the discount factor, can prevent convergence to a joint policy.
Keywords: Reinforcement Learning; Nonlinear Dynamics; Complex Systems
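
For illustration only, the sketch below shows the standard stochastic learning process the abstract refers to: two independent Q-learners with Boltzmann (softmax) exploration repeatedly playing the Prisoner's Dilemma. It is not the authors' deterministic model; the payoff matrix, learning rate, discount factor, and temperature are arbitrary illustrative choices.

import numpy as np

# Illustrative sketch (not the authors' deterministic model):
# two independent Q-learners with Boltzmann (softmax) exploration
# in the repeated Prisoner's Dilemma. All numerical values are
# assumptions chosen for demonstration.

rng = np.random.default_rng(0)

# Actions: 0 = cooperate, 1 = defect.
# payoff[i][a, b] = reward to agent i when agent 0 plays a and agent 1 plays b.
payoff = [
    np.array([[3.0, 0.0], [5.0, 1.0]]),   # agent 0
    np.array([[3.0, 5.0], [0.0, 1.0]]),   # agent 1
]

alpha, gamma, temperature = 0.1, 0.9, 1.0   # illustrative hyperparameters
q = [np.zeros(2), np.zeros(2)]              # stateless Q-values, one vector per agent

def boltzmann(q_values, tau):
    """Softmax policy over action-value estimates."""
    logits = q_values / tau
    logits -= logits.max()                  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

for step in range(10_000):
    # Each agent samples its action from its own Boltzmann policy.
    actions = [rng.choice(2, p=boltzmann(q[i], temperature)) for i in range(2)]
    for i in range(2):
        r = payoff[i][actions[0], actions[1]]
        # Independent Q-learning update: the other agent is treated as
        # part of the environment, which couples the two learning processes.
        q[i][actions[i]] += alpha * (r + gamma * q[i].max() - q[i][actions[i]])

for i in range(2):
    print(f"agent {i}: Q = {q[i]}, policy = {boltzmann(q[i], temperature)}")

Running such a simulation from different initial Q-values and with different discount factors gives the kind of stochastic trajectories that deterministic learning-dynamics models aim to approximate.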