Die DPG-Frühjahrstagung in Dresden musste abgesagt werden! Lesen Sie mehr ...
Bereiche | Tage | Auswahl | Suche | Aktualisierungen | Downloads | Hilfe
SOE: Fachverband Physik sozio-ökonomischer Systeme
SOE 16: Evolutionary Game Theory and Networks (joint SOE/DY/BP)
SOE 16.3: Vortrag
Donnerstag, 19. März 2020, 15:30–15:45, GÖR 226
Reinforcement learning dynamics in the infinite memory limit — •Wolfram Barfuss — Max Planck Institute for Mathematics in the Sciences, Leipzig
Reinforcement learning algorithms have been shown to converge to the classic replicator dynamics of evolutionary game theory, which describe the evolutionary process in the limit of an infinite population. However, it is not clear how to interpret these dynamics from the perspective of a learning agent. In this work we propose a data-inefficient batch-learning algorithm for temporal difference Q learning and show that it converges to a recently proposed deterministic limit of temporal difference reinforcement learning. In a second step, we state a data-efficient learning algorithm, that uses a form of experience replay, and show that it retains core features of the batch learning algorithm. Thus, we propose an agent-interpretation for the learning dynamics: What is the infinite population limit of evolutionary dynamics is the infinite memory limit of learning dynamics.