Dresden 2020 – scientific programme
The DPG Spring Meeting in Dresden had to be cancelled.
SOE: Fachverband Physik sozio-ökonomischer Systeme
SOE 16: Evolutionary Game Theory and Networks (joint SOE/DY/BP)
SOE 16.3: Talk
Thursday, March 19, 2020, 15:30–15:45, GÖR 226
Reinforcement learning dynamics in the infinite memory limit — •Wolfram Barfuss — Max Planck Institute for Mathematics in the Sciences, Leipzig
Reinforcement learning algorithms have been shown to converge to the classic replicator dynamics of evolutionary game theory, which describe the evolutionary process in the limit of an infinite population. However, it is not clear how to interpret these dynamics from the perspective of a learning agent. In this work we propose a data-inefficient batch-learning algorithm for temporal-difference Q-learning and show that it converges to a recently proposed deterministic limit of temporal-difference reinforcement learning. In a second step, we present a data-efficient learning algorithm that uses a form of experience replay, and show that it retains the core features of the batch-learning algorithm. We thus propose an agent interpretation of the learning dynamics: what the infinite-population limit is to evolutionary dynamics, the infinite-memory limit is to learning dynamics.
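The batch-learning idea in the abstract can be illustrated with a minimal sketch. This is not the talk's exact algorithm; it is a hypothetical stateless Q-learning toy in which the agent freezes its policy, collects a batch of sampled temporal-difference errors, and applies one averaged update. As the batch size grows, the stochastic update approaches its expected value, giving increasingly deterministic learning dynamics in the spirit of the "infinite memory" limit. The payoffs, learning rate, and softmax temperature are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 2
payoffs = np.array([1.0, 0.5])  # hypothetical expected rewards per action
noise = 0.1                     # reward noise scale
alpha = 0.1                     # learning rate
beta = 2.0                      # softmax inverse temperature


def softmax(q):
    """Softmax policy over Q-values with inverse temperature beta."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()


def batch_q_learning(batch_size, n_updates=200):
    """Stateless batch TD Q-learning: average TD errors over a batch,
    then apply a single update with the policy held fixed."""
    q = np.zeros(n_actions)
    for _ in range(n_updates):
        td_sum = np.zeros(n_actions)
        counts = np.zeros(n_actions)
        for _ in range(batch_size):
            a = rng.choice(n_actions, p=softmax(q))
            r = payoffs[a] + noise * rng.standard_normal()
            td_sum[a] += r - q[a]   # stateless TD error
            counts[a] += 1
        mask = counts > 0
        # averaged batch update; for batch_size -> infinity this becomes
        # a deterministic map on the Q-values
        q[mask] += alpha * td_sum[mask] / counts[mask]
    return q


# Small batches give noisy online-like learning; large batches approach
# the deterministic limit of the dynamics.
q_online = batch_q_learning(batch_size=1)
q_batch = batch_q_learning(batch_size=1000)
```

With a large batch, the learned Q-values settle close to the expected payoffs, while the single-sample run fluctuates around them; the deterministic limit is recovered by letting the batch (memory) grow without bound.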