Dresden 2020 – Scientific Programme

The DPG Spring Meeting in Dresden had to be cancelled!


SOE: Physics of Socio-Economic Systems Division

SOE 16: Evolutionary Game Theory and Networks (joint SOE/DY/BP)

SOE 16.3: Talk

Thursday, March 19, 2020, 15:30–15:45, GÖR 226

Reinforcement learning dynamics in the infinite memory limit — •Wolfram Barfuss — Max Planck Institute for Mathematics in the Sciences, Leipzig

Reinforcement learning algorithms have been shown to converge to the classic replicator dynamics of evolutionary game theory, which describe the evolutionary process in the limit of an infinite population. However, it is not clear how these dynamics should be interpreted from the perspective of a learning agent. In this work we propose a data-inefficient batch-learning algorithm for temporal-difference Q-learning and show that it converges to a recently proposed deterministic limit of temporal-difference reinforcement learning. In a second step, we state a data-efficient learning algorithm that uses a form of experience replay and show that it retains the core features of the batch-learning algorithm. We thus propose an agent interpretation of the learning dynamics: what the infinite-population limit is to evolutionary dynamics, the infinite-memory limit is to learning dynamics.
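The replicator dynamics mentioned above can be illustrated with a minimal sketch. The payoff matrix below (a Prisoner's Dilemma) and all variable names are illustrative assumptions, not taken from the talk; the update rule is the standard discrete-time replicator equation, in which each strategy's population share grows in proportion to its expected payoff relative to the population average.

```python
import numpy as np

# Illustrative payoff matrix (Prisoner's Dilemma; an assumption for this sketch):
# rows = own action (cooperate, defect), columns = opponent's action.
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])

x = np.array([0.5, 0.5])    # initial population shares of the two strategies
for _ in range(100):
    f = A @ x               # expected payoff (fitness) of each strategy
    x = x * f / (x @ f)     # replicator update: share grows with relative fitness

print(x)                    # defection takes over the population
```

In this game defection strictly dominates, so the cooperator share decays toward zero under the replicator dynamics, mirroring the infinite-population limit the abstract refers to.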
