A unifying view of optimism in episodic reinforcement learning

Date:

2 September 2021

Author:

Hrvoje Stojic

Abstract

The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. These two classes of algorithms were typically thought to be distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis and value-optimistic algorithms being easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms which have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides being able to capture many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.


Notes


  • An arXiv pre-print is available here.

  • Dr Ciara Pike-Burke is a Lecturer in Statistics at Imperial College London. Her website can be found here.

Related seminars

Linear combinations of latents in generative models: subspaces and beyond

Erik Bodin - University of Cambridge

2025/03/13

Return of the latent space cowboys: rethinking the use of VAEs in Bayesian optimisation over structured spaces

Henry Moss - University of Cambridge, Lancaster University

2025/01/21

Advancing sequential decision-making: efficient querying in clustering and best of both worlds for contextual bandits

Yuko Kuroki - CENTAI Institute

2024/10/10

AI in drug discovery - from model to process, from academic publication to decision-making

Andreas Bender - University of Cambridge

2024/09/19
