Talk by Gergely Neu | Secondmind

Gergely Neu - A unified view of entropy-regularized Markov decision processes

Date:

May 21, 2020

Author:

Hrvoje Stojic

Abstract

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally, we illustrate empirically the effects of using various regularization techniques on learning performance in a simple reinforcement learning setup.
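As a rough illustration of the "dual optimization problem closely resembling the Bellman optimality equations" mentioned in the abstract, the sketch below runs a relative value iteration in which the hard maximum over actions is replaced by a log-sum-exp (soft) maximum weighted by a reference policy. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `soft_relative_value_iteration`, the uniform reference policy, the regularization parameter `eta`, and the two-state toy MDP are all hypothetical choices made for illustration, and the exact form of the equations in the paper may differ.

```python
import numpy as np


def soft_relative_value_iteration(P, r, eta, ref_policy=None, iters=2000):
    """Soft (log-sum-exp) relative value iteration for an average-reward MDP.

    P: array of shape (S, A, S) with transition probabilities P[x, a, y].
    r: array of shape (S, A) with rewards r[x, a].
    eta: positive regularization parameter (larger means weaker regularization).
    ref_policy: array of shape (S, A); assumed uniform if not given.
    """
    S, A = r.shape
    if ref_policy is None:
        ref_policy = np.full((S, A), 1.0 / A)
    V = np.zeros(S)
    rho = 0.0
    for _ in range(iters):
        Q = r + P @ V  # Q[x, a] = r[x, a] + sum_y P[x, a, y] * V[y]
        m = Q.max(axis=1, keepdims=True)  # subtract the max for numerical stability
        # Soft maximum: (1 / eta) * log sum_a ref(a|x) * exp(eta * Q[x, a])
        TV = m[:, 0] + np.log((ref_policy * np.exp(eta * (Q - m))).sum(axis=1)) / eta
        rho = TV[0]   # gain estimate, using state 0 as the reference state
        V = TV - rho  # renormalize so that V[0] == 0
    # Policy proportional to ref(a|x) * exp(eta * Q[x, a])
    Q = r + P @ V
    policy = ref_policy * np.exp(eta * (Q - Q.max(axis=1, keepdims=True)))
    policy /= policy.sum(axis=1, keepdims=True)
    return rho, V, policy


# Hypothetical two-state, two-action MDP with strictly positive transitions.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
])
r = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])

rho, V, policy = soft_relative_value_iteration(P, r, eta=50.0)
print("estimated gain:", rho)
print("policy:\n", policy)
```

In this toy setting, a large `eta` makes the soft maximum close to the ordinary maximum, so the resulting policy is nearly deterministic. A small `eta` keeps the policy close to the reference policy.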

Notes

The arXiv preprint can be found here.
Gergely Neu is a Research Assistant Professor in the AI group at DTIC, Universitat Pompeu Fabra. His personal website can be found here.

Related Seminars

Mickael Binois - Leveraging replication in active learning

We were recently joined by Mickael Binois, to talk about 'Leveraging replication in active learning'.

Jun 24, 2024

Ilija Bogunovic - From Data to Confident Decisions

We were recently joined by Ilija Bogunovic, to talk about 'Robust and Efficient Algorithmic Decision Making'.

Jun 13, 2024

Dario Azzimonti - Preference learning with Gaussian processes

We were recently joined by Dario Azzimonti, to talk about 'Preference learning with Gaussian processes'.

May 23, 2024

Mojmír Mutný - Optimal Experiment Design in Markov Chains

We were recently joined by Mojmír Mutný (ETH Zurich), to talk about 'Optimal Experiment Design in Markov Chains'.

Mar 28, 2024

Products

Insights

Insights

Research

About Secondmind Labs

Research papers

Company

Get in touch

General enquiries

© Secondmind 2025
