Seminar: Gabriel Dulac-Arnold - Google Research

Date: October 1, 2020
Author: Hrvoje Stojic

Challenges of Real-world RL: Definition, Implementation, Analysis

Abstract

Reinforcement learning (RL) has proven its worth in a series of artificial domains and is beginning to show some successes in real-world scenarios. However, many research advances in RL are hard to leverage in real-world systems because they rest on assumptions that are rarely satisfied in practice. We present a set of nine unique challenges that must be addressed to productionize RL for real-world problems. For each challenge, we specify its exact meaning, present some approaches from the literature, and specify metrics for evaluating it. An approach that addresses all nine challenges would be applicable to a large number of real-world problems.

Offline learning is a key part of making RL usable in real systems. Offline RL considers scenarios where data from a system's operation is available, but there is no direct access to the system while learning a policy. Recent work on training RL policies from offline data has shown results both with model-free policies learned directly from the data and with planning on top of learnt models of the data. Model-free policies tend to be more performant, but they are more opaque, harder to command externally, and harder to integrate into larger systems. We propose an offline learner that generates a model that can be used to control the system directly through planning. This gives us easily controllable policies directly from data, without ever interacting with the system. We show the performance of our algorithm, Model-Based Offline Planning (MBOP), on a series of robotics-inspired tasks, and demonstrate its ability to leverage planning to respect environmental constraints. We are able to find near-optimal policies for certain simulated systems from as little as 50 seconds of real-time system interaction, and to create zero-shot goal-conditioned policies on a series of environments.
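To make the planning component concrete, the sketch below shows the general model-based offline planning recipe that MBOP builds on: fit a model to logged transitions, then plan through it with receding-horizon control. This is a minimal illustration, not the MBOP implementation; the names `OfflineModel` and `plan_action`, the toy linear dynamics, and the random-shooting planner are all assumptions made for the example, whereas MBOP itself additionally uses a behavior-cloned policy prior and a learned value function to guide and extend its rollouts.

```python
import numpy as np


class OfflineModel:
    """Toy stand-in for a dynamics and reward model fit to offline data.

    A real implementation would train an ensemble of neural networks by
    supervised learning on logged (state, action, reward, next_state)
    tuples; linear dynamics are used here only to keep the sketch runnable.
    """

    def __init__(self, A, B):
        self.A, self.B = A, B  # illustrative linear dynamics parameters

    def predict(self, state, action):
        next_state = self.A @ state + self.B @ action
        reward = -float(next_state @ next_state)  # e.g. drive the state to the origin
        return next_state, reward


def plan_action(model, state, horizon=10, n_candidates=256, action_dim=1, rng=None):
    """Return the first action of the best sampled action sequence.

    Random-shooting model-predictive control: sample candidate action
    sequences, roll each one out through the learned model, score it by
    cumulative predicted reward, and execute only the first action of the
    winner (receding-horizon control).
    """
    rng = rng or np.random.default_rng(0)
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        s, total_reward = state, 0.0
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        for a in actions:
            s, r = model.predict(s, a)
            total_reward += r
        if total_reward > best_return:
            best_return, best_first_action = total_reward, actions[0]
    return best_first_action


# Usage: plan against the learned model without touching the real system.
model = OfflineModel(A=np.eye(2) * 0.9, B=np.ones((2, 1)))
action = plan_action(model, state=np.array([1.0, -0.5]))
```

Because the planner only ever queries the learned model, the policy comes entirely from data, and constraints or alternative goals can be handled at planning time by changing how candidate trajectories are scored.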

Notes

  • An arXiv pre-print is available here.
  • Dr Gabriel Dulac-Arnold is a researcher at Google Research. His publication record is on Google Scholar here, his personal website is here, and he is on Twitter at @gabepsilon.

