Challenges of real-world RL: definition, implementation, analysis

Date:

1 October 2020

Author:

Hrvoje Stojic



Abstract

Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, many research advances in RL are hard to leverage in real-world systems because they rest on assumptions that are rarely satisfied in practice. We present a set of nine unique challenges that must be addressed to productionize RL for real-world problems. For each challenge, we specify its exact meaning, present some approaches from the literature, and specify metrics for evaluating it. An approach that addresses all nine challenges would be applicable to a large number of real-world problems.

Offline learning is a key part of making RL usable in real systems. Offline RL considers scenarios where data from a system's operation is available, but there is no direct access to the system when learning a policy. Recent work on training RL policies from offline data has shown results both with model-free policies learned directly from the data and with planning on top of learnt models of the data. Model-free policies tend to be more performant, but are more opaque, harder to command externally, and harder to integrate into larger systems. We propose an offline learner that generates a model that can be used to control the system directly through planning. This yields easily controllable policies directly from data, without ever interacting with the system. We show the performance of our algorithm, Model-Based Offline Planning (MBOP), on a series of robotics-inspired tasks, and demonstrate its ability to leverage planning to respect environmental constraints. We are able to find near-optimal policies for certain simulated systems from as little as 50 seconds of real-time system interaction, and to create zero-shot goal-conditioned policies on a series of environments.
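To make the planning idea concrete, here is a minimal sketch of receding-horizon control on top of a dynamics model learned from offline data, in the spirit of the abstract. This is an illustrative random-shooting planner, not the MBOP algorithm itself (MBOP additionally guides its trajectory optimiser with a behaviour-cloned policy prior and a learned value function); all names here (`DynamicsModel`, `plan_action`, `reward_fn`) are hypothetical.

```python
import numpy as np

class DynamicsModel:
    """Stand-in for a model fit to offline transitions (s, a, s')."""
    def predict(self, state, action):
        # In practice this would be a learned neural network;
        # here we use toy additive dynamics as a placeholder.
        return state + action

def plan_action(model, reward_fn, state, horizon=10, n_candidates=100,
                action_dim=2, rng=None):
    """Random-shooting MPC: sample candidate action sequences, roll each
    out through the learned model, score it with the reward function,
    and return the first action of the best sequence."""
    rng = rng or np.random.default_rng(0)
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in actions:
            s = model.predict(s, a)
            total += reward_fn(s, a)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    # Executed for one step, then re-planned (receding horizon).
    return best_first_action

def reward_fn(state, action, limit=5.0):
    # Constraints are respected at planning time by penalising rollouts
    # that violate them; no policy retraining is needed.
    penalty = 1e6 if np.any(np.abs(state) > limit) else 0.0
    return -np.sum(state**2) - penalty

model = DynamicsModel()
action = plan_action(model, reward_fn, state=np.zeros(2))
```

Because rewards, goals, and constraint penalties are evaluated only at planning time, they can be changed without retraining, which is what makes planner-based policies easier to command externally and to integrate into larger systems than model-free ones.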


Notes


  • An arXiv pre-print is available here.

  • Dr Gabriel Dulac-Arnold is a researcher at Google Research. His publication record on Google Scholar can be found here, his personal website is here, and he is on Twitter at @gabepsilon.

Related seminars

Linear combinations of latents in generative models: subspaces and beyond

Erik Bodin - University of Cambridge

2025/03/13

Return of the latent space cowboys: rethinking the use of VAEs in Bayesian optimisation over structured spaces

Henry Moss - University of Cambridge, Lancaster University

2025/01/21

Advancing sequential decision-making: efficient querying in clustering and best of both worlds for contextual bandits

Yuko Kuroki - CENTAI Institute

2024/10/10

AI in drug discovery - from model to process, from academic publication to decision-making

Andreas Bender - University of Cambridge

2024/09/19
