State-Regularized Policy Search for Linearized Dynamical Systems

Abdulsamad, Hany; Arenz, Oleg; Peters, Jan; Neumann, Gerhard

Published in

Proceedings of the International Conference on Automated Planning and Scheduling, (27), p. 419-424, 2017

DOI: 10.1609/icaps.v27i1.13853

Journal article published in 2017 by Hany Abdulsamad, Oleg Arenz, Jan Peters, Gerhard Neumann

This paper is made freely available by the publisher.

Preprint: archiving forbidden

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of feedback-controllers by taking advantage of local approximations of model dynamics and cost functions. Stability of the policy update is a major issue for these methods, rendering them hard to apply to highly nonlinear systems. Recent approaches combine classical Stochastic Optimal Control methods with information-theoretic bounds to control the step-size of the policy update and could even be used to train nonlinear deep control policies. These methods bound the relative entropy between the new and the old policy to ensure a stable policy update. However, despite the bound in policy space, the state distributions of two consecutive policies can still differ significantly, which invalidates the local approximate models. To alleviate this issue, we propose enforcing a relative entropy constraint not only on the policy update, but also on the update of the state distribution, around which the dynamics and cost are being approximated. We present a derivation of the closed-form policy update and show that our approach outperforms related methods on two nonlinear and highly dynamic simulated systems.
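
The two quantities the abstract contrasts, relative entropy in policy space and relative entropy between the induced state distributions, can be computed in closed form for linear-Gaussian systems. The following is a minimal sketch for illustration only, not the paper's algorithm: the double-integrator dynamics, the gains, the noise levels, and the helper names `gaussian_kl` and `propagate_state` are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Relative entropy KL(p || q) between two multivariate Gaussians."""
    d = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - d
        + logdet_q
        - logdet_p
    )


def propagate_state(A, B, K, k, mu0, cov0, cov_u, cov_x, horizon):
    """Push a Gaussian state distribution through the closed loop
    x' = A x + B u + w,  u ~ N(K x + k, cov_u),  w ~ N(0, cov_x)."""
    mu, cov = mu0, cov0
    for _ in range(horizon):
        A_cl = A + B @ K  # closed-loop transition matrix
        mu = A_cl @ mu + B @ k
        cov = A_cl @ cov @ A_cl.T + B @ cov_u @ B.T + cov_x
    return mu, cov


# Hypothetical double integrator with time step 0.1 (illustrative values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
cov_u = np.array([[0.1]])      # policy (action) covariance
cov_x = 1e-4 * np.eye(2)       # process noise covariance
mu0, cov0 = np.zeros(2), 1e-2 * np.eye(2)

# Old and new linear feedback policies differ only by a small action offset.
K_old, k_old = np.array([[-1.0, -1.0]]), np.array([0.0])
K_new, k_new = np.array([[-1.0, -1.0]]), np.array([0.05])

# Relative entropy between the two policies, evaluated at the initial mean state.
policy_kl = gaussian_kl(K_new @ mu0 + k_new, cov_u, K_old @ mu0 + k_old, cov_u)

# Relative entropy between the state distributions the two policies induce.
mu_old, cov_old = propagate_state(A, B, K_old, k_old, mu0, cov0, cov_u, cov_x, 50)
mu_new, cov_new = propagate_state(A, B, K_new, k_new, mu0, cov0, cov_u, cov_x, 50)
state_kl = gaussian_kl(mu_new, cov_new, mu_old, cov_old)

print("policy KL:", policy_kl)
print("state KL after 50 steps:", state_kl)
```

Comparing the two printed values for different gains, offsets, and horizons gives a feel for how a bound on the policy relative entropy alone relates to the change in the state distribution around which the models are linearized.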
