A generalized natural actor-critic algorithm

Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

Policy gradient Reinforcement Learning (RL) algorithms have received substantial attention, seeking stochastic policies that maximize the average (or discounted cumulative) reward. In addition, extensions based on the concept of the Natural Gradient (NG) show promising learning efficiency because they take into account a metric suited to the task. Although there are two candidate metrics, Kakade's Fisher Information Matrix (FIM) for the policy (action) distribution and Morimura's FIM for the state-action joint distribution, all RL algorithms with NG have followed Kakade's approach. In this paper, we describe a generalized Natural Gradient (gNG) that linearly interpolates the two FIMs and propose an efficient implementation of gNG learning based on the theory of estimating functions: the generalized Natural Actor-Critic (gNAC) algorithm. The gNAC algorithm involves a near-optimal auxiliary function to reduce the variance of the gNG estimates. Interestingly, the gNAC can be regarded as a natural extension of the current state-of-the-art NAC algorithm [1], as long as the interpolating parameter is appropriately selected. Numerical experiments showed that the proposed gNAC algorithm can estimate the gNG efficiently and outperformed the NAC algorithm.
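
The linear interpolation of the two FIMs mentioned in the abstract can be sketched as follows. The notation is an assumption made here for illustration, since the abstract fixes none: F_a denotes Kakade's FIM for the policy (action) distribution, F_{s,a} denotes Morimura's FIM for the state-action joint distribution, \iota is the interpolating parameter, and \eta(\theta) is the objective (average or discounted cumulative reward) as a function of the policy parameters \theta. The paper itself may parameterize the interpolation differently.

```latex
% Interpolated metric (illustrative notation, not taken from the paper)
G(\theta;\iota) = (1-\iota)\,F_{a}(\theta) + \iota\,F_{s,a}(\theta),
\qquad \iota \in [0,1]

% Generalized natural gradient under that metric
\widetilde{\nabla}_{\iota}\,\eta(\theta) = G(\theta;\iota)^{-1}\,\nabla_{\theta}\,\eta(\theta)
```

Under this form, \iota = 0 gives the natural gradient with Kakade's metric and \iota = 1 gives the one with Morimura's metric; intermediate values blend the two.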

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference
Publisher: Neural Information Processing Systems
Pages: 1312-1320
Number of pages: 9
ISBN (Print): 9781615679119
Publication status: Published - 2009
Externally published: Yes
Event: 23rd Annual Conference on Neural Information Processing Systems, NIPS 2009 - Vancouver, BC, Canada
Duration: 07-12-2009 to 10-12-2009

Publication series

Name: Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference

Conference

Conference: 23rd Annual Conference on Neural Information Processing Systems, NIPS 2009
Country/Territory: Canada
City: Vancouver, BC
Period: 07-12-09 to 10-12-09

All Science Journal Classification (ASJC) codes

  • Information Systems
