TY - GEN

T1 - A generalized natural actor-critic algorithm

AU - Morimura, Tetsuro

AU - Uchibe, Eiji

AU - Yoshimoto, Junichiro

AU - Doya, Kenji

PY - 2009

Y1 - 2009

N2 - Policy gradient Reinforcement Learning (RL) algorithms, which seek stochastic policies that maximize the average (or discounted cumulative) reward, have received substantial attention. Extensions based on the concept of the Natural Gradient (NG) show promising learning efficiency because they take the metric of the task into account. There are two candidate metrics, Kakade's Fisher Information Matrix (FIM) for the policy (action) distribution and Morimura's FIM for the state-action joint distribution, but all RL algorithms with NG have followed Kakade's approach. In this paper, we describe a generalized Natural Gradient (gNG) that linearly interpolates the two FIMs, and we propose an efficient implementation of gNG learning based on the theory of estimating functions: the generalized Natural Actor-Critic (gNAC) algorithm. The gNAC algorithm involves a near-optimal auxiliary function to reduce the variance of the gNG estimates. Interestingly, gNAC can be regarded as a natural extension of the current state-of-the-art NAC algorithm [1], as long as the interpolating parameter is appropriately selected. Numerical experiments showed that the proposed gNAC algorithm can estimate the gNG efficiently and that it outperformed the NAC algorithm.

AB - Policy gradient Reinforcement Learning (RL) algorithms, which seek stochastic policies that maximize the average (or discounted cumulative) reward, have received substantial attention. Extensions based on the concept of the Natural Gradient (NG) show promising learning efficiency because they take the metric of the task into account. There are two candidate metrics, Kakade's Fisher Information Matrix (FIM) for the policy (action) distribution and Morimura's FIM for the state-action joint distribution, but all RL algorithms with NG have followed Kakade's approach. In this paper, we describe a generalized Natural Gradient (gNG) that linearly interpolates the two FIMs, and we propose an efficient implementation of gNG learning based on the theory of estimating functions: the generalized Natural Actor-Critic (gNAC) algorithm. The gNAC algorithm involves a near-optimal auxiliary function to reduce the variance of the gNG estimates. Interestingly, gNAC can be regarded as a natural extension of the current state-of-the-art NAC algorithm [1], as long as the interpolating parameter is appropriately selected. Numerical experiments showed that the proposed gNAC algorithm can estimate the gNG efficiently and that it outperformed the NAC algorithm.

UR - http://www.scopus.com/inward/record.url?scp=84858717872&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858717872&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84858717872

SN - 9781615679119

T3 - Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference

SP - 1312

EP - 1320

BT - Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference

PB - Neural Information Processing Systems

T2 - 23rd Annual Conference on Neural Information Processing Systems, NIPS 2009

Y2 - 7 December 2009 through 10 December 2009

ER -