Free-energy-based reinforcement learning in a partially observable environment

Makoto Otsuka, Junichiro Yoshimoto, Kenji Doya

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Free-energy-based reinforcement learning (FERL) can handle Markov decision processes (MDPs) with high-dimensional state spaces by approximating the state-action value function with the negative equilibrium free energy of a restricted Boltzmann machine (RBM). In this study, we extend the FERL framework to handle partially observable MDPs (POMDPs) by incorporating a recurrent neural network that learns a memory representation sufficient for predicting future observations and rewards. We demonstrate that the proposed method successfully solves POMDPs with high-dimensional observations without any prior knowledge of the environmental hidden states and dynamics. After learning, task structures are implicitly represented in the distributed activation patterns of hidden nodes of the RBM.
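The value approximation described in the abstract can be illustrated with a minimal sketch. It assumes an RBM with binary hidden units whose visible layer is clamped to a concatenated state vector and one-hot action vector; for such an RBM the negative equilibrium free energy has the closed form b·v + Σ_k log(1 + exp(c_k + W_k·v)). The names `W`, `b`, `c`, `neg_free_energy`, and `q_values` are hypothetical and do not come from the paper, and the recurrent memory network that the paper adds for POMDPs is deliberately left out.

```python
import numpy as np


def neg_free_energy(v, W, b, c):
    """Negative equilibrium free energy of an RBM with binary hidden units,
    with the visible layer clamped to v.

    W: (n_hidden, n_visible) weights, b: visible biases, c: hidden biases.
    These parameter names are illustrative assumptions.
    """
    pre = c + W @ v  # hidden pre-activations, shape (n_hidden,)
    # logaddexp(0, x) is a numerically stable log(1 + exp(x)).
    return b @ v + np.sum(np.logaddexp(0.0, pre))


def q_values(state, n_actions, W, b, c):
    """Approximate Q(s, a) for every discrete action as -F(s, a)."""
    qs = np.empty(n_actions)
    for a in range(n_actions):
        action = np.zeros(n_actions)
        action[a] = 1.0  # one-hot action encoding (an assumption)
        v = np.concatenate([state, action])
        qs[a] = neg_free_energy(v, W, b, c)
    return qs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_state, n_actions, n_hidden = 4, 2, 3
    W = rng.normal(size=(n_hidden, n_state + n_actions))
    b = rng.normal(size=n_state + n_actions)
    c = rng.normal(size=n_hidden)
    state = np.array([1.0, 0.0, 1.0, 1.0])
    q = q_values(state, n_actions, W, b, c)
    print("greedy action:", int(np.argmax(q)))
```

In this sketch a greedy policy simply takes the argmax over the returned values; how the weights are trained (for example, with a temporal-difference update on -F) is outside what the sketch shows.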

Original language: English
Title of host publication: Proceedings of the 18th European Symposium on Artificial Neural Networks - Computational Intelligence and Machine Learning, ESANN 2010
Pages: 541-546
Number of pages: 6
Publication status: Published - 2010
Externally published: Yes
Event: 18th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2010 - Bruges, Belgium
Duration: 28-04-2010 to 30-04-2010

Publication series

Name: Proceedings of the 18th European Symposium on Artificial Neural Networks - Computational Intelligence and Machine Learning, ESANN 2010

Conference

Conference: 18th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2010
Country/Territory: Belgium
City: Bruges
Period: 28-04-10 to 30-04-10

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Information Systems
