On-line EM reinforcement learning

Junichiro Yoshimoto, Shin Ishii, Masa-aki Sato

Research output: Contribution to conference › Paper › peer-review

4 Citations (Scopus)


In this article, we propose a new reinforcement learning (RL) method for a system having continuous state and action spaces. Our RL method has an architecture like the actor-critic model. The critic tries to approximate the Q-function, which is the expected future return for the current state-action pair. The actor tries to approximate a stochastic soft-max policy defined by the Q-function. The soft-max policy is more likely to select an action that has a higher Q-function value. The on-line EM algorithm is used to train the critic and the actor. We apply this method to two control problems. Computer simulations show that our method is able to acquire fairly good control in the two tasks after a few learning trials.
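The soft-max policy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the usual Boltzmann form, in which the probability of an action is proportional to exp(Q / temperature). The `temperature` parameter, the function names, and the finite list of candidate actions are all assumptions made for illustration. The paper itself works with continuous actions and trains an actor with the on-line EM algorithm, which is not reproduced here.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Return Boltzmann (soft-max) probabilities for a list of Q-values.

    Assumed form: p_i is proportional to exp(q_i / temperature).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(actions, q_values, temperature=1.0, rng=random):
    """Draw one candidate action; higher Q-values are more likely."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(actions, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical candidate actions and Q-values for one state.
    candidate_actions = [-1.0, 0.0, 1.0]
    q_for_state = [0.0, 1.0, 2.0]
    print(softmax_probs(q_for_state))
    print(sample_action(candidate_actions, q_for_state))
```

Lowering `temperature` concentrates probability on the highest-valued action, while raising it moves the policy toward uniform random selection.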

Original language: English
Number of pages: 6
Publication status: Published - 2000
Externally published: Yes
Event: International Joint Conference on Neural Networks (IJCNN'2000) - Como, Italy
Duration: 24-07-2000 to 27-07-2000


Conference: International Joint Conference on Neural Networks (IJCNN'2000)
City: Como, Italy

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence

