TY - JOUR
T1 - Control of exploitation-exploration meta-parameter in reinforcement learning
AU - Ishii, Shin
AU - Yoshida, Wako
AU - Yoshimoto, Junichiro
PY - 2002
Y1 - 2002
N2 - In reinforcement learning (RL), the duality between exploitation and exploration has long been an important issue. This paper presents a new method that controls the balance between exploitation and exploration. Our learning scheme is based on model-based RL, in which Bayes inference with a forgetting effect estimates the state-transition probability of the environment. The balance parameter, which corresponds to the randomness in action selection, is controlled based on variation of action results and perception of environmental change. When applied to maze tasks, our method successfully obtains good controls by adapting to environmental changes. Recently, Usher et al. [Science 283 (1999) 549] have suggested that noradrenergic neurons in the locus coeruleus may control the exploitation-exploration balance in a real brain and that the balance may correspond to the level of an animal's selective attention. According to this scenario, we also discuss a possible implementation in the brain.
AB - In reinforcement learning (RL), the duality between exploitation and exploration has long been an important issue. This paper presents a new method that controls the balance between exploitation and exploration. Our learning scheme is based on model-based RL, in which Bayes inference with a forgetting effect estimates the state-transition probability of the environment. The balance parameter, which corresponds to the randomness in action selection, is controlled based on variation of action results and perception of environmental change. When applied to maze tasks, our method successfully obtains good controls by adapting to environmental changes. Recently, Usher et al. [Science 283 (1999) 549] have suggested that noradrenergic neurons in the locus coeruleus may control the exploitation-exploration balance in a real brain and that the balance may correspond to the level of an animal's selective attention. According to this scenario, we also discuss a possible implementation in the brain.
UR - http://www.scopus.com/inward/record.url?scp=0036592028&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036592028&partnerID=8YFLogxK
U2 - 10.1016/S0893-6080(02)00056-4
DO - 10.1016/S0893-6080(02)00056-4
M3 - Article
C2 - 12371519
AN - SCOPUS:0036592028
SN - 0893-6080
VL - 15
SP - 665
EP - 687
JO - Neural Networks
JF - Neural Networks
IS - 4-6
ER -