TY - GEN
T1 - A reinforcement learning approach to the shepherding task using SARSA
AU - Go, Clark Kendrick
AU - Lao, Bryan
AU - Yoshimoto, Junichiro
AU - Ikeda, Kazushi
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/10/31
Y1 - 2016/10/31
N2 - In this paper, we present a reinforcement learning model of the shepherding of a flock of sheep by a dog. The shepherding task, a heuristic model originally proposed by Strömbom et al., describes the dynamics of sheep being herded by a dog toward a predefined target. This study recreates the proposed model using SARSA, an on-policy algorithm for learning a control policy in reinforcement learning. Results show that with a discretized state and action space, the dog is able to successfully herd a flock of sheep to the target position by first learning to reach a subgoal. A reward is given when the dog reaches the neighbourhood of a subgoal, while a penalty is incurred each time the shepherding task is not completed. The stochasticity of the interactions between the sheep and the dog, together with the existence of multiple subgoals, affects the agent's learning time. Finally, we present an example of the learned shepherding task in which the agent succeeds consistently after the 350th episode.
AB - In this paper, we present a reinforcement learning model of the shepherding of a flock of sheep by a dog. The shepherding task, a heuristic model originally proposed by Strömbom et al., describes the dynamics of sheep being herded by a dog toward a predefined target. This study recreates the proposed model using SARSA, an on-policy algorithm for learning a control policy in reinforcement learning. Results show that with a discretized state and action space, the dog is able to successfully herd a flock of sheep to the target position by first learning to reach a subgoal. A reward is given when the dog reaches the neighbourhood of a subgoal, while a penalty is incurred each time the shepherding task is not completed. The stochasticity of the interactions between the sheep and the dog, together with the existence of multiple subgoals, affects the agent's learning time. Finally, we present an example of the learned shepherding task in which the agent succeeds consistently after the 350th episode.
UR - https://www.scopus.com/pages/publications/85007227400
UR - https://www.scopus.com/inward/citedby.url?scp=85007227400&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2016.7727694
DO - 10.1109/IJCNN.2016.7727694
M3 - Conference contribution
AN - SCOPUS:85007227400
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 3833
EP - 3836
BT - 2016 International Joint Conference on Neural Networks, IJCNN 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 International Joint Conference on Neural Networks, IJCNN 2016
Y2 - 24 July 2016 through 29 July 2016
ER -
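
Note: the abstract above outlines a tabular SARSA setup (discretized states and actions, a reward near a subgoal, a penalty while the task is incomplete). Below is a minimal illustrative sketch of such an update in Python; the environment interface, action set, reward values, and hyperparameters are assumptions for illustration, not details taken from the paper.

import random
from collections import defaultdict

# Hypothetical hyperparameters; the paper's actual values are not given here.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = range(8)  # e.g. 8 discretized movement directions for the dog (assumed)

Q = defaultdict(float)  # Q[(state, action)] -> action-value estimate

def epsilon_greedy(state):
    # Explore with probability EPSILON, otherwise act greedily w.r.t. Q.
    if random.random() < EPSILON:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def run_episode(env):
    # `env` is a hypothetical shepherding environment with reset()/step();
    # step() is assumed to return (next_state, reward, done), with a positive
    # reward in a subgoal's neighbourhood and a penalty otherwise.
    state = env.reset()
    action = epsilon_greedy(state)
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        next_action = epsilon_greedy(next_state)
        # On-policy SARSA update: the target uses the action actually taken next.
        td_target = reward + (0.0 if done else GAMMA * Q[(next_state, next_action)])
        Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
        state, action = next_state, next_action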