Hyperbolically-Discounted Reinforcement Learning on Reward-Punishment Framework

doi:10.48550/arXiv.2106.01516

Kobayashi, Taisuke

This paper proposes a new reinforcement learning method with hyperbolic discounting. By combining a new temporal difference error, which incorporates hyperbolic discounting in a recursive manner, with the reward-punishment framework, a new scheme for learning the optimal policy is derived. In simulations, the proposed method outperforms standard reinforcement learning, although its performance depends on the design of reward and punishment. In addition, the average discount factors with respect to reward and punishment differ from each other, resembling the sign effect observed in animal behavior.
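
For readers unfamiliar with the term, the sketch below contrasts the textbook exponential discount, gamma**t, with the textbook hyperbolic discount, 1 / (1 + k*t). It is only an illustration of the general idea and does not reproduce the paper's recursive temporal difference error or its reward-punishment scheme. The function names and the parameter values gamma=0.9 and k=0.1 are assumptions chosen for the example.

```python
def exponential_discount(t, gamma=0.9):
    """Textbook exponential discount weight for a reward t steps ahead."""
    return gamma ** t


def hyperbolic_discount(t, k=0.1):
    """Textbook hyperbolic discount weight for a reward t steps ahead."""
    return 1.0 / (1.0 + k * t)


def discounted_return(rewards, discount):
    """Sum of rewards, each weighted by discount(t) for its time step t."""
    return sum(discount(t) * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    # Hypothetical reward sequence: a single unit reward after 50 steps.
    rewards = [0.0] * 50 + [1.0]
    print("exponential:", discounted_return(rewards, exponential_discount))
    print("hyperbolic: ", discounted_return(rewards, hyperbolic_discount))
```

With these assumed parameters, the hyperbolic weight decays more slowly at long delays than the exponential one, so distant outcomes keep more influence on the return.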

Publication:

arXiv e-prints

Pub Date:

June 2021

DOI:

10.48550/arXiv.2106.01516

arXiv:

arXiv:2106.01516

Bibcode:

2021arXiv210601516K

Keywords:

Computer Science - Machine Learning

E-Print:

2 pages, 1 figure, presented as Paper Abstracts in ICDL-EPIROB2019
