Sampling Policy that Guarantees Reliability of Optimal Policy in Reinforcement Learning
Abstract
This study defines certification sampling: sampling that guarantees, with a specified reliability, that the optimal policy derived from an estimated transition probability is also optimal under the true transition probability. It then discusses a sampling policy that obtains certification sampling efficiently. The transition probability is estimated by sampling, and the optimal policy is derived from that estimate. In parallel, the method calculates the accuracy that the estimated transition probability must reach to guarantee that the derived optimal policy is correct. The proposed sampling policy efficiently achieves certification sampling at this required accuracy. It is efficient in the number of samples because it automatically selects the states and actions to be sampled and stops sampling once the condition is satisfied.
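The general idea of sampling until an estimated transition probability is accurate enough can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the selection rule or the stopping condition, so the sketch substitutes a generic two-sided Hoeffding interval, applied to each probability separately with no union bound, and always samples the least-sampled state-action pair. The names `hoeffding_half_width`, `sample_until_accurate`, and `toy_simulator`, and the parameters `target` and `delta`, are all hypothetical.

```python
import math
import random
from collections import defaultdict


def hoeffding_half_width(n, delta):
    """Half-width of a two-sided Hoeffding interval for one probability after n samples."""
    if n == 0:
        return 1.0  # nothing observed yet: the interval covers [0, 1]
    return min(1.0, math.sqrt(math.log(2.0 / delta) / (2.0 * n)))


def sample_until_accurate(simulator, states, actions, target, delta, max_samples=100_000):
    """Sample transitions until every (state, action) interval is at most `target` wide.

    Assumption: a generic stand-in rule, not the paper's sampling policy.
    """
    counts = {(s, a): defaultdict(int) for s in states for a in actions}
    totals = {(s, a): 0 for s in states for a in actions}
    used = 0
    while used < max_samples:
        # The least-sampled pair has the widest interval, so sample it next.
        pair = min(totals, key=totals.get)
        if hoeffding_half_width(totals[pair], delta) <= target:
            break  # even the widest interval meets the target: stop sampling
        next_state = simulator(*pair)
        counts[pair][next_state] += 1
        totals[pair] += 1
        used += 1
    # Empirical transition probabilities from the observed counts.
    p_hat = {
        pair: {s2: c / totals[pair] for s2, c in seen.items()}
        for pair, seen in counts.items()
        if totals[pair] > 0
    }
    return p_hat, totals, used


# Toy two-state, two-action environment (hypothetical, for illustration only).
rng = random.Random(0)


def toy_simulator(state, action):
    stay_probability = 0.8 if action == 0 else 0.3
    return state if rng.random() < stay_probability else 1 - state


p_hat, totals, used = sample_until_accurate(
    toy_simulator, states=[0, 1], actions=[0, 1], target=0.1, delta=0.05
)
```

Because this stand-in rule treats all pairs alike, it spreads samples evenly. The method described in the abstract instead selects the states and actions to sample automatically, which is where its saving in the number of samples comes from.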
- Publication: Transactions of the Society of Instrument and Control Engineers
- Pub Date: 2011
- Bibcode: 2011TSICE..46..274S
- Keywords: sampling policy; reliability; accuracy of transition probability; reinforcement learning