Sampling Policy that Guarantees Reliability of Optimal Policy in Reinforcement Learning

This study defines certification sampling: sampling that guarantees, with a specified reliability, that the optimal policy derived from an estimated transition probability is also optimal for the real transition probability. It then discusses a sampling policy that obtains the certification sampling efficiently. The transition probability is estimated by sampling, and the optimal policy is derived from the estimate. In parallel, the method calculates the accuracy of the estimated transition probability that is necessary to guarantee that the optimal policy is correct. The proposed sampling policy efficiently achieves certification sampling at that desired accuracy. It is efficient in the number of samples because it automatically selects the states and actions to be sampled and stops sampling when the condition is satisfied.
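The abstract does not give the algorithm itself, so the following Python is only a minimal sketch of the general idea (sample, estimate, stop once a target accuracy is reached), not the authors' method. It assumes the desired accuracy `epsilon` and the allowed failure probability `delta` are already known, uses a per-entry Hoeffding bound as the stopping condition, and always samples the state-action pair with the fewest samples so far. All names (`hoeffding_samples_needed`, `estimate_transitions`, `sample_next_state`) are hypothetical.

```python
import math
import random


def hoeffding_samples_needed(epsilon, delta):
    """Samples n such that one estimated probability is within epsilon of the
    true value with probability >= 1 - delta (Hoeffding's inequality)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_transitions(sample_next_state, states, actions, epsilon, delta, rng):
    """Estimate P(s' | s, a) by sampling until every (s, a) pair has enough samples.

    sample_next_state(s, a, rng) is an assumed simulator interface that
    returns one next state drawn from the real transition probability.
    """
    n_required = hoeffding_samples_needed(epsilon, delta)
    counts = {(s, a): {s2: 0 for s2 in states} for s in states for a in actions}
    totals = {sa: 0 for sa in counts}

    while True:
        # Only pairs that have not yet met the stopping condition are candidates.
        pending = [sa for sa, n in totals.items() if n < n_required]
        if not pending:
            break  # stop sampling: every pair satisfies the condition
        # Select the least-sampled pending pair (a simple assumed selection rule).
        s, a = min(pending, key=lambda sa: totals[sa])
        s_next = sample_next_state(s, a, rng)
        counts[(s, a)][s_next] += 1
        totals[(s, a)] += 1

    # Relative frequencies are the estimated transition probabilities.
    p_hat = {
        sa: {s2: c / totals[sa] for s2, c in row.items()}
        for sa, row in counts.items()
    }
    return p_hat, totals


if __name__ == "__main__":
    # Toy two-state, two-action problem with made-up transition probabilities.
    true_p = {
        (0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],
        (1, 0): [0.5, 0.5], (1, 1): [0.3, 0.7],
    }

    def simulator(s, a, rng):
        return rng.choices([0, 1], weights=true_p[(s, a)])[0]

    p_hat, totals = estimate_transitions(
        simulator, [0, 1], [0, 1], epsilon=0.1, delta=0.05, rng=random.Random(0)
    )
    for sa in sorted(p_hat):
        print(sa, totals[sa], p_hat[sa])
```

In this sketch every pair ends up with the same number of samples because `epsilon` is uniform; a method that computes a separate required accuracy per state-action pair, as the abstract suggests, would stop earlier on pairs that matter less to the optimal policy.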

Publication:

Transactions of the Society of Instrument and Control Engineers

Pub Date:

2011

Bibcode:

2011TSICE..46..274S

Keywords:

sampling policy;
reliability;
accuracy of transition probability;
reinforcement learning
