Epsilon-Greedy Thompson Sampling to Bayesian Optimization

doi:10.48550/arXiv.2403.00540

Bayesian optimization (BO) has become a powerful tool for solving simulation-based engineering optimization problems thanks to its ability to integrate physical and mathematical understandings, consider uncertainty, and address the exploitation--exploration dilemma. Thompson sampling (TS) is a preferred solution for BO to handle the exploitation--exploration trade-off. While it prioritizes exploration by generating and minimizing random sample paths from probabilistic models -- a fundamental ingredient of BO -- TS weakly manages exploitation by gathering information about the true objective function after it obtains new observations. In this work, we improve the exploitation of TS by incorporating the $\varepsilon$-greedy policy, a well-established selection strategy in reinforcement learning. We first delineate two extremes of TS, namely the generic TS and the sample-average TS. The former promotes exploration, while the latter favors exploitation. We then adopt the $\varepsilon$-greedy policy to randomly switch between these two extremes. Small and large values of $\varepsilon$ govern exploitation and exploration, respectively. By minimizing two benchmark functions and solving an inverse problem of a steel cantilever beam, we empirically show that $\varepsilon$-greedy TS equipped with an appropriate $\varepsilon$ is more robust than its two extremes, matching or outperforming the better of the generic TS and the sample-average TS.
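The switching rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a finite set of candidate points and a Gaussian posterior supplied directly as a mean vector and covariance matrix, whereas the abstract does not specify the surrogate model or how sample paths are generated. The function name `epsilon_greedy_ts_pick` and the parameter `n_paths` are hypothetical. The sketch also assumes that the generic TS branch is taken with probability $\varepsilon$, which is consistent with the statement that large $\varepsilon$ governs exploration.

```python
import numpy as np


def epsilon_greedy_ts_pick(mean, cov, epsilon, n_paths, rng):
    """Pick the index of the next candidate to evaluate (minimization).

    mean, cov : posterior mean vector and covariance matrix over a finite
                candidate set (an assumption of this sketch).
    epsilon   : probability of taking the exploratory generic-TS branch.
    n_paths   : number of sample paths averaged in the exploitative branch.
    """
    if rng.random() < epsilon:
        # Generic TS: minimize a single random sample path (exploration).
        path = rng.multivariate_normal(mean, cov)
        return int(np.argmin(path))
    # Sample-average TS: minimize the average of many sample paths, which
    # approaches the posterior mean as n_paths grows (exploitation).
    paths = rng.multivariate_normal(mean, cov, size=n_paths)
    return int(np.argmin(paths.mean(axis=0)))


# Toy usage on three candidates with a made-up posterior.
rng = np.random.default_rng(0)
mean = np.array([1.0, 0.2, 0.7])
cov = 0.01 * np.eye(3)
next_index = epsilon_greedy_ts_pick(mean, cov, epsilon=0.1, n_paths=200, rng=rng)
```

With `epsilon=0` the rule reduces to the sample-average TS, and with `epsilon=1` it reduces to the generic TS, so the two extremes named in the abstract are the endpoints of a single parameter.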

Publication: arXiv e-prints

Pub Date: March 2024

DOI: 10.48550/arXiv.2403.00540

arXiv: arXiv:2403.00540

Bibcode: 2024arXiv240300540D

Keywords: Computer Science - Machine Learning; Mathematics - Optimization and Control; Statistics - Machine Learning

NASA/ADS
