Combinatorial Causal Bandits
Abstract
In combinatorial causal bandits (CCB), the learning agent chooses at most $K$ variables to intervene on in each round, collects feedback from the observed variables, and aims to minimize the expected regret on the target variable $Y$. Unlike all prior studies on causal bandits, CCB must handle an exponentially large action space. We study CCB in the context of binary generalized linear models (BGLMs), which provide a succinct parametric representation of the causal model. For Markovian BGLMs (i.e., those without hidden variables), we present the algorithm BGLM-OFU, based on maximum likelihood estimation, and show that it achieves $O(\sqrt{T}\log T)$ regret, where $T$ is the time horizon. For the special case of linear models with hidden variables, we apply causal inference techniques such as the do-calculus to convert the original model into a Markovian one, and then show that both our BGLM-OFU algorithm and another algorithm based on linear regression can solve such linear models with hidden variables. Our novelty includes (a) considering the combinatorial intervention action space together with general causal models, including ones with hidden variables, (b) integrating and adapting techniques from diverse studies such as generalized linear bandits and online influence maximization, and (c) avoiding unrealistic assumptions (such as knowing the joint distribution of the parents of $Y$ under all interventions) used in some prior studies.
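The abstract names two ingredients behind BGLM-OFU: a maximum likelihood fit of the binary GLM parameters, and an optimism-in-the-face-of-uncertainty (OFU) bonus that widens the estimate along poorly explored directions. A minimal sketch of these two ingredients for a logistic (binary GLM) model follows; the function names, the gradient-ascent fit, and the bonus scale `rho` are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def sigmoid(z):
    """Logistic link function of a binary GLM."""
    return 1.0 / (1.0 + np.exp(-z))

def mle_logistic(X, y, iters=500, lr=0.5):
    """Fit theta by gradient ascent on the Bernoulli log-likelihood.
    (Illustrative stand-in for the paper's MLE step.)"""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (y - sigmoid(X @ theta)) / len(y)
        theta += lr * grad
    return theta

def optimistic_score(x, theta_hat, V_inv, rho):
    """OFU-style score: mean estimate plus an exploration bonus
    scaled by rho along the confidence ellipsoid (V = Gram matrix)."""
    bonus = rho * np.sqrt(x @ V_inv @ x)
    return sigmoid(x @ theta_hat) + bonus
```

In an OFU loop, the agent would score each feasible intervention with `optimistic_score` and play the maximizer; the bonus shrinks as the Gram matrix $V$ grows, which is the mechanism behind $O(\sqrt{T}\log T)$-type regret bounds for generalized linear bandits.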
 Publication:
 arXiv e-prints
 Pub Date:
 June 2022
 arXiv:
 arXiv:2206.01995
 Bibcode:
 2022arXiv220601995F
 Keywords:

 Computer Science - Machine Learning;
 Computer Science - Social and Information Networks;
 Statistics - Methodology;
 Statistics - Machine Learning
 E-Print:
 28 pages, 9 figures