Paper ID: 1286

Title: Structural Causal Bandits: Where to Intervene?

The authors formulate a multi-armed bandit problem over a given causal structure. The arms of the bandit are interventions that set observed variables (excluding the reward) to specific values. The causal structure is assumed to be known, but its parameters are not. The authors show that only arms corresponding to a possibly-optimal minimal intervention set (POMIS) are worth considering, and such sets can be efficiently enumerated by their algorithm. When existing bandit algorithms are restricted to these sets, they reach the optimal decision earlier in various MAB scenarios. The paper is written very clearly; I did not check the details of the proofs, but the theory seems valid. The presented problem seems a sensible one, although I am not sure how realistic knowing the causal structure is in this setting. It certainly builds on previous (causal) MAB work. The restriction to POMISs seems a valid theoretical contribution. On the whole, the paper presents an interesting and worthwhile contribution, and further research along these lines is to be expected. The important example on line 46 seems to require a specific parametrization: please add a reference to it in the main text.

The paper discusses the multi-armed bandit problem of identifying the best action in sequential decision making. It focuses on models in which there are non-trivial dependencies between the reward distributions of the arms. The approach formulates the problem with causal graphs and do-calculus. The key contribution is an algorithm that finds the possibly-optimal minimal intervention sets (POMISs): subsets of variables which can be intervened on to obtain the optimal strategy, identified via graphical criteria. The paper provides useful results, as the strategy of removing redundant arms is shown in the experiments to improve cumulative regret relative to methods that ignore the structure of the relationships. It achieves this by essentially reducing the space to search for optimal actions, thereby avoiding pulls of unnecessary arms. As described in Section 5, there are potential cases where alternative approaches would assess a smaller number of arms.
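To make the pruning argument concrete, here is a small, self-contained simulation (not from the paper): UCB1 is run once over a hypothetical full set of interventional arms and once over a hand-picked pruned subset that still contains the best arm. The arm means and the pruned set are made up for illustration; only the qualitative point (fewer redundant arms, lower cumulative regret) reflects the paper's claim.

```python
import math
import random

def ucb1_regret(true_means, horizon, seed=0):
    """Run UCB1 over Bernoulli arms and return the cumulative pseudo-regret."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    sums = [0.0] * n
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1  # pull every arm once first
        else:
            arm = max(range(n), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]
    return regret

# Hypothetical means for 8 interventional arms; suppose pruning to POMIS-based
# arms leaves only the first 3 (which still include the optimal arm, 0.7).
all_arms = [0.7, 0.5, 0.6, 0.4, 0.45, 0.3, 0.55, 0.35]
pomis_arms = all_arms[:3]

r_all = ucb1_regret(all_arms, horizon=20000)
r_pomis = ucb1_regret(pomis_arms, horizon=20000)
print(r_pomis, r_all)
```

In runs of this sketch the pruned set typically accumulates markedly less regret, mirroring the kind of comparison reported in the paper's experiments.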

The authors propose an algorithm that could improve the efficiency of multi-armed bandit algorithms by selecting a minimal, sound and complete set of possibly optimal arms. In their framework, arms correspond to interventions on a causal graph, which may contain latent confounders. If the structure of the causal graph is known (but not its parameters), the proposed algorithm can exploit this structure to decide which interventions are provably non-optimal and filter them out. I think the paper presents some interesting and novel ideas. The clarity could be improved, especially for a reader who is not familiar with the do-calculus literature (e.g. Section 2). The experiments are limited, but in my opinion the theory makes the paper interesting in itself.

Minor details:
- Abstract: I'm a bit confused about how one could empirically demonstrate that an algorithm leads to optimal (...) convergence rates.
- Some inconsistency in the notation, e.g. line 88: PA_i is never defined, although it is clearly pa(X_i).
- Lines 117-123: maybe it's pedantic, but shouldn't there be some universal quantification on x, y, z, w? For example, P(y|do(x), do(z), w) = P(y|do(x), w) for all x \in D(X), z \in D(Z), etc.
- Line 129: isn't X = X intersected with an(Y)_{G after do(X)} just X \subset an(Y)_{G after do(X)}?
- Line 198: pa(T)_G \ T ... doesn't pa(.) already exclude the set itself (in this case T)?
- Typos: footnote 4: "that are not latent confounders are not explicitly represented"; line 252: "four strategies"; lines 214, 216: "intervening ON".
- Appendix B: Instead of Algorithm 4, wouldn't it be enough to have a much simpler "while (T != oldT) { oldT = T; T = desc(cc(T)); }"?
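For concreteness, the fixed-point loop suggested in the last point could be sketched as follows on a toy graph. The graph encoding and the helpers desc(.) (directed descendants, inclusive) and cc(.) (bidirected connected component) are my own illustrative versions, not the paper's definitions:

```python
def desc(directed, nodes):
    """The given nodes plus all of their directed descendants."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for child in directed.get(v, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return frozenset(seen)

def cc(bidirected, nodes):
    """Union of the bidirected connected components touching `nodes`."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for u in bidirected.get(v, ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return frozenset(seen)

def closure(directed, bidirected, T):
    """Iterate T <- desc(cc(T)) until it stabilizes (the suggested loop)."""
    T = frozenset(T)
    old = None
    while T != old:
        old = T
        T = desc(directed, cc(bidirected, T))
    return T

# Toy graph: X -> Z -> Y, W -> Y, with a latent confounder between Z and W.
directed = {"X": ["Z"], "Z": ["Y"], "W": ["Y"]}
bidirected = {"Z": ["W"], "W": ["Z"]}
print(sorted(closure(directed, bidirected, {"Z"})))
```

Starting from {"Z"}, the loop first absorbs W (via the bidirected edge) and Y (as a descendant), then stabilizes, which is the kind of closure Algorithm 4 appears to compute.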