NeurIPS 2020

Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring


Meta Review

This paper generated a lot of discussion amongst the reviewers. There is largely agreement that this work contains many interesting results, and I agree. I also agree with the reviewers that there is some discontinuity between the experiments and the theory and encourage the authors to use their extra page to carefully explain the choices and differences between the set-ups. Ultimately I do think it is nice to see experimental evidence in the discrete setting, which possibly hints towards theoretical results in that setting as well. One question I had while reading the paper: Do you think there is hope to prove a sqrt(T) minimax bound for TS for strongly observable games? The logarithmic regret bound that you prove would normally lead to a T^{2/3} bound, since it depends on 1/Delta^2. Some discussion on this might be interesting. I also found a few minor typos. L21: "unaware" -> "unobserved" L24: "the whole rounds" -> "all rounds"? L99: The definition of Pareto optimal may be not quite correct when there are duplicate actions (actions with the same loss). L102: In the neighbour definition the union should be an intersection. L231: The minimax version of this bound will be O(T^{2/3}) while O(T^{1/2}) is known to be possible in this setting. Do you have any sense about whether or not this is appearing only in the analysis or also the algorithm? * How does this algorithm compare to IDS by Kirschner and Lattimore in the linear setting? L287: Do you believe a theoretical guarantee is possible? I would guess not. E.g., what happens in a game where there is a costly, but informative action and all other actions give no information. Will not Thompson sampling quickly learn not to play the informative action and then suffer linear regret? L286: Do you explicitly say that dp-easy is (strongly) locally observable and dp-hard is only globally observable? L209: Since the unknown parameter is now in R^M, how do the definitions of local observability and the cell decomposition need to change?