__ Summary and Contributions__: Considers Bayesian persuasion in an online setting where receiver types are unknown. First shows the result (surprising to me) that, even given a known distribution over receiver types, computing an approximately optimal signaling scheme is NP-hard. This translates to a hardness for online algorithms.
Then gives (inefficient) algorithms for the online problem. First shows that it suffices to try to induce a finite set of posteriors W* (exponential-sized in the number of states of nature). Thus, a no-regret algorithm like polynomial weights works.
Finally, considers the partial information setting, which is much more complex. The approach proposed is to mix some explore and exploit within an epoch, then give feedback from the epoch to the full-information algorithm, and repeat. The analysis is quite involved and only summarized in the main text.

__ Strengths__: I think the paper studies a very interesting, novel problem and gives interesting results. It is clear and well-written. It is well-referenced and placed within the broader literature. The results are intriguing.

__ Weaknesses__: This is hard to avoid, but many technical details were left to the appendix.

__ Correctness__: Unfortunately I did not check the proofs, however, the high-level overview makes sense and I have reasonable confidence in the results based on the main text.

__ Clarity__: Yes.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__: Nice job, I enjoyed the paper.
For both hardness results and upper bounds, think it would be useful to emphasize which parameter the algorithm is exponential in (I understood it to be just "d"). This is very interesting. I was also very interested in the NP-hardness result and would have enjoyed some more intuition for it in the main text.
--------
After author response: Thanks for the response. During discussion, I agreed with other reviewers about some downsides, but felt they were not severe and still felt very positive about the paper.

__ Summary and Contributions__: This paper builds on the extremely impactful model of signal sending called Bayesian Persuasion, from economics, by formulating a learning problem whose base is BP. There is a sender and receiver of signals, and the sender wants to design a signaling scheme to optimize his payoff based on the action the receiver takes. This paper puts this into a repeated framework and asks how a sender can learn a signaling scheme over the repeated rounds that does well relative to the best fixed signaling scheme. Here, this is not just the offline optimal, because the receiver each round has a type that affects her payoffs, and the sequence of types may be adversarial (but not adaptive).
The authors give several results. First they show that computing the offline optimal is computationally hard, even to approximate. Then, relaxing the requirement of polynomial time, they give an algorithm for the full feedback setting, and show that standard no-regret algorithms can be used here. Then they use the full information algorithm as a subroutine to get a no-regret algorithm for the partial feedback case.
--Update --
Thanks to the authors for your response. Re notation, I see the issue about many more symbols; one approach that I have found useful if you want to use the same letter is to use a little text (e.g. W_{OPT}, W_{Greedy}, etc) that carries the information in the symbol itself. In any case, my score is unchanged.

__ Strengths__: The model is a very well known and important one, and the paper is technically very interesting. These results appear to be novel to me and correct, and the paradigm is very powerful so it’s conceivable that the algorithms described could be applied in practice.

__ Weaknesses__: I think between the notation and maybe drawing things out more than necessary, the clarity of the paper suffers. For instance, showing that any optimal scheme could be written as a combination of a finite number of signals is pretty intuitive (since all that ultimately matters across signaling schemes is the distribution of actions taken over the finite action set), so I would move this to the appendix and make a more informal discussion that is easier for threader to follow. Notationally, the symbols are too similar for me to keep easy track of - i.e. there’s W, W circle, W*, W hat, W with concentric circles… Maybe if you used different letters, it would be hard for the reason that you would convey less information with the notation, so I won’t say for certain that there is a better option, but the current state imposes a lot of attention cost on the reader, in my opinion.

__ Correctness__: As far as I can tell, the main results appear correct, though I am not a complexity person so I was not able to fully verify that result.

__ Clarity__: See weaknesses above - there are notational problems and more density than appears necessary. With enough time and attention, it is fairly well-written, but I think the writing could be improved.

__ Relation to Prior Work__: Mostly, but - while I am no expert in this literature, there do appear to be some BP models focused specifically on algorithmic aspects e.g. Dughmi and Xu, and Babichenko and Barman. These particular examples appear to have different setups, and so it’s not surprising that they seem have different answers to the question of complexity of computing approximately optimal signaling schemes, but I would still prefer to have a clear comparison to these models. I don’t know the literature too well so I’m not claiming these are the best or only examples, but just that it appears there is a computational BP literature out there not discussed in the related work.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper studies online Bayesian persuasion, in which the sender needs to persuade a sequence of receivers with adversarially chosen types from a finite set of possible types. The paper first shows that even the offline problem is NP-hard. Without time constraints, the paper then presents algorithms with a good regret bound in the full information setting. They also extend this result to the partial information feedback setting via a reduction.
--After feedback--
Thanks for the feedback.
Here is a minor comment. For the response "thus showing that the number of states is the only source of hardness", I don't fully agree with this sentence as the lower bound is worst-case. There could be other natural constraints that make the problem tractable. But I can see your point.

__ Strengths__: The paper initiates the framework of online Bayesian persuasion. The algorithm is mainly based on the characterization shown in Lemma 1: the sender only needs to consider a finite set of signal schemes.

__ Weaknesses__: The negative result of the paper shows that the problem is NP-hard. The authors take it as a permission to use algorithms with bad running time. I think it’s more interesting to take the other direction: getting poly-time algorithms in some more restricted settings.

__ Correctness__: The results in the paper look correct to me.

__ Clarity__: The paper’s presentation is clear.

__ Relation to Prior Work__: The paper clearly discusses the relation to prior work. Here is a paper that might be related: “Incentivizing Exploration with Heterogeneous Agents”. It studies how to persuade agents of different types to do exploration.

__ Reproducibility__: Yes

__ Additional Feedback__: Line 41: Technically, your result only says the problem is NP-hard. It might be more accurate to say in that way. Otherwise it sounds like you have proved P!=NP.