Paper ID: | 2820 |
---|---|

Title: | Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium |

The paper presented a new approach for working with strategies in sequential decision making problems. The new algorithm using the approach advances the state of the art for that class of problems, although the practical applications might be limited by the requirement of no chance events. The writing was fairly clear, although this paper does seem a bit cramped: there were a number of things I wish could have been expanded. That said, I don't think there is room without cutting or condensing something else, and the results stand well enough on their own that the paper should still be published as is. (Consider an expanded journal version?)

This is a well written paper with a original and significant contribution. Paper is well referenced and written very clearly. I very much appreciate that authors include the small running examples before the full construction. This makes it much easier to comprehend the ideas - and authors generally do a really good job trying to get the idea through rather than just jumping to formal constructions. The only weaker part of the paper is the evaluation section, which feels a little rushed and cramped.

**Originality** The paper presents a novel perspective on counterfactual regret minimization (CFR) in sequential problems which leads to new proof if the CFR theorem for sequence form plans and allows generalization of the result to correlated plans. As a result, it presents first CFR-based algorithm for computing EFCE. I consider the contribution in the paper to be novel. **Significance** The scaled expansion operator can potentially lead to further generalizations of CFR-based solution techniques for other problems. A scalable algorithm form computing EFCE may lead to new applications of computational game theory in the real world. Therefore, I consider the contribution to be significant. **Quality** The part of the paper that introduces scaled expansion, its properties and use to create correlation plans seems to be correct. However, the paper lacks a lot of details on how exactly do these contributions translate to a complete algorithm. What loss is optimized in each local regret minimizer? How is average strategy computed? Why does the algorithm actually converge to EFCE? I assume the answers to some of these questions are in Farina et al. (2019c) and many can be guessed with sufficient background knowledge, but the paper seems quite incomplete without this discussion. **Clarity** The paper deals with complicated topic, but it still reasonable easy to read. I think it is doing a very good job within the limited space. However, it should definitely include more (at least high level) discussion of how to turn the results presented in the paper into a complete algorithm and why the complete algorithm actually converges to EFCE. The description of the experimental domain should include more details about reward structure to see that it is sufficiently non-zero-sum. ** After Rebuttal ** Thank you for your response. Even with the response, I still do not know how to implement the algorithm so it is not likely that you can make the paper more self-contained in the given space. With the averaging, rather than explaining what is linear averaging, it would be more useful (in the final paper) to comment on why it is OK to simply average the correlation plans, since it would not work with behavior strategies, which are more common in the context of CFR.