Reviews: Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Reviewer 1

Summary: This work proposes a practical implementation of an effective approach, InteractiveRecGan, which uses offline data to build recommender agents with a better policy than the reference one. Specifically, a model-based RL framework is combined with adversarial training to make better use of the offline logged data when learning an “optimal” policy.

Scores use a 1-5 scale, with 5 the highest.

Originality: 3 out of 5. Compared with recent related work such as [1], the main difference is the use of model-based RL instead of model-free RL. Compared with model-based RL for recommendation such as [2], the main difference is the adversarial training.

Quality: 4 out of 5. This work is technically sound, and the proposed methods are well supported by both theoretical and empirical analysis.

Clarity: 4 out of 5. The paper is very well written, and the empirical results are presented in an easy-to-read way. All the plots are very easy to read.

Significance: 3 to 4 out of 5. It is not clear how promising the proposed approach would be when deployed and served as an online recommender.

[1]: Generative Adversarial User Model for Reinforcement Learning Based Recommendation System, in ICML'19
[2]: Model-Based Reinforcement Learning for Whole-Chain Recommendations

Reviewer 2

In this paper, the authors propose InteractiveRecGan, which uses offline data to build recommender agents with model-based RL algorithms. It integrates a user behavior model, an agent model, and a discriminator. The paper is well written and easy to follow. The proposed model is supported by theoretical analysis and by experiments against state-of-the-art baselines. I have read the authors' response, but it did not change my assessment.

Reviewer 3

Originality: The proposed approach is a novel combination of well-known techniques, RL and GANs, for recommendation. Related work has been adequately cited, and it is clear how the proposed approach differs from the existing literature.

Quality: The approach appears to be technically sound. The theoretical analysis and the experiments support the claims, and this is a complete piece of work. However, the experiments and results sections are quite terse. I wonder whether the authors could have identified additional relevant experiments and cut down some of the theory/derivation (or moved it to the supplementary material) to make space.

Line 271: “We randomly selected 65,284 sessions for training and the left 3,437 for testing.” Were the parameters tuned on the test set? Did you have a validation set?

Clarity: The paper is mostly well written, except for some typos:
- line 32: we **exploring framing** recommendation as building reinforcement learning (RL) agents
- line 37: Classic model-free RL applications requiring collecting large quantities of interaction data with self-play and simulation.
- line 39: In contrast, such methods suffer from very high sample complexity so that simulation for generating realistic interaction **experience nonviable**.
- line 169: And at each step, **the will** generate next click by
- line 174: Simultaneously the model **needs to decides** the probability
- line 248: The last term is the **objection** for generator in GAN.

Significance: Learning to recommend is an important problem. Any approach that improves recommendation performance will be of interest to multiple communities, in both industry and academia. Statistical significance results are missing; otherwise, the results look good.

I have carefully considered the authors' response. The rebuttal looks fair; however, my questions were mostly requests for clarification, so it does not change my score.

Paper ID:	5724
Title:	Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation