Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
4) Originality: The work is highly original. It introduces a suite of real-world environments where RL algorithms can have an immediate impact. 5) Quality: This submission is of high quality. The environments are well-described and the experiments are exhaustive. The authors also highlight the limitations of their environments, e.g., the size of the cache. 6) Clarity: The work is well-written. It does a good job of describing the particular challenges of using RL algorithms for computer systems. 7) Significance: I think the work is significant, and I hope it encourages other researchers to develop RL methods designed for computer systems. 8) Conclusion: I recommend acceptance given the importance of real-world environments for benchmarking RL algorithms.
It is great to see this kind of interest in applying machine learning, and specifically reinforcement learning, to real-world problems such as the computer systems presented in this paper. While the paper makes no significant contribution on either a theoretical or algorithmic front, it does an important job of highlighting some of the issues in applying modern RL algorithms to real problems, and provides a much-needed benchmarking environment for computer systems research specifically. The problem domains included have a wide variety of characteristics: from high-frequency real-time systems to very-long-horizon problems, uniquely structured state and action spaces, and both simulated and real environments (some other related work that could be added is cited below). The latter especially is valuable for grounding any research. Moreover, the authors provide an RL baseline result for each of the proposed tasks and highlight some of the characteristics of these tasks that are problematic for RL specifically. However, the discussion of the results could be more elaborate. Overall, the paper is clearly written and well structured, although there are some minor grammatical errors (mainly in the appendices). Claeys, Maxim, et al. "Design and evaluation of a self-learning HTTP adaptive video streaming client." IEEE Communications Letters 18.4 (2014): 716-719. UPDATE: I thank the authors for their rebuttal and for agreeing to incorporate my suggestions. I am sticking with my score.
This paper (along with its supplementary material) provides good detail on the Park platform and on the hyperparameter settings used in the experiments, which makes the work reproducible. The paper is well written and its flow is easy to follow. The paper claims Park to be an extensible platform on more than one occasion; however, this claim has not been substantiated with any metric. What makes the platform extensible should be elaborated in the paper. For example, perhaps a case study could help illustrate how easy it is to add a new systems environment, how much effort is required, etc. The paper should discuss how Park differs from, and compares to, similar or closely related platforms such as Facebook Horizon: what makes Park different from the others, and how? If Park has outperformed any baselines in other environments or problem-specific learning methods, those comparisons could be included in the paper to demonstrate Park's advantages. There are a few horizontal lines (i.e., cases where no improvement in the policies is seen) in Figure 4. These cases should be discussed in the text: why did this happen, and what are possible reasons for such situations? Line 284: The paper should further elaborate on which existing heuristics and optimal policies have been provided (simply naming them in the paper as examples would ease the reader's task).
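To make the extensibility case study suggested above concrete, here is a minimal sketch of what adding a new systems environment might involve, assuming Park exposes a Gym-style reset()/step() interface as described in the paper. The environment name, workload model, and all class/method details below are hypothetical illustrations, not Park's actual API:

```python
import random


class CacheAdmissionEnv:
    """Hypothetical new environment: cache admission control.

    A case study on extensibility would presumably wrap a real
    system or simulator behind reset()/step(); the toy workload
    model here is only a stand-in for such a backend.
    """

    def __init__(self, cache_size=4, num_objects=16, horizon=100):
        self.cache_size = cache_size
        self.num_objects = num_objects
        self.horizon = horizon

    def reset(self):
        # Start a fresh episode with an empty cache.
        self.cache = set()
        self.t = 0
        self.request = random.randrange(self.num_objects)
        return self._obs()

    def _obs(self):
        # Observation: (requested object id, current cache occupancy).
        return (self.request, len(self.cache))

    def step(self, action):
        # action == 1: admit the requested object (evicting an
        # arbitrary entry if full); action == 0: bypass the cache.
        hit = self.request in self.cache
        if action == 1 and not hit:
            if len(self.cache) >= self.cache_size:
                self.cache.pop()
            self.cache.add(self.request)
        reward = 1.0 if hit else 0.0
        self.t += 1
        done = self.t >= self.horizon
        self.request = random.randrange(self.num_objects)
        return self._obs(), reward, done, {}
```

A case study could then measure the effort required (lines of code, integration points with the real system, simulator fidelity) to go from this skeleton to a working environment, which would substantiate the extensibility claim.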