Summary and Contributions: The authors synthesize a variety of results to construct an end-to-end regret analysis for model predictive control with non-convex costs. They use this analysis to craft a robust objective (and, subsequently, a tractable-to-optimize surrogate objective) and demonstrate the resulting algorithm on a nontrivial traffic-planning task.
Strengths: Soundness (theoretical grounding, empirical evaluation): The authors do a tremendous job theoretically grounding their algorithm. Additionally, the empirical evaluations clearly demonstrate the superiority of the authors' method (and how the method actually recapitulates the theoretical guarantees of the proposed algorithm). Significance and novelty of the contribution: This work is truly a tour de force in combining several different methods in one package to facilitate robust control on real tasks that people actually care about. It appears to be the first end-to-end result guaranteeing that the best result achieved during planning is achievable by the underlying system, which seems quite significant (though I am admittedly not an expert in robust control). Relevance to the NeurIPS community: Devising efficient robust control algorithms is crucial for reinforcement learning to see more widespread use in industry. This work is extremely relevant to the NeurIPS community.
Weaknesses: Soundness (theoretical grounding, empirical evaluation): I think the provided evidence is already quite overwhelming, but I’m always eager to see methods applied to more varied environments. Significance and novelty of the contribution: No obvious weaknesses in significance or novelty. Relevance to the NeurIPS community: Clearly relevant.
Correctness: To my knowledge, yes.
Clarity: The paper is exceptionally clear (barring some extremely minor spelling mistakes).
Relation to Prior Work: The work is extremely well situated with respect to prior work.
Additional Feedback: Post author rebuttal edit: I look forward to this analysis being applied to even higher dimensional systems :) I'm keeping my score as is, and am satisfied with the authors' response.
Summary and Contributions: Update: Thanks for the thoughtful response. Your proposed modifications largely address my concerns. Specifically:
- I see now that Thm 2 handles both the discretization of actions and the finite planning horizon, separately from Thm 3. I'm satisfied by your proposed reformulation of Thm 3 to improve clarity.
- It is true that the PE assumption is common in the classic controls literature, even if it isn't in the more recent online learning literature. Given the scope of the planning problem you consider, I'm satisfied by the proposed addition of a corollary, which will more clearly highlight where the assumption comes into play.
- The bounded perturbation assumption is common enough in robust planning that I think it's a satisfactory way around the noise process issue.
The scope of the proposed changes is, in my view, borderline in terms of whether it necessitates another round of review to assess. But overall I land on the positive side, and have updated my score to reflect this.
-------
This paper proposes an MPC algorithm for robust and adaptive control of linear systems with structured uncertainties and nonconvex rewards by combining asymptotic confidence bounds for linear regression, interval prediction bounds, and tree-based planning over pessimistic rewards. The suboptimality of the proposed approach is bounded, and a data-driven model rejection method is also proposed.
Strengths: This work considers a setting relevant to modern control problems which cannot be modelled by simple convex or quadratic cost. It is potentially significant to bridge ideas from modern control and planning with those of online learning and nonasymptotic statistics.
Weaknesses: This paper is ambitious, bringing together many different ideas. A downside is that each individual idea receives only passing attention, and the complexity of the proposed method obscures some of the core concepts. See the "Clarity" section for further comments. The significance of the main result is hampered by its reliance on an assumption that the feature vectors are persistently exciting. Usually, the main contribution of non-asymptotic analyses of linear regression is to show that a condition of this form holds a priori based on system inputs. I also have reservations about the soundness of the "regret" claims. First, it is unclear how the suboptimality result in Theorem 3 can be crisply defined as regret. The connection between the finite planning strategies over stage costs and the infinite horizon performance V is also not clear. Some of the reasoning about probabilistic bounds over infinite horizons and continuous signals is fishy. See the "Correctness" section for further comments.
Correctness: The remark on line 135 suggests that omega can be bounded with high probability by union bounding over n(n+1) events. It is not obvious to me why such a finite set is sufficient. The main result in Theorem 3 focuses on the infinite horizon outcomes encoded by V, but the proposed algorithm plans on finite horizons. The precise connection between the two is not made obvious. Should there be an additional term in the sub-optimality bound to account for the finite planning horizon?
Clarity: There are several ways that clarity could be improved:
- The connection between the discrete measurement noise and the continuous-time noise process is not clear.
- The switch to action notation in Section 4 is abrupt, even though it is introduced earlier. It would be good to explicitly remind the reader.
- V_u in Theorem 3 is not defined in the body of the paper.
- Many different topics are addressed in this work; it may be better to cut certain topics to devote more explanation to others. For instance:
  - The state prediction method in Prop 2 is not used, so there is no need to devote space to it.
  - The multi-model selection setting is disconnected from the rest of the paper, and maybe isn't necessary to include (or it should be included as motivation from the beginning, if the framework can be viewed in a unified way).
Relation to Prior Work: Yes, related work is adequately explored.
Summary and Contributions: The paper proposes a method for robust adaptive model predictive control with general, non-quadratic costs. It can provide performance guarantees, and a regret bound. The method is verified empirically on two test systems in simulation.
Strengths: I think this is a very solid paper that explores a question of high practical significance: how to learn a system model to be used for model predictive control, without allowing catastrophic performance in the process, that is, with robustness guarantees. Tools from multiple areas are brought together, and competently integrated to provide an entire solution for the closely related problems of estimation, prediction, and control. The method might appeal to practitioners in industrial settings, where safety is important, and general exploration strategies common in RL might not be suitable. The paper is probably closer to work in the control systems community, but NeurIPS researchers working on RL might appreciate the different outlook on safety during exploration.
Weaknesses: The dynamics are still assumed to be linear, with the uncertainty parameterized as a linear combination of known feature vectors. This is quite restrictive (but also applicable to many practical problems). The discretization of controls might result in some loss of performance, too.
Correctness: I believe the empirical methodology is correct. I could not verify the correctness of the mathematical details in complete detail.
Clarity: The paper is very well written.
Relation to Prior Work: There is a clear discussion of prior related work, and the contributions of this paper with respect to it.
Additional Feedback: Minor typos:
P.2, L.71: "a tracking a subgoal" -> "tracking a subgoal"
P.3, L.83: "forth" -> "fourth"
P.7, L.244: "an medium" -> "a medium"
---------------------------------------------------------------------------------------
I have read and taken into account the rebuttal. My score has not changed.
Summary and Contributions: Post-rebuttal: I would like to thank the authors for their response. As stated in the original review, I think comparing to DQN will improve the paper. I'm still in favor of acceptance.
===================================================
This paper addresses the problem of robust control of continuous dynamic systems, where the system's dynamics are unknown but assumed to have a linear structure, with external polytopic disturbance. The proposed approach consists of several steps for each action: first, model and confidence region estimation (or refinement); then worst-case reward extraction and state estimation bounds; then a conservative planning step based on the reward and state bounds; and finally one-step execution, repeating the process in an MPC-like manner. The paper presents an end-to-end approach to the robust control problem for unknown dynamics (only the system dynamics matrix is unknown) in an adaptive manner. The paper strives to address problems beyond stability or tracking, where quadratic costs are common and well studied, such as obstacle avoidance. The control is obtained with a tree search algorithm, and for some more restrictive cases (which are common in the control literature) the regret of the algorithm is provided. The contributions of the paper are mainly for each of the algorithm steps, plus putting everything together in an end-to-end approach with a restrictive regret analysis:
– An extension of previous work on confidence regions to an elaborate representation of feature matrices.
– An MPC-like algorithm solving for the worst-case value function under polytopic disturbances, for the purpose of state estimation.
– A surrogate value function encapsulating the worst-case rewards on an interval, for the robust control computation.
– An extension of the proposed approach to the multi-model case.
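To confirm my reading of the pipeline, here is a deliberately simplified one-dimensional sketch of the estimate / interval-predict / pessimistic-plan / execute loop. This is not the authors' implementation: all function names are my own, the scalar least-squares interval is a crude stand-in for the paper's confidence regions, and a one-step lookahead replaces the tree search.

```python
import numpy as np

def estimate_interval(xs, us, xs_next, beta=0.5):
    # Regularized least-squares estimate of the unknown scalar parameter a
    # in x_{t+1} = a*x_t + u_t, with a crude confidence radius that shrinks
    # as data accumulates (stand-in for the paper's confidence regions).
    phi = np.array(xs)
    y = np.array(xs_next) - np.array(us)
    gram = phi @ phi + 1.0          # regularized scalar Gram "matrix"
    a_hat = (phi @ y) / gram
    radius = beta / np.sqrt(gram)
    return a_hat - radius, a_hat + radius

def pessimistic_value(x, u, a_lo, a_hi, reward):
    # Interval prediction: for any admissible a, the next state lies in
    # [lo, hi]; score the action by the worst-case reward over the interval
    # (endpoints suffice here because the example reward is concave).
    nexts = (a_lo * x + u, a_hi * x + u)
    lo, hi = min(nexts), max(nexts)
    return min(reward(lo), reward(hi))

def robust_mpc_step(x, actions, a_lo, a_hi, reward):
    # One-step lookahead standing in for the tree search: pick the action
    # maximizing the pessimistic reward, execute it, then re-plan (MPC-style).
    return max(actions, key=lambda u: pessimistic_value(x, u, a_lo, a_hi, reward))

# Toy usage: data generated by the true dynamics x' = 0.8*x + u.
a_lo, a_hi = estimate_interval([1.0, 2.0, -1.5], [0.1, -0.2, 0.3], [0.9, 1.4, -0.9])
u = robust_mpc_step(2.0, [-1.5, -1.0, -0.5, 0.0], a_lo, a_hi, lambda x: -x**2)
```

Even this caricature shows the structure I found appealing: the planner never touches the point estimate alone, only the whole confidence interval, which is what makes the suboptimality bound possible.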
Strengths: The paper is well written, and the approach is described end to end with regret and region bounds. Finally, the effectiveness is demonstrated on two evaluation examples.
Weaknesses: Evaluation against other methods was not presented. This would improve the paper greatly.
Correctness: Everything seems to be correct.
Clarity: The paper is written clearly and each step is explained.
Relation to Prior Work: The relation to previous work is clearly discussed.
Additional Feedback: The adaptive control community has dealt with similar problems (usually in the context of stability) and come up with solutions, sometimes with resemblance to the ones in this paper. It would be appropriate to give a quick survey of the classical system identification and robust adaptive MPC work that has been done, for example (there are many more) "Robust Adaptive MPC for Systems with Exogeneous Disturbances" by V. Adetola and M. Guay.