NeurIPS 2020

Information Theoretic Regret Bounds for Online Nonlinear Control


Meta Review

The paper concerns sequential control of a nonlinear dynamical system whose underlying dynamics are a function in an RKHS. The introduced algorithm, LC3, enjoys an O(sqrt{T}) regret bound against the optimal controller, with no explicit dependence on the dimension of the system dynamics. The paper received a mostly positive evaluation from the reviewers, with one vote below the acceptance threshold (scores of 7, 7, and 5).

The main strengths of the paper were identified as:
- Novel results (one of the first in adaptive nonlinear control), which should be of interest to the NeurIPS community.
- The paper is very well written, the technical quality is sound, and the code was included in the supplement.
- Extensive empirical evaluation (although one of the reviewers found the evaluation to be inappropriate).

Several weaknesses were also pointed out:
- One of the reviewers found the contribution of the theoretical results to be marginal compared to past work.
- The episodic setting, in contrast to the standard single-trajectory results in the recent online control literature.
- Intractability of the optimization step.
- Little evidence of control-theoretic ideas in the paper.