__ Summary and Contributions__: This paper leverages a novel characterization of the space of stable matrices into a novel algorithms for estimating the parameters of a linear dynamical system from data. The algorithm provides marked improvements to the estimation error and the memory complexity compared to standard approaches.

__ Strengths__: This is a very clear paper, with a simple premise, and a derivation which explores both the theoretical and practical consequences of the new algorithm. The math is clearly stated and detailed. The multiple evaluation domains provide a strong degree of confidence in the value of the approach. The improvements appear material and relevant.

__ Weaknesses__: Topically, this work is not directly about machine learning since it describes a pure optimization method. However, estimation of LDS is relevant to the overall problem of deriving control strategies for data, so this isn't necessarily an issue, though it may be more valued if published at a venue more centrally devoted to control and optimization.
No mention is made as to whether the attached code will be open-sourced, which would greatly enhance the value of the paper.

__ Correctness__: Yes

__ Clarity__: Very clear and well organized.

__ Relation to Prior Work__: Work builds on a recent result that was independently derived, but provides a novel way to leverage the work and valuable empirical characterizations. Reviewer is not familiar enough with the literature to comment on related prior art.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper proposed a new method to learn stable linear models for dynamic systems. It utilized a recent characterization of stable matrices and constructed the gradient descent method to improve the reconstruction error within the stable linear model subspace. The proposed method has O(n^2) space complexity and achieved reconstruction errors that were orders of magnitude better than the baselines.

__ Strengths__: The proposed algorithm seems to be a significant improvement over the state-of-the-art: The space complexity is O(n^2) instead of O(n^4). The reconstruction error is orders of magnitude better than the baseline.

__ Weaknesses__: Although the linear dynamic system is widely used because it is convenient in math, the majority of the dynamic systems in our world is nonlinear. This paper argues that the method can be used with the Koopman operator for nonlinear systems. However, no example is shown. Experiments of combining Koopman operator or Differential Dynamic Programming with the learned stable linear model would make the paper much stronger.

__ Correctness__: The high level derivation seems good. I did not carefully check the detailed derivation in the supplemental material, though.

__ Clarity__: In general, the paper is well written.
It would be great to include accompanying videos to visualize the results. For example, comparing the reconstructed videos and the ground truth would be much more convincing than only showing a few frames in Figure 3. Similarly, videos of Franka robot tracking the desired patten would be helpful.
The paper has a detailed derivation of gradients of the optimization (6). It is not immediately clear to me how to make sure that the matrices satisfy the constraints: S is positive definite, O is orthogonal, C is positive semi-definite and contraction. Some explanations would be helpful.
I do not quite understand the best error frequency plot (Top row of Figure 2): The sum of the best error frequencies of the three methods does not add up to 100. For example, in the UCLA dataset, the best error frequency of SOC is 100%. Shouldn't the best error frequency of CG and WLD be 0% then?
What is the control U in the Franka example? Is it desired joint angle, desired joint velocity, or motor torque?

__ Relation to Prior Work__: The relation to prior work is clearly and sufficiently discussed.

__ Reproducibility__: Yes

__ Additional Feedback__: I have read the author's response. It addressed my questions and I will keep my positive score.

__ Summary and Contributions__: The paper proposes a new algorithm for learning stable linear dynamical systems. The algorithm is based on a recent characterization of stable matrices which shows that a matrix is stable if and only if it can be written as a product of certain psd and orthogonal matrices. The proposed algorithm uses O(n^2) space in contrast to O(n^4) space used by previous methods, where n is the state dimension. The authors show that the new algorithm gets lower reconstruction error compared to prior art on benchmark datasets.
The paper also extends their approach to the control setting, where they jointly search over the state and control matrices. By doing this, they achieve superior performance on control tasks compared to an algorithm which uses prior work to estimate a stable state matrix and then does Least Squares to estimate the control matrix.

__ Strengths__: The experimental evaluation is very thorough and quite convincing. Since this is mainly an experimental paper, I think this is probably the most important criterion for evaluating it.
Within the experimental evaluation, the algorithm seems to do well on multiple fronts: 1) it uses much less memory, 2) it is more scalable as the dimensionality increases, 3) it gets lower reconstruction error, and 4) it does better on control tasks. The authors also evaluate the control task on a real robotic manipulator which shows the synthetic setup is an accurate representation of the real world task.
The provided code in the supplementary is also very well-organized, kudos to the authors for doing a great job on this.

__ Weaknesses__: One could argue that the algorithm is not very novel, given the previous characterization of stable matrices. But I don't think this is too much of a concern as the characterization has not been exploited for this purpose before, and the major contribution here is probably the empirical evaluation.
I found that the paper could emphasize the importance of stability more. For instance, it is nice to note that the LS solution which does not enforce stability does get worse over time in Fig. 3. Since the utility of the algorithm rests on the premise that it is important for the linear system to be stable, I think this premise should be supported a bit more.

__ Correctness__: The empirical methodology is very solid. I think it would be good to verify somewhere that the algorithm only needs O(n^2) space. On a related note, why is it that the prior algorithms needed O(n^4) space, if the matrices are only nxn?

__ Clarity__: The paper is well written and is easy to read.

__ Relation to Prior Work__: The authors mostly do a good job of discussing prior work. One thing that would be good to clarify is whether there was prior work in using stable matrices for the control task, since CG and WLS were formulated for the prediction task?

__ Reproducibility__: Yes

__ Additional Feedback__: 1. It would be interesting to examine the source of the better prediction accuracy of SCO. Are the solutions obtained by SCO more stable than CG and WLS? If not, what is the reason for the superior error performance? More generally, it could be good to more closely examine and discuss the results in Fig. 2, as some of the observed relations seem to demand further attention. For instance in the UCSD dataset, it seems that increasing the state dimension even hurts average error. Is this because the intrinsic dimension of the system is small, and estimating models with higher dimension would require more data? Also, for the UCLA dataset the average error of SCO reduces with dimension, whereas it increases for the other algorithms.
2. It would have been nice to evaluate the algorithm on a setup where the state dimension
needs to be high, in for example robotics settings. This is because one of the key proposed advantages of SCO is that it is scalable.
3. I am curious how the prediction accuracy of the algorithms compare on the coffee cup sequence.
A few additional remarks regarding presentation:
1. In Section 2.1, the authors shifts to systems without inputs somewhat abruptly, since in the beginning of Section 2 they considered the general case with inputs.
2. The authors should write CG and WLS in parenthesis the first time they use these abbreviations.
----Post author response----
I thank the authors for the detailed and helpful response. All of the points mentioned in the response are valid and help in understanding the work, and I encourage the authors to include it in the revision. I feel more confident about my score now, and have updated the confidence to reflect this.

__ Summary and Contributions__: This paper proposes a method to learn stable dynamical systems by an iterative optimisation method. The method preserves stability at every iteration update scheme. The proposed approach is evaluated in simulations to a variety of problems, including learning dynamic textures from image sequences and controlling a robotic manipulator. Compared to existing approaches, the proposed method achieves an orders-of-magnitude improvement in reconstruction error and superior results in terms of control performance

__ Strengths__: Claims are sound and empirical evaluation well-conducted. The paper is well written, notation and claims are clear. Extensive experimentations and results help to understand the contribution of the paper. Authors clearly state their contributions and the gains they observed from their experimental results, both in term of performance scores and memory-efficiency. The proposed method compares favourably with respect to existing algorithms on certain datasets in terms of average error.

__ Weaknesses__: The paper is at times ill-motivated. It is unclear why the dynamical systems estimated from the datasets considered should be stable. This is particularly true for the control experiments, with manipulation tasks leading inherently to unstable dynamics.
The novelty with respect to existing system identification methods (such as stability preserving subspace identification methods and methods based on Riemannian optimisation) is unclear.

__ Correctness__: Claims and methods are correct to the best of my understanding. The empirical methodology is correct. However, it is unclear why prediction methods are not considered.

__ Clarity__: The paper is well written and well structured. Results are presented in a clear way and help to understand and highlight the contribution of the paper.

__ Relation to Prior Work__: Literature review can be improved. Additional references concerning stability preserving subspace identification methods would strengthen the paper considerably.

__ Reproducibility__: Yes

__ Additional Feedback__: The code submitted is clear and well documented, favouring reproducibility and further research in this field.

__ Summary and Contributions__: This paper presents a novel algorithm to learn stable linear dynamical system from data. Based on the characterization of matrix stability method proposed by [22], they derive a gradient-descent algorithm to optimize the learned model to satisfy constraints by minimizing the constructed error. Extensive experiments including on common benchmarks and robot arm control, demonstrate that their proposed SOC outperforms existing important baselines.

__ Strengths__: 1. I quite like the presentation of this paper, which is easy to understand. Their key idea, i.e. introduction of the characterization of matrix stability into learning model is demonstrated and evaluated well.
2. Learning stable LDSs is useful and relatively underexplored compared with other learning based control areas. Their proposed method is simple to implement and effective in ensuring the stability in the learning based LDSs.
3. Their SOC outperforms the competing baselines, CG and WLS by a large margin. The robot arm experiment they conducted is interesting, and can reflect effectiveness of the learned control model.
4. SOC also has the advantage of efficient memory, which is important in many robot applications.
5. Their code is provided.
Overall, I think this is a good work that is useful in learning based control.

__ Weaknesses__: 1. Can you briefly and intuitively introduce the reason why the characterization of stable matrices can ensure the stability of LDSs in your main paper? Although this is not your main contribution, it would be good for readers to better understand your idea, and more convincing, given a short introduction of this technique.
2. Can you talk about the assumption of the characterization of stable matrices and your derived gradient descent algorithm for it? It will help to understand the applicability of your method.

__ Correctness__: Their claims and method are technically sounding.

__ Clarity__: This paper is well-structured and easy to understand.

__ Relation to Prior Work__: They have discussed the previous work in the introduction.

__ Reproducibility__: Yes

__ Additional Feedback__: