__ Summary and Contributions__: The authors propose a neural network architecture for learning stochastic dynamic systems, with the constraint that the learned dynamics are stable. This is achieved through a specialized final layer representing a Lyapunov function. The desired constraints on the Lyapunov function, sufficient to give stability, are enforced on this final layer, conferring stability to the learned dynamics. The approach is demonstrated on several simulated systems.

__ Strengths__: Stability in learned dynamics is very important for applications in which forecasting plays an important role. In control applications, for example, instability can lead to extreme control signals and catastrophic failure. For this reason the authors' work is significant and relevant to a number of communities.

__ Weaknesses__: I see no major weaknesses. The authors' work is perhaps a reasonable step forward.

__ Correctness__: The methodology appears correct.

__ Clarity__: Overall the paper is reasonably clear. However, it would benefit from a concise and complete description of the network architecture, perhaps at the beginning of Section 6 (since all relevant concepts have been introduced by this point).

__ Relation to Prior Work__: The relationship with prior work is clearly described. The authors describe differences with the related literature, but could do a slightly better job of describing the consequences of those differences. For example, the work of Umlauft and Hirche is described (line 206) as pertaining to "certain state transitions." Is this a limiting factor in their work or not? How do these "certain state transitions" relate to the authors' work?

__ Reproducibility__: Yes

__ Additional Feedback__: I find the authors' rebuttal to be satisfactory. I am content to leave my score unchanged, but increase my confidence.

__ Summary and Contributions__: The authors propose a method to impose stability on learned stochastic discrete-time systems. The proposed method follows the idea of simultaneously learning a neural Lyapunov function; the authors apply it to the discrete-time setting and extend it to non-convex Lyapunov (candidate) functions. They also suggest using mixture density networks for modeling stochastic systems and show that the proposed framework can impose moment stability.

__ Strengths__: The motivation and the method are technically sound and match well. As learning discrete-time stochastic systems is very frequently encountered in ML practice, this paper will be of great interest to the NeurIPS community.

__ Weaknesses__: The experimental evaluation might be criticised as limited because it currently covers only low-dimensional systems. (That said, I do not think this is a fatal flaw of the paper. Learning 2d or 3d dynamics with some desired properties is indeed a nontrivial problem.)

__ Correctness__: To my knowledge, the technical claims seem valid and correct.

__ Clarity__: The paper is basically easy to follow.
A point that might be unclear is the treatment of the various definitions of stochastic stability. While all the well-known notions are covered in the Background section, only the 2nd mean stability is discussed afterward. The authors may want to elaborate on why only the mean stability is treated there and comment on the possibility of imposing other types of stability, if possible.

__ Relation to Prior Work__: Relation to prior work is clearly stated.

__ Reproducibility__: Yes

__ Additional Feedback__: Great work! I enjoyed reading the paper.
In Examples 1 and 3, readers may be interested in the variance not only of the learned models (i.e., prediction paths) but also of the true dynamics.
-----
[After rebuttal]
Thank you for the rebuttal. It was satisfactory. I maintain my evaluation (8), which was already very positive.

__ Summary and Contributions__:
In the manuscript, the authors propose methods for learning discrete-time stochastic dynamic models from observed data using DNNs.
The methods learn the dynamics models subject to constraints that ensure stability.
Specifically, the proposed methods learn dynamics that satisfy the Lyapunov stability condition, using a Lyapunov function represented by a neural network.
The proposed methods take two approaches: one exploits the convexity of the Lyapunov function, while the other enforces stability through an implicit output layer.
Because discrete-time stochastic dynamic models might be important in real-world industrial applications such as control, self-driving vehicles, or anesthesia feedback, this manuscript may deserve to be published at NeurIPS 2020.
On the other hand, I have some concerns regarding comprehensibility and effectiveness, described below, which led me to this judgment.

__ Strengths__: Because discrete-time stochastic dynamic models might be important in real-world industrial applications such as control, self-driving vehicles, or anesthesia feedback, the high estimation accuracy that the proposed method achieves by adding stability constraints to the estimation of noisy discrete-time systems is very useful.

__ Weaknesses__:
I consider the following four points to be weaknesses.
1. The purpose and the claims of the study are difficult to understand even after reading the abstract and introduction.
For example, the term "deep dynamic models" is not clearly explained.
Also, there is little background explanation of why the authors focus on discrete-time dynamics models.
2. There is a lack of discussion of the experimental results.
Please state exactly what you are trying to argue from the results of each experiment.
3. There is a lack of comparison with the method of the previous study [30].
Isn't it possible to model a stochastic discrete system as a continuous deterministic system and get the same level of performance?
4. The details of the trained model and the settings, such as the number of samples used for training, are not described.
Also, no code is attached.
Therefore, reproducibility is not guaranteed.

__ Correctness__: Although γ* in equation (11) is estimated as a constant, the corresponding part of equation (4) varies with x_i.
Compared to Eq. (4), the ability to express f(x_i) in Eq. (11) appears to be diminished. Doesn't this reduced expressiveness limit the range of application of the method?

__ Clarity__: As I mentioned above, the introduction is written in a way that makes it difficult to understand the purpose and the claims of the study.
It would also be desirable to have a conceptual diagram of the proposed framework.
Equations are cited by number alone, which makes the references difficult to follow.
How about writing Eq. (X)?

__ Relation to Prior Work__: As I mentioned above, there is a lack of comparison with the methods of previous studies [30].
The results of the previous studies' methods should be reported, along with the necessary comparisons of estimation accuracy and computational cost.

__ Reproducibility__: No

__ Additional Feedback__: As mentioned above, the authors would do well to run a verification experiment comparing the accuracy of the proposed method with that of the previous studies' method, which estimates the Hamiltonian from the wave function without using a DNN.
==== UPDATE AFTER AUTHOR RESPONSE =====
Thank you for your very careful reply.
I have decided to raise my rating of the manuscript because my concerns, including the comparison with the previous study [30], which was my biggest concern, have been resolved.