__ Summary and Contributions__: The authors propose a neural ODE model for second order dynamics. The paper provides a thorough theoretical and empirical analysis of both the representational capabilities and the training dynamics of the model. In addition, the authors discuss how their proposed model is related to augmented neural ODEs and provide novel insights into the consequences of using augmentation for neural ODEs.
Contributions:
- The authors provide a nice theoretical analysis of the adjoint method for the case of second order ODEs. They show that directly using a second order adjoint method is equivalent to using a first order adjoint method on an augmented system but that the latter has lower computational cost.
- The authors show that second order neural ODEs can overcome some of the representational limitations of neural ODEs in a similar way to augmented neural ODEs. However, the proposed method often finds “nicer” solutions than naive augmentation and provides a more interpretable understanding of what happens in the original (non-augmented) space.
- The authors also provide a thorough and interesting analysis of the effects of augmentation both theoretically and experimentally. The authors show that augmented models can learn higher order dynamics but that the learned representations are often “entangled” in the sense that the augmented dimensions do not exactly represent the velocity of the underlying second order system.
- Through experiments on small scale physics datasets, the authors show that their proposed model performs better than regular augmented neural ODEs, particularly in the presence of noise and when the underlying dynamics are known to be exactly second order.
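The first order system underlying the adjoint discussion above can be made concrete with a minimal sketch (my own illustration, not the authors' code): the second order dynamics x'' = f(x, x') are integrated as a coupled first order system on the augmented state z = [x, v]. Here a hard-coded harmonic oscillator f(x, v) = -x stands in for the learned acceleration network, so the exact solution from x(0) = 1, x'(0) = 0 is x(t) = cos(t).

```python
import math

def acceleration(x, v):
    # Stand-in for the learned acceleration f(x, v); here f = -x,
    # a harmonic oscillator whose exact solution is x(t) = cos(t).
    return -x

def integrate_second_order(x0, v0, t_end, dt=1e-4):
    # Forward-Euler integration of the first order augmented system
    # z = [x, v], dz/dt = [v, f(x, v)].
    x, v = x0, v0
    for _ in range(int(round(t_end / dt))):
        x, v = x + dt * v, v + dt * acceleration(x, v)
    return x, v

x_end, v_end = integrate_second_order(1.0, 0.0, math.pi)
# x_end is close to cos(pi) = -1, up to O(dt) Euler error.
```

The adjoint equivalence the authors prove then amounts to backpropagating through this coupled system rather than through a genuinely second order solver.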
Update after rebuttal:
Thank you for the rebuttal. The rebuttal hasn't changed my score (mostly because I didn't think the paper had many issues to begin with). I think this is a strong paper and deserves to be accepted. I'm also excited to see the results of running 3rd order SONODEs on the airplane modeling task.

__ Strengths__: Strengths:
- The paper is extremely clear and thorough. The authors provide both theoretical and empirical justification for all their claims.
- The motivation of the paper is interesting and provides a good first step towards neural ODE models that are appropriate for physical systems (or other second order systems).
- The paper provides a nice new perspective on augmentation techniques in the case of neural ODEs.
Significance:
- I believe this paper provides important insights both for people interested in neural ODEs and for the machine learning + physics community.
Novelty:
- The methods and analysis provided in the paper are novel. Second order behaviour in neural ODEs has also been analysed in the dissecting neural ODEs paper; however, that work can be considered concurrent to this one. Further, this paper focuses on physical systems, whereas the dissecting neural ODEs paper focuses on classification.

__ Weaknesses__: - The paper mostly focuses on quite small experiments. This is okay as the paper is mostly concerned with analysing the higher order behaviour of neural ODEs and getting a detailed understanding of this. However, it would be nice to see larger scale experiments if possible. Are there any larger scale physics (or other second order) datasets that could be tested? What about modeling e.g. n-particle dynamics?
- The proposed model is limited to second order behaviour even though higher order behaviour is not uncommon in real life physical systems.

__ Correctness__: The claims and methods in the paper are, to the best of my understanding, correct. The empirical methodology is also correct and thorough. I appreciate that almost all experiments contain error bars and standard deviations across several runs. The authors also provide the code to reproduce all experiments which is great.

__ Clarity__: The paper is extremely well written. I like that the authors often start with a motivating example to give intuition and then move on to more general statements (e.g. section 5.1), which makes the paper clear and easy to follow. The figures are also great and the proofs in the appendix are clear and well explained.

__ Relation to Prior Work__: The authors provide a good and clear discussion of how their work relates to previous contributions. A similar model for second order dynamics is given in the dissecting neural ODEs paper; however, this can be considered concurrent work and, as the authors mention, it focuses mostly on empirical classification tasks, whereas this paper provides a deeper analysis and understanding of second order behaviour, particularly in the context of physical systems and augmentation.

__ Reproducibility__: Yes

__ Additional Feedback__: As you mentioned in section 6.2, some physical systems have behaviour that is of order 3 or higher. Have you considered trying to model higher order behaviour with your model? Wouldn't it be fairly simple to extend this model to a 3d-dimensional phase space and restrict the dynamics function appropriately? Would this potentially improve performance on the airplane vibrations task?
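To make the extension I am asking about concrete, here is a minimal sketch (my own, not from the paper): a 3d-dimensional phase space z = [x, v, a] with the dynamics restricted so that only the top derivative is learned. A hard-coded jerk g(x, v, a) = -v stands in for a learned network; then x''' = -x', and with x(0) = 0, x'(0) = 0, x''(0) = 1 the exact solution is x(t) = 1 - cos(t).

```python
import math

def jerk(x, v, a):
    # Stand-in for a learned third order dynamics function g(x, v, a).
    # Here g = -v, i.e. x''' = -x'; with x(0)=0, x'(0)=0, x''(0)=1
    # the exact solution is x(t) = 1 - cos(t).
    return -v

def integrate_third_order(x0, v0, a0, t_end, dt=1e-4):
    # Restricted dynamics on z = [x, v, a]:
    # dx/dt = v, dv/dt = a, da/dt = g(x, v, a).
    x, v, a = x0, v0, a0
    for _ in range(int(round(t_end / dt))):
        x, v, a = x + dt * v, v + dt * a, a + dt * jerk(x, v, a)
    return x

x_pi = integrate_third_order(0.0, 0.0, 1.0, math.pi)
# x_pi is close to 1 - cos(pi) = 2, up to O(dt) Euler error.
```

Nothing here seems specific to order two, which is why I suspect the airplane task might benefit from such a third order variant.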
The broader impact statement is particularly well thought through and I appreciate that the authors took the time to consider this in detail.
Typo in line 695 in appendix: grater -> greater

__ Summary and Contributions__: This paper proposes second order neural ODEs, a constrained version of augmented neural ODEs, that can sometimes facilitate learning of time series better than the unconstrained augmented neural ODE version. They derive new methods for training second order neural ODEs in the process.

__ Strengths__: The progression of ideas from neural ODEs to second-order neural ODEs is quite clear. The propositions and numerical experiments are as comprehensive as one would expect for a ten-page paper. The appendix is well-explained and comprehensive as well. I particularly appreciate that the authors showed an example of the less constrained augmented neural ODE approach beating their second order neural ODE approach (in Fig. 8) because it illustrates that the constraints that they use will not be appropriate for certain types of time series.

__ Weaknesses__: Honestly, the only weakness of this paper is that the constraints being placed on ANODE to get SONODE can be limiting. Despite new experiments, this remains my worry.

__ Correctness__: Yes, and yes.

__ Clarity__: Yes.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper focuses on second order neural ODEs and shows that the adjoint sensitivity method can be extended to this framework; furthermore, an alternative first order optimization method is computationally more efficient.

__ Strengths__: The extension from first order ODEs to second order has application potential. The simulations are clear and show what is expected.

__ Weaknesses__: I am not familiar with the background of this paper, NODEs. But at a high level, it seems technically trivial to reduce a higher order ODE to a first order one. I hope the authors could provide a brief discussion in the introduction of the technical difficulties in this paper. A more complete background on residual nets might be helpful as well.
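To illustrate why the reduction looks trivial to me, here is the textbook construction (my own sketch, not from the paper): any n-th order scalar ODE becomes a first order system on the state z = (x, x', ..., x^(n-1)).

```python
def reduce_order(f):
    # Standard reduction: the n-th order scalar ODE
    #   x^(n) = f(t, x, x', ..., x^(n-1))
    # becomes the first order system
    #   z' = (z_1, ..., z_{n-1}, f(t, *z))
    # on the state z = (x, x', ..., x^(n-1)).
    def first_order(t, z):
        return list(z[1:]) + [f(t, *z)]
    return first_order

def euler(system, z0, t_end, dt=1e-3):
    # Forward-Euler integration of the reduced first order system.
    z, t = list(z0), 0.0
    for _ in range(int(round(t_end / dt))):
        dz = system(t, z)
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
        t += dt
    return z

# Example: x'' = 2 with x(0) = x'(0) = 0, so x(t) = t^2 exactly.
system = reduce_order(lambda t, x, v: 2.0)
z1 = euler(system, [0.0, 0.0], 1.0)
# z1[0] is close to x(1) = 1, up to O(dt) Euler error.
```

Given that this construction is standard, the introduction should explain what is technically non-trivial beyond it.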
Regarding line 28 of the introduction, "To fill this void, ...": this is where I want to see the necessity of filling this void and the potential impact of this paper's results on the area.
It is pointed out that there has been "no general study of second order behaviour neural ODE...", but please address what this study means for the machine learning community and what its technical novelty is.
The paper is expected to benefit a general audience, not only experts in a specific narrow area. With the current write-up, however, I am not convinced that it gives an important result or introduces an interesting technique that might be used in other problems.

__ Correctness__: There are no obvious mistakes in the proofs and calculations.

__ Clarity__: The paper is clearly structured, with preliminaries, theory, and experiments, but the introduction should clarify the non-triviality of the ODE technique and the importance of the results.

__ Relation to Prior Work__: This is my main concern: the authors are expected to give a detailed comparison to previous work, addressing why the result matters and why the technique is difficult.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The paper proposes the second order neural ODE (SONODE), which can describe higher order dynamics and tackle the intersecting-trajectory problem of NODEs. It also investigates its relationship to the augmented neural ODE (ANODE) theoretically and empirically, and extends the adjoint method to SONODE. It shows that SONODE has a unique solution for problems on which ANODE has non-unique solutions. The method was tested on synthetic and real data in comparison with ANODE.

__ Strengths__: The paper provides a theoretical understanding of the relationship between SONODE and ANODE. SONODE tackles the intersecting-trajectory issue of first order ODEs and gives unique solutions to certain problems for which ANODE cannot.

__ Weaknesses__: The proposed SONODE can be seen as a special case of ANODE in which the augmented state is constrained to be the second order derivative (acceleration). Its functional form f^(a) takes the functional form df^(v)/dt, if I am correct. Then, can you discuss whether SONODE can represent the function used in Dupont2019 as the example showing ANODE's advantage?
Moreover, ANODE has a larger parameter space than SONODE. Would ANODE then be able to achieve the same performance with an adequate amount of data?
The authors have addressed these concerns in the feedback. It is agreed that SONODE is a special case of ANODE.
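The special-case relationship can be stated structurally as follows (my own toy sketch; f, f_x, f_a stand in for learned networks):

```python
def anode_field(x, a, f_x, f_a):
    # ANODE: both components of the augmented state z = [x, a] evolve
    # under freely learned dynamics (f_x and f_a are separate networks).
    return f_x(x, a), f_a(x, a)

def constrained_field(x, v, f):
    # SONODE: dx/dt is constrained to equal the second state (the
    # velocity), and only the acceleration f is learned.
    return v, f(x, v)

# SONODE is recovered from ANODE by fixing f_x(x, a) = a:
f = lambda x, v: -x  # toy acceleration function
same = anode_field(1.0, 2.0, lambda x, a: a, f) == constrained_field(1.0, 2.0, f)
# same is True: the constrained field is one choice of ANODE field.
```

This makes the parameter-count question above concrete: ANODE spends capacity learning f_x, which SONODE fixes by construction.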

__ Correctness__: The claims, method and the empirical methodology are correct to the best of my knowledge.

__ Clarity__: The paper is overall well written.
It's a little confusing that the variable 'a' stands for both the acceleration and the augmented variable.
Can you please explain in more detail how Eq. 8 is derived? To me, it is a jump from \dot{z} = [\dot{x}, \dot{a}] to what appears in Eq. 8.
Thank the authors for clarifying the derivation.

__ Relation to Prior Work__: It is clearly discussed how this work differs from previous contributions

__ Reproducibility__: Yes

__ Additional Feedback__: The supplementary material compares SONODE and NODE on a classification problem. Can you please do the same comparison with ANODE?