__ Summary and Contributions__: This paper proposes Riemannian continuous normalizing flows (CNFs), which are better suited than conventional CNFs to non-Euclidean data, such as directional data lying on hyperspheres. The key idea is to take the underlying data geometry into account both when specifying and when solving the associated ordinary differential equation, so that the flow stays faithful to the data manifold. For simplicity, the authors focus on constant-curvature manifolds and evaluate their method on data living on spheres and hyperbolic spaces.

__ Strengths__: Developing probabilistic methods for modeling data living in non-Euclidean spaces is an important research topic, as there are many real-world applications where such data can be encountered. This paper makes an interesting contribution in this direction in the context of continuous normalizing flows.
The proposed method is principled and theoretically sound. The numerical experiments, even if they are somewhat proof-of-concept, illustrate the benefit of the proposed Riemannian CNFs for modeling data lying on spheres and hyperbolic spaces.

__ Weaknesses__: The limitations (e.g., scalability) of the proposed methods are not discussed.
The experiments on synthetic datasets involve relatively simple (unimodal) target distributions. It would be interesting to consider more complicated distributions as this is the setting in which it makes more sense to rely on normalizing flows.
Comparisons can be improved by considering stronger baselines (e.g., [1]).
[1] Rezende, Danilo Jimenez, et al. "Normalizing flows on tori and spheres." arXiv preprint arXiv:2002.02428 (2020).

__ Correctness__: Yes to the extent that I checked.

__ Clarity__: Yes.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__: * Update after rebuttal
After reading the authors' response and the other reviews, my original impressions remain the same. I encourage the authors to take into account the reviewers' feedback and improve the experiments as promised in their rebuttal.

__ Summary and Contributions__: The authors extend continuous normalizing flows to the manifold-valued setting. They achieve this by 1. deriving a change of variable formula that holds in the manifold setting (as the commonly used one does not), 2. slightly modifying the vector field neural network's architecture to ensure that manifold constraints are satisfied (they do this using pre-existing methods), and 3. using a different ODE solver which ensures that the manifold constraints are once again satisfied (which is also a pre-existing ODE solver).
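To make points 2 and 3 concrete, here is a minimal sketch on the unit sphere S^2. It is an illustration under my own assumptions, not the authors' implementation: `toy_field` is a hypothetical stand-in for the network output, and the solver is a plain explicit Euler step followed by a re-projection, which may differ from the solver used in the paper.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def project_to_tangent(x, v):
    """Drop the component of v along x; the result is tangent to the unit sphere at x."""
    c = dot(x, v)
    return [vi - c * xi for xi, vi in zip(x, v)]

def project_to_sphere(y):
    """Map a nonzero ambient point back onto the unit sphere."""
    n = math.sqrt(dot(y, y))
    return [yi / n for yi in y]

def toy_field(x):
    # Hypothetical stand-in for an unconstrained network output in R^3.
    return [x[1] + 0.5, -x[0], 0.3]

def projected_euler(x0, field, dt, steps):
    """Explicit Euler in the ambient space, re-projected onto the sphere after every step."""
    x = list(x0)
    for _ in range(steps):
        v = project_to_tangent(x, field(x))
        x = project_to_sphere([xi + dt * vi for xi, vi in zip(x, v)])
    return x

x_final = projected_euler([0.0, 0.0, 1.0], toy_field, dt=0.01, steps=200)
print(math.sqrt(dot(x_final, x_final)))  # norm of the final point
```

The tangent projection keeps the velocity admissible, and the re-projection removes the drift off the manifold that an ambient-space Euler step would otherwise accumulate.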

__ Strengths__: I enjoyed reading the paper, and believe it has several points in its favor. From a theoretical perspective, I find the derivation of the change of variable formula in manifolds to be a valuable contribution to the machine learning community, and think it could encourage future work towards learning the manifold.
Additionally, although they are limited to low-dimensional data, I found that the experiments support the conclusion that this method works better than other normalizing flow alternatives, and I think it could become the standard method for sphere- or torus-valued data.

__ Weaknesses__: In my view, the main disadvantage of the paper is that its scope is somewhat narrow: when used to directly model observations, the authors' method can only be applied when the manifold is known beforehand. High-dimensional data of interest in machine learning will often lie on a low-dimensional manifold (e.g. natural images), but the manifold is typically not known in advance. That being said, the problem of learning the manifold is a much harder one, and this complaint is a bit of a nitpick on my part, as I think the change of variable formula that the authors derive could be a stepping stone in this direction.
Another complaint is that, while the authors convincingly show that their method performs better than other normalizing flows on spherical data, the data is essentially 2-dimensional, so it is not clear to me that a neural network model will necessarily be the best performing alternative. For example, did you consider comparing against a mixture of vMF distributions?
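For concreteness, the baseline I have in mind is cheap to evaluate. Below is a minimal sketch of the log-density of a von Mises-Fisher mixture on S^2 (density taken with respect to the surface-area measure). The function names are mine, and fitting the weights, means and concentrations (e.g. by EM) is left out.

```python
import math

def vmf_log_density_s2(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere S^2,
    with respect to the surface-area measure. Requires kappa > 0."""
    # log of kappa / (4 * pi * sinh(kappa)), written to avoid overflow for large kappa
    log_norm = (math.log(kappa) - math.log(2.0 * math.pi)
                - kappa - math.log(-math.expm1(-2.0 * kappa)))
    return log_norm + kappa * sum(xi * mi for xi, mi in zip(x, mu))

def vmf_mixture_log_density(x, weights, mus, kappas):
    """Log-density of a vMF mixture, computed with a log-sum-exp for stability."""
    terms = [math.log(w) + vmf_log_density_s2(x, mu, k)
             for w, mu, k in zip(weights, mus, kappas)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Example: two components, evaluated at the north pole.
print(vmf_mixture_log_density([0.0, 0.0, 1.0],
                              [0.3, 0.7],
                              [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
                              [5.0, 2.0]))
```

Since this density is also expressed with respect to the surface-area measure, its test log-likelihood would be directly comparable to that of a flow defined on the sphere.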
Finally, I think the absence of experiments in a VAE setting is a shortcoming of the paper. There is a recent line of work endowing the latent space of VAEs with a manifold structure, e.g. [1, 2, 3]. While these papers are cited, comparing against them in a VAE setting would significantly strengthen the experiments section of the paper, as it would show empirically that the proposed method is useful for more than modeling manifold-valued data. Could you include such additional experiments?
[1] Nagano et al., A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning, ICML 2019
[2] Mathieu et al., Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders, NeurIPS 2019
[3] Bose et al., Latent Variable Modelling with Hyperbolic Normalizing Flows, ICML 2020

__ Correctness__: I did not find any errors in the claims or empirical methodology followed by the authors.

__ Clarity__: The paper is excellently written. My only suggestion is to better reference the discussion in appendix B within the main manuscript. One should not compare manifold-valued densities against R^D-valued ones, as they are not Radon-Nikodym derivatives with respect to the same base measure. I believe less careful readers might be tempted to make such comparisons, and the main manuscript should warn against them more clearly.
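A small worked example of why the base measure matters, using S^2 with spherical coordinates (\theta, \varphi); the notation is mine, not the paper's. The uniform distribution has a constant density with respect to the surface-area measure, but a non-constant density with respect to Lebesgue measure on the coordinate chart:

```latex
\mathrm{d}\mathrm{Vol}(x) = \sin\theta \,\mathrm{d}\theta\,\mathrm{d}\varphi,
\qquad
p_{\mathrm{Vol}}(x) = \frac{1}{4\pi},
\qquad
p_{\mathrm{Leb}}(\theta, \varphi) = p_{\mathrm{Vol}}(x)\,\sin\theta = \frac{\sin\theta}{4\pi}.
```

The two log-densities differ by \log\sin\theta at every point, so average log-likelihoods computed under the two conventions are not comparable numbers.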

__ Relation to Prior Work__: Prior work is adequately discussed.

__ Reproducibility__: Yes

__ Additional Feedback__: How much of a computational constraint is having to project the outputs of the network to the tangent space, along with having to use a projected ODE solver? Could you provide a wall-clock time comparison between your method and the competing alternatives?
===============================================================
UPDATE AFTER REBUTTAL
===============================================================
After reading the authors' response, my assessment of the paper remains the same.

__ Summary and Contributions__: The paper introduces a continuous normalizing flow that is aware of the (Riemannian) manifold structure it is defined on. The key contribution is to express the instantaneous change of variables using a Riemannian metric. The authors leverage previous developments to compute it efficiently and to evaluate the flow. Since several phenomena are naturally represented by data on Riemannian manifolds such as spheres, tori, and hyperbolic spaces, this is a valuable contribution to density estimation on those domains.
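For reference, one standard way to write an instantaneous change of variables on a Riemannian manifold (\mathcal{M}, g) is sketched below. The symbols are my own and may differ from the paper's: f_\theta is the vector field, G is the metric matrix in local coordinates, d is the manifold dimension, and p_t is the density with respect to the Riemannian volume measure.

```latex
\frac{\mathrm{d} z(t)}{\mathrm{d} t} = f_\theta(z(t), t),
\qquad
\frac{\mathrm{d} \log p_t(z(t))}{\mathrm{d} t} = -\operatorname{div}_g f_\theta(z(t), t),
\qquad
\operatorname{div}_g f = \frac{1}{\sqrt{|G|}} \sum_{i=1}^{d} \partial_i\!\left( \sqrt{|G|}\, f^i \right).
```

In Euclidean space |G| = 1, and the divergence reduces to the usual trace of the Jacobian.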

__ Strengths__: The paper has a solid theoretical development, providing detailed derivations for key steps, with adequate justification. I also appreciate how general the formulation is. In contrast to previous approaches, it does not rely on specific transformations designed for the manifold at hand. I see the approach as an important and relatively novel contribution to density estimation on Riemannian manifolds.

__ Weaknesses__: In my opinion, the experiments are the weakest aspect of the paper. First, the experiment showing the limitations of the stereographic projection model seems unnecessary to me, as the stereographic projection is known to cause problems at its singular point. The singular point can even be moved if the application requires it, resulting in a slightly different transformation. Even so, a lot of space is dedicated to showing this and to showing that the proposed method is not affected by it. It strikes me as a missed opportunity for more experiments on the applicability of the proposed method to real data.
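The behavior at the singular point is easy to reproduce. The following minimal sketch (my own illustration, with hypothetical function names, not the authors' experiment) projects points near the north pole of S^2 from either pole: the image blows up when projecting from the north pole and stays near the origin when projecting from the south pole.

```python
import math

def stereographic(point, pole_sign=1.0):
    """Stereographic projection of a point on S^2 onto the plane z = 0,
    projecting from the pole (0, 0, pole_sign). Undefined at that pole."""
    x, y, z = point
    denom = 1.0 - pole_sign * z
    return (x / denom, y / denom)

def near_north_pole(theta):
    """Point on S^2 at polar angle theta (radians) from the north pole."""
    return (math.sin(theta), 0.0, math.cos(theta))

for theta in (1e-1, 1e-2, 1e-3):
    p = near_north_pole(theta)
    from_north = math.hypot(*stereographic(p, pole_sign=1.0))   # equals cot(theta / 2)
    from_south = math.hypot(*stereographic(p, pole_sign=-1.0))  # equals tan(theta / 2)
    print(theta, from_north, from_south)
```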
Furthermore, the experiments on real data do not seem very informative about the model's applicability, since the data sets are (i) all of a similar nature, (ii) small, and (iii) not challenging to model, as indicated by the performance attained by the baseline.
I would be eager to increase my score for the paper if an additional experiment were performed on a higher-dimensional problem. As an added benefit, this could also be an opportunity to showcase advantages with regard to the mentioned numerical instability of prior approaches. Other domains that could be explored are even mentioned in the paper itself:
* "The shape of proteins can be parametrized using torii (Hamelryck et al., 2006)"
* "Cell developmental processes can be described through paths in hyperbolic space (Klimovskaia et al., 2019)"
* "Human actions can be recognized in video using matrix manifolds (Lui, 2012)"
One can also artificially devise a complex high-dimensional data set on a manifold by projecting high-dimensional data from Euclidean space onto a Riemannian manifold and then attempting to model the resulting distribution.
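As a minimal sketch of this last suggestion, assuming the target manifold is a unit hypersphere and the Euclidean source is an isotropic Gaussian mixture (both choices, and all names and parameter values below, are mine and purely illustrative):

```python
import math
import random

def sample_projected_mixture(n, dim, centers, scale, rng):
    """Draw n points from an isotropic Gaussian mixture in R^dim and
    radially project each one onto the unit sphere S^(dim-1)."""
    points = []
    for _ in range(n):
        c = rng.choice(centers)  # pick a mixture component uniformly at random
        y = [ci + scale * rng.gauss(0.0, 1.0) for ci in c]
        norm = math.sqrt(sum(yi * yi for yi in y))
        points.append([yi / norm for yi in y])
    return points

rng = random.Random(0)
dim = 16
centers = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(4)]
data = sample_projected_mixture(1000, dim, centers, scale=0.3, rng=rng)
```

The resulting distribution on S^15 is multimodal by construction, and the dimension and number of modes can be scaled to probe where the method starts to struggle.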

__ Correctness__: The claims and method are correct. Empirical methodology needs some improvement, as mentioned previously.

__ Clarity__: The paper is very well written, with an easy-to-follow argument, justification, and development. One problem is that the authors constantly point to the supplementary material for any depth in the discussion. However, I understand the difficulty of addressing this issue with the limited space available.

__ Relation to Prior Work__: Very good discussion of prior work in normalizing flows.

__ Reproducibility__: Yes

__ Additional Feedback__: # After reading the author response:
As the authors have committed to running an additional experiment on higher-dimensional data, I no longer see the experiments as a major issue to be addressed. Hence, I believe this paper could be accepted in this case.

__ Summary and Contributions__: The paper proposes a normalizing flow method for Riemannian manifolds, which ensures that the flow always stays on the manifold for any sampled starting particles. The flow is defined via vector fields on the manifold and is approximated by solving the associated ODE.

__ Strengths__: The idea in this paper shows great novelty, and the claims are rigorously proved from the viewpoint of manifolds. The experiments show good results, and the figures for the tests on the earth sciences data are beautiful.

__ Weaknesses__: All the experiments in this paper are run on regular manifolds (the earth's surface is approximated as a perfect sphere). This part would be strengthened if more experiments on complicated or high-genus manifolds were provided.

__ Correctness__: The claims in this paper are correct, and the aspects of continuous normalizing flows on Riemannian manifolds are defined clearly and proved in detail.

__ Clarity__: The paper is well written and readable. One small flaw is that some parts of Sections 1 and 2 are repetitive.

__ Relation to Prior Work__: This work extends traditional normalizing flows to general manifolds and performs better than some previous works, for example stereographic methods.
Besides flow methods, there are other methods, such as optimal transport mappings, for transforming one distribution into another. The authors neglected this direction.

__ Reproducibility__: Yes

__ Additional Feedback__: Flow methods are commonly used in the optimal transport literature. One of their major drawbacks is that they cannot model the transformation from white noise to a multimodal distribution, because in this situation the mapping is not globally continuous. This will cause mode collapse in DL systems. The authors need to address this problem.
[POST REBUTTAL]
I thank the authors for the detailed response. In fact, even if the support of the target measure is simply connected, the transport map may still fail to be continuous. Overall, I think the work is promising, and I keep my score unchanged.