Sun, Dec 8th through Sat, Dec 14th, 2019, at the Vancouver Convention Center
This paper provides a novel and interesting solution to the problem of numerical instability in ML-based methods on hyperbolic manifolds, using a group-based tiling approach. The work is well supported by experiments, and the mathematical statements are backed up by extensive proofs in the supplementary material. The work is generally very clear considering the complexity of the underlying approach. In places, however, the writing seems rushed and would benefit from the authors reviewing the language. I have highlighted several typos in the improvements section, but there are likely more. There are also some inconsistencies in the bibliography. The major limitation is that the approach does not apply to dimensions greater than 6, which are required by most existing work in this field. The authors mitigate this problem in Section 6, so it is my belief that the work is still significant, though less so than if the group-based tilings were readily applicable to many existing models.
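The numerical instability this review credits the paper with addressing is easy to demonstrate. Below is a minimal sketch (not the paper's tiling method; plain float64 arithmetic in the standard Poincaré disk model, with function names of my own choosing) showing how the hyperbolic distance computation loses all resolution for points near the boundary of the disk:

```python
import math

def poincare_distance(x, y):
    """Hyperbolic distance in the Poincare disk model:
    d(x, y) = acosh(1 + 2 * ||x - y||^2 / ((1 - ||x||^2) * (1 - ||y||^2)))."""
    sq = lambda v: sum(c * c for c in v)
    diff = tuple(a - b for a, b in zip(x, y))
    return math.acosh(1.0 + 2.0 * sq(diff) / ((1.0 - sq(x)) * (1.0 - sq(y))))

# Two points close to the boundary that float64 can still tell apart:
a = (1.0 - 1e-9, 0.0)
b = (0.0, 1.0 - 1e-9)
print(poincare_distance(a, b))  # finite, roughly 42

# But 1 - 1e-17 rounds to exactly 1.0 in float64, so the conformal factor
# (1 - ||x||^2) underflows to zero and the distance formula breaks down:
c = (1.0 - 1e-17, 0.0)
print(1.0 - (c[0] ** 2 + c[1] ** 2))  # 0.0 -- the point has collapsed onto the boundary
```

This is the failure mode hyperbolic embedding trainers hit in practice: gradient steps push points toward the boundary, where float64 can no longer represent them.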
The paper addresses an issue that is very important and most likely the reason why hyperbolic embeddings have not been widely adopted. In my experience, hyperbolic embeddings sometimes produce catastrophic results compared with competing methods, and this is because of numerical instabilities. The paper is very well written, with many theoretical and empirical results. The solution the authors provide is theoretically proven and very well documented. The experiments are also sufficient and realistic, and they prove the point. The only problem with this paper is that it contains too much information that most of the NeurIPS audience would not be able to follow, as it is not familiar with many of the mathematical concepts involved. I spent significant time going through the references. Given the complexity of the solution, the authors have done a good job explaining some mathematical concepts, but they definitely needed more space. The supplementary material is 16 pages!

Line 45: A schematic would explain the idea better.

References: "In In Flavors of geometry" has a repeated "In".

Line 71: "Since for hierarchical data such as a tree with branching factor b, the number of leaf nodes increases exponentially as the number of levels increases". I have seen this explanation in several papers, but I think it is not correct. The argument is fundamentally flawed because it compares a finite quantity with an infinite space: the number of leaves of a hierarchy grows exponentially but remains finite, while the area of a disk contains infinitely many points. Technically speaking, the unit disk could fit the universe.
The reason why Euclidean space is different is described very well in this paper: https://papers.nips.cc/paper/5971-space-time-local-embeddings.pdf: "...The maximum number of points which can share a common nearest neighbor is limited to 2 for 1-dimensional spaces, 5 for 2-dimensional spaces and so on. While such centralized structures do exist in real data, d-dimensional spaces can at most embed (d + 1) points with uniform pair-wise similarities." The triangle inequality also imposes additional constraints. See the references:

K. Zeger and A. Gersho. How many points in Euclidean space can have a common nearest neighbor? In International Symposium on Information Theory, page 109, 1994.

L. van der Maaten and G. E. Hinton. Visualizing non-metric similarities in multiple maps. Machine Learning, 87(1):33–55, 2012.

I also suggest reading this blog post https://networkscience.wordpress.com/2011/08/09/dating-sites-and-the-split-complex-numbers/ and this paper: https://dl.acm.org/citation.cfm?id=2365942

Line 86: "This suggests that the numerical model used for learning an embedding can have significant impact on its performance." To me this does not follow as a logical consequence. The cited reference talks about the importance of curvature (flat, negative, positive); at least that is its main point. Its title does not address numerical issues.

Line 91: "However, those models suffer from the precision and accuracy problem". This needs some clarification. Were the models compared with other embeddings and found not to be competitive in performance? Do the authors believe that the reason was numerical instability? If so, it would be nice to repeat these experiments with their solution and prove the point.

Line 131: The NeurIPS audience is not very familiar with Fuchsian groups. I had to do a lot of reading to understand how the tiling is done.
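The (d + 1) bound quoted above is easy to verify numerically. A small sketch (illustrative only; the construction and names are mine, not from the quoted paper): the d + 1 standard basis vectors of R^{d+1} are mutually equidistant and span only a d-dimensional affine subspace, so d + 1 points with uniform pairwise distances do fit in d dimensions:

```python
import itertools
import math

def equidistant_points(d):
    """Return d + 1 mutually equidistant points: the standard basis vectors
    e_1, ..., e_{d+1} of R^{d+1}. They lie in a d-dimensional affine
    subspace (the hyperplane where coordinates sum to 1)."""
    return [tuple(1.0 if i == j else 0.0 for j in range(d + 1))
            for i in range(d + 1)]

def pairwise_distances(points):
    dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return [dist(p, q) for p, q in itertools.combinations(points, 2)]

# For d = 3 the four points form a regular tetrahedron: every one of the
# six pairwise distances is sqrt(2).
pts = equidistant_points(3)
print(sorted(set(pairwise_distances(pts))))  # [1.4142135623730951]
```

Pushing past d + 1 equidistant points in d dimensions is impossible in Euclidean geometry, which is exactly the constraint hyperbolic space relaxes.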
That paragraph is too condensed.

Line 146: "plane as a pair consists of" should be changed to "plane as a pair that consists of".

Line 162: "Importantly, any element of G can be represented in the form". Is this a fact for all Fuchsian matrices?

Line 248: "Construct tilings from a set of isometries that is not a group." This is an interesting idea. Can we have a reference for it? An example of a set of isometries that does not form a group would be useful here.
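For readers who, like this reviewer, are unfamiliar with Fuchsian groups: the basic mechanism is that 2x2 real matrices with determinant 1 act on the upper half-plane as hyperbolic isometries, and a Fuchsian group is a discrete group of such matrices. A minimal sketch (not the paper's tiling construction; the modular group PSL(2, Z) is used here only as a standard example):

```python
import math

def mobius(m, z):
    """Apply a real 2x2 matrix m = ((a, b), (c, d)) with det 1 to a point z
    in the upper half-plane via the Mobius action z -> (a*z + b) / (c*z + d)."""
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

def h_dist(z1, z2):
    """Hyperbolic distance in the upper half-plane model:
    d(z1, z2) = acosh(1 + |z1 - z2|^2 / (2 * Im(z1) * Im(z2)))."""
    return math.acosh(1.0 + abs(z1 - z2) ** 2 / (2.0 * z1.imag * z2.imag))

# Two generators of the modular group PSL(2, Z), a standard Fuchsian group:
T = ((1.0, 1.0), (0.0, 1.0))   # z -> z + 1
S = ((0.0, -1.0), (1.0, 0.0))  # z -> -1/z

z1, z2 = 0.3 + 1.2j, -0.7 + 0.4j
before = h_dist(z1, z2)
after = h_dist(mobius(S, mobius(T, z1)), mobius(S, mobius(T, z2)))
print(abs(before - after) < 1e-12)  # True: the group action preserves hyperbolic distance
```

Applying the group elements to a fundamental domain is what tiles the plane; the paper's contribution is using such tilings to keep coordinates numerically small.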
Originality: the method is very novel and creative; this is one of the best ML papers I have read recently. Previous work is mostly well cited, with some exceptions among recent hyperbolic-embedding papers from recent NeurIPS / ICLR / ICML conferences.

Quality: As said above, the proposed method is usually technically sound and complete, e.g. Section 4 and the attached appendices, but it has a few weaknesses or incomplete discussions in Sections 5 and 6 that I will briefly touch on below. The authors are typically honest about their work, with the small exception of Algorithm 2 (RSGD), which works only for restricted objectives that depend only on the hyperbolic distance function. My current grade is conditioned on clarifying this issue in the main text.

Clarity: the paper is clearly written in general. However, Sections 5 and 6 and their appendices should be explained more extensively (I know the page limit is a problem, but at least the appendices should be clear). The experimental section is too small for easy reproducibility of this method, so I am asking the authors both to publish the full code to reproduce all the reported results and to do a better job of explaining the training details (and hyperparameters).

Significance: as said above, this is an important contribution to improving hyperbolic embeddings and reducing their inherent numerical errors. It would be hard for someone else to redo this important piece of research and its valuable theory. However, I believe there are still a few points that could be improved, both in the write-up and in the method, that would make the paper stronger.