Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
The paper is very well written. The motivation is clear, and the approach of finding curved axes for projecting the data is very intuitive. The authors provide theoretical analysis that significantly improves the quality of the paper. The experimental evaluation is also convincing. Overall, I believe this is a significant contribution and it should appear in NeurIPS. My main questions/comments are as follows:
1) Shouldn't there be a notion of orthonormality for the curved axes (thetas)? How do you prevent the axes from collapsing onto directions with higher variation?
2) The discussion of the derivation of the (smoothed) gradient is a bit unclear and hard to follow. Please try to simplify it.
3) Can you comment on the runtime and the complexity of the algorithm? How does it scale to significantly larger datasets? It would have been nice to have access to the code for a better assessment of the method.
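To make question 1) concrete: in the linear case, a standard way to discourage projection axes from collapsing onto the same high-variance direction is a soft orthogonality penalty on the matrix of axes. The sketch below is my own illustration of that generic remedy (the symbol Theta and the penalty form are my assumptions, not the paper's formulation); I would like to know whether the authors enforce anything analogous for the curved axes.

```python
import numpy as np

# Illustration only (not the paper's method): without a constraint, the
# learned projection axes (columns of Theta) can all collapse onto one
# high-variance direction. A common soft remedy adds the penalty
# ||Theta^T Theta - I||_F^2 to the training loss, which is zero exactly
# when the columns are orthonormal.
def orthogonality_penalty(theta):
    k = theta.shape[1]               # number of axes
    gram = theta.T @ theta           # pairwise inner products of the axes
    return float(np.sum((gram - np.eye(k)) ** 2))
```

For orthonormal columns the penalty is zero; for duplicated (collapsed) columns it is strictly positive, so minimizing the loss plus this penalty pushes the axes apart.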
The paper is well written in general. My main concern is that, although the formulation of the proposed metric is new to my knowledge, the choice of baselines is not appropriate. The proposed model is a NONLINEAR generalization of the linear Mahalanobis metric. However, most of the baseline methods (except DDML and PML) are standard LINEAR models. This comparison is unfair, since it is known that nonlinear approaches (e.g., nonlinear kernels or neural networks) outperform linear models in metric learning. DDML was published in CVPR 2014 and PML in CVPR 2015; they therefore do not correspond to state-of-the-art deep metric learning approaches. The authors could use more recent baselines, such as those reported in [A] (angular loss, etc.). The authors also do not explain how they trained the nonlinear baselines. Did they use the neural network architectures from the original papers? How were the hyperparameters determined?
[A] Zhai and Wu, Making Classification Competitive for Deep Metric Learning, 2018
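To clarify the linear/nonlinear distinction underlying my concern, a minimal sketch (my own illustration, not the paper's model): the Mahalanobis distance is a Euclidean distance after a *linear* map, whereas nonlinear metric learning replaces that map with a nonlinear feature map (here a fixed polynomial lift; in deep metric learning, a neural network). The function names and the specific lift are my assumptions for exposition.

```python
import numpy as np

# Linear metric: d_M(x, y) = sqrt((x - y)^T M (x - y)) with M = L^T L,
# i.e., Euclidean distance after the linear projection x -> L x.
def mahalanobis(x, y, L):
    diff = L @ (x - y)
    return float(np.sqrt(diff @ diff))

# Nonlinear metric (illustrative): Euclidean distance after a nonlinear
# feature map phi, here a degree-2 polynomial lift of the input.
def phi(x):
    # [x, upper-triangular entries of x x^T]
    return np.concatenate([x, np.outer(x, x)[np.triu_indices(len(x))]])

def nonlinear_distance(x, y, L):
    diff = L @ (phi(x) - phi(y))
    return float(np.sqrt(diff @ diff))
```

The point of the comparison issue: any method of the first kind is structurally weaker than methods of the second kind, so linear baselines alone cannot establish the strength of a nonlinear model.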
Originality: The method is new and provides a direct generalization of linear distance metric learning.

Quality: The theorems clearly help validate the methodology. The fitting-capacity result (Theorem 2) ensures that there exists a curvilinear metric that can separate the data well. The generalization bound ensures that the empirical loss converges to the expected loss. However, it is unclear whether this ensures that the algorithm converges to the/a distance introduced by Theorem 2 (the distance that separates the data well); nor does it seem to ensure that the obtained distance performs well. Also, a discussion of the purpose of the tensor $A$ in Theorem 3 would be beneficial. Finally, the topological property certifies that the learned object is a (pseudo)metric, which is a useful property.

Clarity: The paper is well written, well organised, and pleasant to read. In particular, the "Geometric interpretation" paragraph is a good step toward conveying the insights of the paper. However, it should be stated clearly that the authors focus on "supervised metric learning", i.e., learning a metric for classification; the first three pages are quite misleading in that respect.

Significance: The comparison with state-of-the-art results convinces me that this method is likely to be used in practice. Although the theoretical results ensure that a polynomial approximation of the metric could behave well enough for classification, it is not clear from Theorem 3 whether the algorithm converges to this solution.
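For readers unfamiliar with the pseudometric property mentioned above, the axioms being certified are d(x,x)=0, symmetry, and the triangle inequality (a pseudometric, unlike a metric, may have d(x,y)=0 for distinct x and y). A generic numerical sanity check, written by me and not tied to the paper's learned distance:

```python
import numpy as np

# Check the pseudometric axioms empirically on a finite sample of points.
def is_pseudometric(d, samples, tol=1e-9):
    for x in samples:
        if abs(d(x, x)) > tol:                      # d(x, x) = 0
            return False
    for x in samples:
        for y in samples:
            if abs(d(x, y) - d(y, x)) > tol:        # symmetry
                return False
            for z in samples:
                if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
                    return False
    return True
```

For instance, the Euclidean distance passes this check, while the squared Euclidean distance fails the triangle inequality; the value of the paper's topological result is precisely that the learned object is guaranteed to land on the right side of this divide.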