NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
This paper presents a novel approach for computing a distance between (unaligned) graphs using the Wasserstein distance between signals (more specifically, random Gaussian vectors) on the graphs. The graph alignment problem is then solved by minimizing the Wasserstein distance (which has an explicit formulation in this case) over the set of permutation matrices, encoded via the Sinkhorn operator. The paper is easy to read and very interesting. The ideas proposed for this challenging problem (the computation of graph distances and graph alignment) are very elegant and, based on the experimental section, efficient in practice. The use of the Wasserstein distance between graph signals for computing graph distances appears novel, and the work contains many interesting ideas that would be valuable for a researcher attending the conference and working on a similar topic.

My two main concerns are:

1) Weak experimental section: several technical aspects of the algorithm are missing, for example the choice of the parameter \tau, the number of samples S, and the number of iterations of the Sinkhorn algorithm. Are you using automatic differentiation to perform the stochastic gradient descent? How robust is the optimization algorithm to different initializations? In practice, do you run the optimization multiple times to avoid local minima? The computation time and complexity of the algorithm would also be a valuable addition to this work, as it is hard to evaluate the speed of the computation from the text alone, and the algorithm seems intuitively rather slow due to the many iterations of the Sinkhorn operator required to converge to a permutation matrix. Is this number of iterations large in practice?

2) The distance is only applicable to graphs of the same size. This should be clearly stated in the abstract, as many graph datasets contain graphs of multiple sizes, making the approach inapplicable to them.
Is the method easily extendable to graphs of different sizes?

Minor comments and typos:
1) In the abstract, the authors mention that the distance can capture the "global structure of graphs", but this claim is never proved or motivated. Can you elaborate on this aspect?
2) The notation N(0, x) is ambiguous, as x may represent either the variance or the standard deviation. Can you clarify this before or after eq. (3)?
3) l.15: "the-state" -> "the state"
4) l.53: "a NP-hard" -> "an NP-hard"
5) footnote 2: "ensures" -> "ensure"
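For context on the reviewer's question about the number of Sinkhorn iterations: the Sinkhorn operator alternately normalizes the rows and columns of a positive matrix until it becomes (approximately) doubly stochastic. A minimal illustrative sketch follows; the parameter names `tau` and `n_iters` and their default values are hypothetical choices, not the paper's settings.

```python
import numpy as np

def sinkhorn(C, tau=0.1, n_iters=50):
    """Approximate a permutation matrix from a cost matrix C by
    alternately normalizing the rows and columns of exp(-C / tau).
    Smaller tau pushes the result closer to a hard permutation but
    typically requires more iterations to converge, which is the
    speed concern raised above."""
    P = np.exp(-C / tau)
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # normalize rows
        P /= P.sum(axis=0, keepdims=True)  # normalize columns
    return P  # doubly stochastic; near-permutation for small tau
```

Because every step is differentiable, the output can be backpropagated through, which is what makes the operator attractive for gradient-based alignment.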
Reviewer 2
In this paper the authors tackle the graph alignment problem (and other related problems) with tools from optimal transport. Both the problem and the tools are of great significance. The paper is very well written and very pedagogical, and the descriptions and examples (Figs. 1, 2, 3) are very illustrative. It is definitely a very nice paper to read.

In the introduction, the first paragraph reviewing graph alignment methods could be complemented with important works in this direction, for instance:
- On convex relaxation of graph isomorphism, PNAS, Aflalo Y., Bronstein A., Kimmel R.
- A Path Following Algorithm for the Graph Matching Problem, PAMI, Zaslavskiy M., Bach F., Vert J.
- On spectral properties for graph matching and graph isomorphism problems, IMAIAI, Fiori M., Sapiro G.
- DS++: A flexible, scalable and provably tight relaxation for matching problems, Dym N., Maron H., Lipman Y.

A minor point: I understand that the Sinkhorn operator is convenient in the formulation of the optimization problem due to its differentiability. However, after obtaining the matrix \bar{P}, which in general is not a permutation matrix, other methods could be used to extract a permutation matrix, from projection onto the permutation set (a linear assignment problem, LAP) to more complex methods such as those in the papers cited above. I am not sure whether this would improve the results; in the example of Fig. 2, I am sure it would give the same (correct) answer.

For directed graphs, is the obstacle the use of the Laplacian (and its pseudoinverse)? I assume the optimization requires the pseudoinverses of the Laplacians; how does this scale with the size of the graph? And how is the convergence of the algorithm, both in speed and in terms of when it fails? I would also have liked to see an experiment analyzing graph alignment performance for inexact graph matching.

There is also a small error in Figure 1 (b) and (c): according to the numbers presented, the norm there is the Frobenius norm, not the 2-norm.
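The projection the reviewer suggests (rounding \bar{P} to the nearest permutation) is indeed a linear assignment problem and can be solved exactly with the Hungarian algorithm. A minimal sketch of this post-processing step, using SciPy's standard LAP solver (not part of the paper under review):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def project_to_permutation(P_bar):
    """Round a doubly-stochastic matrix P_bar to a permutation matrix
    by solving a linear assignment problem that maximizes
    sum_i P_bar[i, sigma(i)] over permutations sigma."""
    rows, cols = linear_sum_assignment(-P_bar)  # negate to maximize
    P = np.zeros_like(P_bar)
    P[rows, cols] = 1.0
    return P
```

For a nearly-hard \bar{P} such as the one presumably produced in the Fig. 2 example, this rounding recovers the same permutation the Sinkhorn iterates converge to.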
Reviewer 3
The work is original (as far as I know; I had never seen optimal transport used in this way to define a distance between graphs, written as eq. (6) and then eq. (9) when a permutation is involved, even though the result is a direct application of known results) and significant. The whole manuscript shows that this way of writing a distance between graphs, which also yields a transportation plan from one graph to the other, can be used in practical situations for graph alignment, a known and important problem. I do not have many comments on the work, as I think it is very good and already very clear; some improvements could always be made, as described in the next section. The authors' feedback after the reviews is also very good, confirming my very positive impression of this work.
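The "known result" the reviewer alludes to is presumably the closed-form 2-Wasserstein distance between zero-mean Gaussians, W_2^2(N(0, S1), N(0, S2)) = tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}). A minimal sketch of that closed form (an illustration of the standard formula, not the paper's exact code; the covariances would be derived from the graph Laplacians):

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian_squared(S1, S2):
    """Squared 2-Wasserstein distance between N(0, S1) and N(0, S2),
    using the closed form tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}).
    sqrtm may return a complex array with negligible imaginary part,
    so we keep only the real part of the cross term."""
    R = sqrtm(S2)
    cross = sqrtm(R @ S1 @ R)
    return float(np.trace(S1 + S2 - 2.0 * np.real(cross)))
```

This explicit formulation is what makes the alignment objective tractable: the distance is a smooth function of the covariances, so it can be minimized over Sinkhorn-parametrized permutations by gradient descent.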