Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Post-response comments (from discussion): "I feel the response did a good job of answering points of confusion, and also added an interesting example application of color transfer. This last example/experiment is heartening, because they finally use one of their maps (MK) in an application instead of just using the distances. I would be quite interested to see a more comprehensive exploration of this, as well as further applications of the MI and MK maps (perhaps in domain adaptation or Waddington-OT, which they mention elsewhere?).

It's also important to note that they included a new algorithm for subspace selection, which performs projected gradient descent on the basis vectors of the subspace and outperforms their old method in the synthetic cases. This is a nice discovery, but I think it will necessitate more than a minor structural change to the paper. One would expect a more complete presentation of the algorithm, including discussion of convergence (to local optima at least?) and runtime. It would also be nice to find a non-synthetic use case for this subspace selection, if possible.

In light of their response, I feel that this paper is on the right track, but could use another iteration to better argue for the applicability of their maps, and to update their algorithm for subspace selection."

=====================================================

This paper presents an interesting fundamental concept that naturally arises when one considers subspace-restricted transport problems. I was not able to check all the mathematics line by line, but I feel it performs a careful basic exploration of these concepts. However, as hinted at above, the main weakness of the paper is its applied facet. It presents a subspace selection algorithm without any real analysis or experimental exploration. The applications presented seem fairly esoteric and do not really demonstrate the utility of the full transports obtained (they explore mainly the distances obtained).
Comments:

1: A minor critique: is E introduced before line 44? It is clear from later reading and after studying the context that E denotes a particular subspace, but the notation section follows after this.

2: Line 337: is this counter-example demonstrating something other than the analogy that performing PCA with a (random) subset other than the first k components leads to sub-optimal outcomes?

3: Line 333: what does ‘underestimated’ mean in this context? Moreover, in this setting, does the number of samples used to estimate the covariance matrices (n) equal the dimensionality (p) (i.e., does p = n)? If so, would one expect the estimated covariance matrices to be reliable or of any decent quality? For example, in the low-rank-plus-noise setting, one can end up with estimated principal components that are asymptotically orthogonal to the true components when p/n does not go to 0. See (for example) Johnstone and Lu, 2009.

4: Continuing from (3), more generally, what is the value of d_2 in the simulations shown in Figure 4? Have you investigated the role of d_2 in this context?
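The concern raised in comment 3 can be illustrated with a small simulation. The following is a minimal sketch under assumed settings, not the paper's experimental setup: it draws zero-mean Gaussian samples from a single-spike covariance I + spike * e1 e1^T and measures how well the leading eigenvector of the sample covariance aligns with the true direction e1. The function name `leading_pc_alignment`, the spike strength, and the sample sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def leading_pc_alignment(n, p, spike=0.5, seed=0):
    """Return |cos| of the angle between the top sample PC and the true spike.

    Assumed model (illustrative only): x_i ~ N(0, I_p + spike * e1 e1^T),
    where e1 is the first standard basis vector.
    """
    rng = np.random.default_rng(seed)
    # Isotropic unit-variance noise in all p coordinates.
    X = rng.standard_normal((n, p))
    # Add the low-rank signal along e1, so coordinate 0 has variance 1 + spike.
    X[:, 0] += np.sqrt(spike) * rng.standard_normal(n)
    # Sample covariance; the mean is known to be zero, so no centering is applied.
    S = X.T @ X / n
    # eigh returns eigenvalues in ascending order; the last column is the top PC.
    _, eigvecs = np.linalg.eigh(S)
    top_pc = eigvecs[:, -1]
    # The true direction is e1, so the alignment is the first coordinate's magnitude.
    return float(abs(top_pc[0]))


if __name__ == "__main__":
    p = 200
    for n in (p, 50 * p):
        print(f"p={p}, n={n}, alignment={leading_pc_alignment(n, p):.3f}")
```

With these assumed settings, the alignment in the p = n case should come out far lower than in the n = 50p case, which is the kind of degradation the comment asks the authors to address.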
The lifting technique involves the disintegration of measure. The whole construction of MI and MK is quite simple, so we can say that it is a technically simple theoretical paper. The quality of that approach is strongly dependent on how we choose E, so one chapter is dedicated to this issue. I could not find experimental verification of that chapter's suggestions, especially of Algorithm 1. Experiments with synthetic data seem informative, but semantic mediation, etc., are not convincing.