| Paper ID: | 347 |
| --- | --- |
| Title: | Optimizing affinity-based binary hashing using auxiliary coordinates |
The general problem of binary hashing is: given a metric/similarity/affinity, find the best hash function mapping the original objects into a Hamming space of fixed dimension while preserving the distances/affinities. Many solutions have been proposed, many of them based on solving some hard optimization problem. Most importantly, as the authors point out, such solutions usually follow a 2-step approach: 1) find the best representation of the given objects in the Hamming space; 2) given this representation (basically n binary vectors), find the best hash function to map the objects into these vectors. This paper proposes to use the "method of auxiliary coordinates" (a form of alternating minimization) to combine the 2 steps. At an intuitive level, the algorithm iterates steps (1) and (2) a few times (note that this changes problem (1), since we now also care about the hash function).
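To make the iteration concrete, here is a minimal, self-contained sketch (my own illustration, not the authors' code). The squared inner-product loss, the least-squares linear hash, the greedy bit-flip update, and the way step (1) takes the hash function into account here (simply restarting the codes from h(X)) are all placeholder simplifications of what the paper actually does.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, B = 100, 8, 4
X = rng.standard_normal((N, D))                # inputs x_n
Y = np.sign(X @ X.T)                           # toy affinity matrix y_nm in {-1, +1}

def loss(Z, Y):
    """Affinity loss sum_{nm} (z_n . z_m / B - y_nm)^2 -- one common choice, not necessarily the paper's."""
    return (((Z @ Z.T) / Z.shape[1] - Y) ** 2).sum()

Z = np.where(rng.standard_normal((N, B)) > 0, 1.0, -1.0)   # step (1), first pass: initial binary codes
for _ in range(5):
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)  # step (2): linear hash h(x) = sign(W^T x)
    H = np.where(X @ W > 0, 1.0, -1.0)         # current hash outputs h(x_n)
    Z = H.copy()                               # step (1) again, now anchored at h(X)
    for n in range(N):
        for b in range(B):
            before = loss(Z, Y)
            Z[n, b] *= -1.0                    # try flipping one bit of z_n
            if loss(Z, Y) > before:
                Z[n, b] *= -1.0                # revert if the flip did not help
print("final affinity loss:", loss(Z, Y))
```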
Overall, this looks like a solid idea: it is natural that we would want to combine (1) and (2) in a single optimization problem, and a form of alternating minimization would naturally help. I'm a little concerned about the degree to which this differentiates itself from previous work. There has been work that solves its concrete optimization problem in this way; see, e.g., the paper on "Circulant Binary Embedding" by Yu, Kumar, Gong, Chang (though of course the exact details are quite different). Perhaps the main contribution of the paper is generalizing such methods, thus giving a single generic approach that can be applied to other variants of binary hashing optimization.
2-Confident (read it all; understood it all reasonably well)
The paper proposes a machine learning framework for learning hash functions with affinity-based cost functions.
The paper looks to be in good shape, with all the issues correctly addressed. Unfortunately, I do not know the subject well enough to provide feedback and suggestions, or to detect mistakes.
1-Less confident (might not have understood significant parts)
The authors propose a solution to the data-sensitive locality-preserving hashing problem for retrieval applications, in the standard framework. y_{nm} is the ground-truth similarity between items n, m, and x_n, x_m their intrinsic representations. The authors need to fit a hash function h such that h(x_n) and h(x_m) are similar or dissimilar as dictated by y_{nm}. This means the loss is written as L(h(x_n), h(x_m), y_{nm}), i.e., h nests inside L. The authors approach this optimization by declaring auxiliary (binary) variables z_n which are regularized toward h(x_n) gradually, using a suitable term in the objective.

Procedurally, it is an unsurprising alternating optimization: hold the hash functions fixed and optimize the z_n's, then hold the z_n's fixed and refit the hash functions. But the authors claim (I haven't checked the proofs) some interesting theoretical results about the trajectory of solutions under the regularization. They also present some competent experiments (with nice post-mortem analysis), although the experimental scope is somewhat modest. A million images is fine (I would be ok with even 100,000 images), but what would be more interesting is how the system performs under more heterogeneous clustering/similarity situations, as in real image search hosted by commercial search companies.

(L270) Why did you choose 1:5 positive:negative neighbors? How did you choose other hyper/nuisance parameters like the kernel SVM regularization, etc.?

The work is clearly competent, but I am not in a position to judge with high confidence how impactful it is in the community of recent work. The paper was a delight to read; lamentably few papers are written so clearly nowadays. It was easy to understand by non-experts and was not overly loud with claims --- a pleasure!
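To make the auxiliary-coordinate construction concrete, a small self-contained sketch follows (my own illustration, not the authors' code). The particular affinity loss, the linear least-squares hash, the greedy bit-flip Z-step, and the doubling schedule for mu are placeholder assumptions; the point is only the structure: z_n enters the loss directly, and a penalty mu * sum_n ||z_n - h(x_n)||^2 pulls the codes toward the hash outputs as mu grows.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, B = 100, 8, 4
X = rng.standard_normal((N, D))                     # inputs x_n
Y = np.sign(X @ X.T)                                # ground-truth affinities y_nm

def objective(Z, Y, HX, mu):
    """sum_{nm} L(z_n, z_m, y_nm) + mu * sum_n ||z_n - h(x_n)||^2,
    with the placeholder loss L = (z_n . z_m / B - y_nm)^2."""
    fit = (((Z @ Z.T) / Z.shape[1] - Y) ** 2).sum()
    return fit + mu * ((Z - HX) ** 2).sum()

Z = np.where(rng.standard_normal((N, B)) > 0, 1.0, -1.0)   # initial auxiliary codes z_n
mu = 0.01
for _ in range(6):
    # h-step: refit the hash function to the current auxiliary codes Z
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)       # linear hash h(x) = sign(W^T x)
    HX = np.where(X @ W > 0, 1.0, -1.0)             # current hash outputs h(x_n)
    # Z-step: greedy single-bit flips on the mu-penalized objective
    for n in range(N):
        for b in range(B):
            before = objective(Z, Y, HX, mu)
            Z[n, b] *= -1.0                         # try flipping one bit of z_n
            if objective(Z, Y, HX, mu) > before:
                Z[n, b] *= -1.0                     # revert if the flip did not help
    mu *= 2.0                                       # gradually enforce z_n == h(x_n)
print("bits where Z and h(X) still disagree:", int((Z != HX).sum()))
```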
2-Confident (read it all; understood it all reasonably well)
The manuscript proposes a two-step supervised hashing approach for joint learning of binary codes and hash functions in a single optimization framework. Considering the recent advances in the hashing area, the paper has low novelty and lacks comparison with the state-of-the-art supervised hashing methods.
The main contribution of the paper seems to be the proposal of the two-step iterative joint learning of binary codes and hash functions. I am quite concerned about the technical novelty of this proposed work, since there have been recently published works that use a similar alternating optimization [1,3].
- The proposed method needs a comparison with the state-of-the-art supervised hashing methods [1,2]. In addition, computation time should be provided in comparison with the other hashing methods, so that the reader can judge the practical relevance of the approach.
- The authors refer to the proposed approach as "an iterated, corrected version of the two-step approach". The usage of "corrected version" implies that the previous approaches are wrong, while the paper itself claims only that they produce suboptimal results. Therefore, the usage of "improved version" or a similar phrase would be more appropriate.

References:
[1] Shen et al., "Supervised Discrete Hashing", CVPR, 2015.
[2] Liong et al., "Deep Hashing for Compact Binary Codes Learning", CVPR, 2015.
[3] Jiang et al., "Joint Kernel-Based Supervised Hashing for Scalable Histopathological Image Analysis", MICCAI, 2015.
2-Confident (read it all; understood it all reasonably well)
The paper reviews binary hashing and some of its limitations, before suggesting a method (affinity-based binary hashing using auxiliary coordinates). The paper then describes the procedure, and demonstrates and compares the results via several experiments.
1. There is a very good review of related work and background, making this easy to follow.
2. The notation could be touched up a bit to make it clearer (nothing fatal; it just needs a second glance to understand the paper). For example, I had to re-read line 149 twice (bold z_n is a vector), compared to line 192 (z_{ni}). I think mu first appears in line 156; an explanation of what it is could be given (a hyperparameter that varies, same idea as lambda in (2))?
3. The legends in Figure 2 (rightmost plot) and Figure 3 (first and third plots) could be touched up; they look too squashed. I am not sure whether it would be better to put the plots in the supplementary material and just show one or two comparisons with the baseline.
4. It would be nice to see a plot comparing the time taken, perhaps in the supplementary material.
5. Overall, a decent paper, though a bit dense.
2-Confident (read it all; understood it all reasonably well)