Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The precision/recall results for these LSH schemes are missing. Do both methods find the same neighbors? Some discussion of P/R would be useful. There are no comparisons against data-dependent methods, so it is unclear whether this method would be preferred over data-dependent hashing schemes. A discussion of the proposed method relative to data-dependent methods would help establish the applicability of the proposed approach in other nearest neighbor applications.
Q2: Please summarize your review in 1-2 sentences
The paper presents a multiprobe variant of cross-polytope LSH. It also presents a lower bound on the quality of any LSH family for angular distance. The results show that the proposed method is faster than vanilla LSH.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors propose a novel LSH family for angular distance which matches the theoretical guarantees of Spherical LSH (i.e., an asymptotically optimal runtime exponent) while, unlike Spherical LSH, being practical: it outperforms Hyperplane LSH on the same task by up to an order of magnitude.
This result is achieved by a combination of techniques: (a) a data point is first transformed by a random rotation and then matched to the closest point y from {+/- e_i}, where e_i is the i-th standard basis vector of \R^d; (b) the random rotation is performed as a sequence of 3 pseudo-random rotations (reducing the asymptotic overhead from O(d^2) to O(d log d)); (c) feature hashing is used to reduce the effective dimensionality d; and (d) a variant of multi-probe LSH generates additional probing locations for retrieval.
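The hashing step in (a)-(b) can be sketched in a few lines. The following is an illustrative sketch only, not the authors' implementation: it uses a single sign-flip-plus-Hadamard pseudo-rotation where the paper composes three, and a naive fast Walsh-Hadamard transform.

```python
import numpy as np

def hadamard_transform(x):
    """Naive O(d log d) fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    h = x.copy()
    n = len(h)
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            for j in range(i, i + step):
                h[j], h[j + step] = h[j] + h[j + step], h[j] - h[j + step]
        step *= 2
    return h / np.sqrt(n)  # normalize so the transform is orthonormal

def cross_polytope_hash(x, signs):
    """Hash x to the closest signed basis vector +/- e_i after a pseudo-rotation.

    signs: a +/-1 vector defining one pseudo-random rotation HD
    (random diagonal sign flip D followed by a Hadamard transform H).
    Returns an integer in [0, 2d): the index of the closest vertex of {+/- e_i}.
    """
    y = hadamard_transform(signs * x)
    i = int(np.argmax(np.abs(y)))
    return 2 * i + (0 if y[i] >= 0 else 1)

rng = np.random.default_rng(0)
d = 8
signs = rng.choice([-1.0, 1.0], size=d)

x = rng.standard_normal(d)
x /= np.linalg.norm(x)
# A nearby point often lands in the same bucket; a far point rarely does.
eps = 0.05 * rng.standard_normal(d)
x_near = (x + eps) / np.linalg.norm(x + eps)
print(cross_polytope_hash(x, signs), cross_polytope_hash(x_near, signs))
```

Because the pseudo-rotation is a sign flip followed by a Hadamard transform, each hash evaluation costs O(d log d) rather than the O(d^2) of multiplying by a dense random rotation matrix.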
The proposed work is novel, original and has the rare property that it combines theoretical optimality with a practical, realistic (C++ instead of MATLAB) implementation.
Q2: Please summarize your review in 1-2 sentences
The paper proposes an interesting extension on the existing state of the art in LSH for angular distance. It combines theoretical optimality with a practical implementation that is faster than the (practical) state of the art by an order of magnitude.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
I think this is an impressive paper, very well-written (even if it is still difficult for a non-expert). One thing about the experiments is ambiguous:
Are all the reported results for multiprobe CP against multiprobe HP?
It seems to be the case from the "multiprobe experiments" paragraph of Section 6, but this should be stated more clearly.
Line 428: what is "input-sparsity time"?
Line 176: "outline the proof" (no "of")
Line 302: "hash" (!): has
Line 342: "considers": consider
Q2: Please summarize your review in 1-2 sentences
A dense but well-written paper on an algorithm, validated with several proofs and experiments, for speeding up approximate nearest neighbor search on normalized vectors - a common problem in computer vision.
Submitted by Assigned_Reviewer_4
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors focus on LSH schemes that are both optimal and practical, a combination that is hard to obtain. They show this difficulty by proving a lower bound for LSH on angular distance. The authors also propose a multiprobe variant of cross-polytope LSH to achieve this goal. The experimental results show that the proposed algorithm is faster than other LSH schemes.
Q2: Please summarize your review in 1-2 sentences
The idea looks sound. They have nice experimental results.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note, however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their valuable comments and address them point-by-point below.
Reviewer_1: * We thank the reviewer for the constructive comments.
* In our experiments, we measured the average query time of nearest neighbor algorithms under the condition that the success probability of finding the exact nearest neighbor is at least 90%. This is a common setup in experimental evaluations of nearest neighbor algorithms (see, e.g., reference [9]). The precision/recall curves suggested by the reviewer are also a common and useful way to evaluate hashing schemes. However, they do not take into account certain important factors such as the time needed to compute the hash function. Due to the space restrictions of the submission, we therefore decided not to include those plots and instead present results for a wider collection of data sets.
Nevertheless, for the purpose of this rebuttal, we are including several plots that depict the relationship between the probability of finding the true nearest neighbor (i.e., the "recall") and the length of the probing sequences for the HP and CP hashing schemes (the length of the probing sequence corresponds to the number of buckets examined). The plots are for the NYT data set and for the random data set (n = 2^20) and were generated with the same LSH parameters as described in our paper:
https://storage.googleapis.com/nips2015_tmp/nyt_multiprobe_pr2_3.pdf
https://storage.googleapis.com/nips2015_tmp/random_multiprobe_pr2_1.pdf
It is worth noting that for multiple hash tables (L > 1), the lower left corner of the plot (small success probabilities) is most relevant:
https://storage.googleapis.com/nips2015_tmp/nyt_multiprobe_pr2_3_zoomed.pdf
https://storage.googleapis.com/nips2015_tmp/random_multiprobe_pr2_1_zoomed.pdf
The plots demonstrate that the CP hash indeed offers significantly better tradeoffs than the HP hash.
* Yes, in the event of a successful query, both methods find the same neighbors because we require them to find the *exact* nearest neighbor.
* There is indeed a large body of work on data-dependent hashing methods such as spectral hashing. In our submission, we focus instead on data-oblivious methods, which have several desirable properties compared to data-dependent techniques:
1. It is easier to set up the data structure in a distributed environment because points can be hashed without performing any global optimization or even communication (see, e.g., reference [9]).
2. Data-oblivious algorithms immediately support insertions and deletions of points, without solving a new optimization problem.
3. In a data-oblivious scheme like ours, the collision probability of two data points depends only on the distance between these two points and not on the entire data set. This can lead to more predictable empirical performance and provable guarantees.
For further discussion of the pros and cons of data-dependent methods, see "Frontiers in Massive Data Analysis", National Academies Press, 2013, page 49.
We believe these and other benefits justify the study of data-oblivious hashing methods in their own right.
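The distance-only collision property can be checked empirically. For the classical hyperplane hash, two unit vectors at angle theta collide with probability 1 - theta/pi, a function of the angle alone. The following Monte Carlo sketch is illustrative (it is not from the paper; dimensions and parameters are arbitrary):

```python
import math
import numpy as np

# Illustrative check: hyperplane-LSH collision probability depends only on
# the angle between two unit vectors, not on any other data.
rng = np.random.default_rng(1)
d = 50
theta = math.pi / 4                      # 45-degree angle

# Two unit vectors at angle theta.
u = np.zeros(d); u[0] = 1.0
v = np.zeros(d); v[0] = math.cos(theta); v[1] = math.sin(theta)

trials = 200_000
r = rng.standard_normal((trials, d))     # random hyperplane normals
collisions = np.mean(np.sign(r @ u) == np.sign(r @ v))
print(collisions, 1 - theta / math.pi)   # empirical vs. exact, both near 0.75
```

The empirical estimate converges to the closed-form 1 - theta/pi regardless of where the two points sit relative to the rest of a data set, which is exactly the predictability benefit described in point 3.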
Reviewer_2: * We would like to thank the reviewer for the positive feedback.
* Yes, all comparisons between the HP and CP hashing schemes (Tables 1 and 2) are for the multiprobe variants of HP and CP. We mention this in lines 413 - 415 of the paper, but we will also make this point clearer in the final version.
* Line 428: Here, "input-sparsity time" refers to a time complexity that is linear in the number of non-zero elements in a vector (as opposed to, e.g., a linear dependence on the ambient dimension).
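A toy illustration of the distinction (hypothetical code, not from the paper): an operation runs in input-sparsity time if its cost scales with the number of non-zeros nnz(x) rather than the ambient dimension d.

```python
# Illustrative only: a dot product that touches just the non-zero entries,
# so its cost is O(nnz(x)) regardless of the ambient dimension d.
def sparse_dot(sparse_x, dense_y):
    """sparse_x: dict mapping index -> value (the non-zeros of x)."""
    return sum(v * dense_y[i] for i, v in sparse_x.items())

d = 1_000_000             # huge ambient dimension
x = {3: 2.0, 17: -1.0}    # only 2 non-zeros
y = [1.0] * d
print(sparse_dot(x, y))   # 1.0, computed with just 2 multiplications
```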
* Thank you for pointing out the typos in lines 176, 302, and 342, which we will fix in the final version of the paper.
Reviewer_4: * We thank the reviewer for the positive comments, in particular the last sentence regarding theoretical optimality and practical experiments - achieving this combination was indeed the main goal for this paper.
Reviewer_5: Thank you for the comment. We respectfully disagree with the remark that the topic of our paper is not a good fit for NIPS. Over the past few years, several papers about LSH and similar hashing schemes have appeared at NIPS. Moreover, these papers often considered the running time required for nearest neighbor searches. This list of papers includes, but is not limited to, the following works:
NIPS 2014: Shrivastava and Li (received the best paper award)
NIPS 2013: Neyshabur, Yadollahpour, Makarychev, Salakhutdinov, and Srebro
NIPS 2012: Ji, Li, Yan, Zhang, and Tian
NIPS 2012: Gong, Kumar, Verma, and Lazebnik
NIPS 2012: Kong and Li
NIPS 2011: Li, Shrivastava, Moore, and Konig
Reviewer_6: We thank the reviewer for the positive comments.