Paper ID: 1566
Title: Fast, Provable Algorithms for Isotonic Regression in all L_p-norms
Current Reviews

Submitted by Assigned_Reviewer_1

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
From my understanding, the paper makes a good technical contribution, unifying a large body of work on isotonic regression (IR). The basic idea seems intuitive: to employ techniques from fast solvers for linear systems. Thus, from the perspective of novelty and technical content, I cannot raise any issues (based on my limited understanding -- regrettably, I do not have the background to check the proofs).

But my concern with the paper is simply that it may be better suited to an algorithms/theoretical CS conference or journal, such as those where the work it improves upon ([16] -- [20]) and the work it employs in developing the algorithm ([21] -- [29]) were published. It is unclear to me whether the results in the paper would be of sufficient interest to the broader NIPS community. In particular:

- while IR has seen some interesting applications to learning problems of late, it is not (in my estimation) a core ML tool for which a faster algorithm is by itself of wide interest. I feel there has to be some additional learning-specific insight or extension for an ML paper. I would contrast this to one of the contributions of [12], which was the design of a faster algorithm for Lipschitz IR fits. Here the Lipschitz problem arose from statistical motivations, and solving it over vanilla IR was shown to have an impact on what could be guaranteed statistically.

- in the application of IR that I am most familiar with, namely probabilistic calibration (the references [0] and [-1], which could be added) and learning SIMs ([10, 12]), from my understanding the proposed algorithms do not bring faster runtimes, as in these cases one operates over very structured DAGs. Of course faster runtimes for general DAGs are of considerable algorithmic interest, but again I reiterate that in my estimation, more direct impact to an ML problem is needed. It may be the case that there are other interesting learning applications where the proposed algorithms represent a significant advance. If so, this should be spelt out much more clearly.

Other comments:

- There is some work on establishing that the PAV algorithm is optimal for a general class of loss functions ([-2], and references therein). It may be worth citing.

- From my preliminary reading, it seems that [14] works with the standard L2 norm, not a general Lp norm?

- pg 6, consider making the four points about Program (5) into bullets.

Typos:

- pg 1, "IF it is a weakly"
- pg 4, "ACCOMPANYING"
- pg 6, "show that the $D$ factor"
- pg 7, "Regression on a DAG"

References:

[0] Bianca Zadrozny and Charles Elkan. 2002. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '02). ACM, New York, NY, USA, 694-699.

[-1] Harikrishna Narasimhan and Shivani Agarwal. On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation. In NIPS 2013.

[-2] Niko Brummer and Johan du Preez. The PAV Algorithm optimizes binary proper scoring rules.

Q2: Please summarize your review in 1-2 sentences
The paper proposes new algorithms for solving weighted isotonic regression problems under general Lp norms. The resulting algorithms have favourable complexity compared to existing proposals. The ML implications of the work are a little unclear, however.

Submitted by Assigned_Reviewer_2

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The reductions in computational complexity relative to existing algorithms are significant. My only concern is how the delta factor in the approximate solution affects the convergence for typical values of n.

Minor: in the first line after 1.1 it says $n\geq m-1$, and I guess it should be the other way around, $n \leq m-1$.
Q2: Please summarize your review in 1-2 sentences
In this paper the authors rely on approximate solvers to speed up the solution of isotonic regression under any $\ell_p$-norm.

Submitted by Assigned_Reviewer_3

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper takes the problem of Isotonic Regression with $\ell_p$-norms as the error measure and provides fast, provable algorithms for this setting. The authors use ideas from the literature on near-linear-time SDD solvers to obtain fast algorithms.

Quality: This is a high quality paper. The paper is overall readable, the ideas are stated clearly, and the results are impressive.

Clarity:

Overall the clarity is good. Objects are defined and the contribution is clearly stated. I wish the authors could provide some intuition as to why one can obtain the improvement in complexity in the isotonic regression setting. What are the key ideas the authors built on?
Q2: Please summarize your review in 1-2 sentences
I really liked this paper. Relating isotonic regression to a graph problem is reasonable, the idea of using fast SDD solvers to obtain implementable algorithms with bounds is interesting. The authors also have some interesting ideas in their proof formulation.

Submitted by Assigned_Reviewer_4

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper provides incremental improvements to Isotonic regression with $\ell_p$-norms.

Much of the paper's contribution is explained in the supplemental section, so one cannot fully understand the contributions of the paper from the main sections alone.

Furthermore, there are limited experimental results shared in the paper.

No experimental comparisons are provided.

While the results are promising, improved organization and experimentation would solidify the authors' contribution.
Q2: Please summarize your review in 1-2 sentences
The paper provides improved methods for Isotonic Regression.

While the paper provides a thorough examination of the method, further experimental results comparing the algorithm to existing literature would be beneficial.

Submitted by Assigned_Reviewer_5

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The problem

The paper studies Isotonic Regression in $\ell_p$-norms, $1\leq p \leq \infty$. Given a DAG $G(V,E)$, observations $y\in \mathbb{R}^{|V|}$, and a weight vector $w$, isotonic regression is the following minimization problem (shown in line 053):
\begin{eqnarray}
\min_x \|x-y\|_{w,p} \quad \mbox{such that } x_u \leq x_v \mbox{ for all } (u,v)\in E,
\end{eqnarray}
where $\|\cdot\|_{w,p}$ is the weighted $\ell_p$-norm.
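For concreteness, the simplest special case, a linear order (chain DAG) with $p=2$, is the classical problem solved exactly by the PAV algorithm. A minimal, purely illustrative sketch (not code from the paper) using scikit-learn's PAV-based IsotonicRegression:

import numpy as np
from sklearn.isotonic import IsotonicRegression

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])   # observations
w = np.array([1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0])   # positive weights

# Fit x minimizing sum_i w_i (x_i - y_i)^2 subject to x_1 <= x_2 <= ... <= x_n,
# i.e., weighted L2 isotonic regression on the chain 1 -> 2 -> ... -> n.
iso = IsotonicRegression(increasing=True)
x = iso.fit_transform(np.arange(len(y)), y, sample_weight=w)
print(x)   # the nondecreasing fit

The paper's setting is the general one: arbitrary DAGs and all $\ell_p$-norms.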

The results

Allowing a small error $\delta$ from the optimal result, a bound of $O(m^{1.5}\log^2 n \log(npw_{max}^p/\delta))$ on the time complexity, which holds with high probability for $1\leq p < \infty$, is shown in Theorem 2.1. For $p<\infty$, the time complexities of this paper are compared in Table 1 with those from previous works, which compute exact solutions. For $p=\infty$ and a variant called the Strict Isotonic Regression (which is defined in line 079), upper bounds on the time complexity to compute exact results are shown in Theorems 1.2 and 1.3, respectively. The time complexity bounds shown in the aforementioned theorems improve the previous results, except for an $\ell_1$ bound in two-dimensional space ($V \subset \mathbb{R}^2$). For the $\ell_1$-norm, there is an additional constraint on the number of edges $|E|$.

The authors transform the original regression problem into an instance that can be solved by an approximate interior point algorithm called \textsc{ApproxIPM}. By establishing the efficiency and accuracy of a critical subroutine of \textsc{ApproxIPM} called \textsc{BlockSolve}, which is designed to compute an approximate Hessian inverse, the proposed algorithm achieves a better time complexity for $1\leq p<\infty$. The contribution of this paper is that the authors generalize a result for linear programs in [23] to $\ell_p$ objectives, and they also provide an improved analysis. For $\ell_\infty$ Isotonic Regression and the Strict Isotonic Regression, the authors reduce these problems to Lipschitz Learning problems defined in [29] and apply the algorithms in [29] to compute the solutions.
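To make this structure concrete, the following is a textbook log-barrier sketch for the $p=2$ case on a small DAG, written in plain NumPy purely for illustration. It is not the paper's \textsc{ApproxIPM}: the exact dense Newton solve below is precisely the step that the paper replaces with the approximate, near-linear-time \textsc{BlockSolve}.

import numpy as np

def isotonic_l2_barrier(edges, y, w, t0=1.0, mu=10.0, tol=1e-6):
    # Vertices are assumed to be indexed in topological order, so x = (0, 1, ..., n-1)
    # is strictly feasible for every edge (u, v) with u < v.
    n, m = len(y), len(edges)
    x = np.arange(n, dtype=float)
    t = t0

    def obj(z, t):
        # Barrier objective: t * sum_i w_i (z_i - y_i)^2 - sum_e log(z_v - z_u).
        slack = np.array([z[v] - z[u] for (u, v) in edges])
        if np.any(slack <= 0):
            return np.inf
        return t * np.sum(w * (z - y) ** 2) - np.sum(np.log(slack))

    while m / t > tol:                     # m/t bounds the duality gap of the barrier problem
        for _ in range(100):               # damped Newton steps at the current value of t
            slack = np.array([x[v] - x[u] for (u, v) in edges])
            g = 2.0 * t * w * (x - y)
            H = np.diag(2.0 * t * w)
            for (u, v), s in zip(edges, slack):
                g[u] += 1.0 / s
                g[v] -= 1.0 / s
                H[u, u] += 1.0 / s ** 2
                H[v, v] += 1.0 / s ** 2
                H[u, v] -= 1.0 / s ** 2
                H[v, u] -= 1.0 / s ** 2
            # Exact dense Newton step: the expensive part, and the step the paper
            # replaces by an approximate Hessian-inverse computation (BlockSolve).
            step = np.linalg.solve(H, g)
            if g @ step < 1e-10:           # Newton decrement is tiny: move on to larger t
                break
            alpha = 1.0
            while alpha > 1e-12 and obj(x - alpha * step, t) >= obj(x, t):
                alpha *= 0.5               # backtrack to stay strictly feasible and descend
            x = x - alpha * step
        t *= mu
    return x

edges = [(0, 1), (0, 2), (1, 3), (2, 3)]   # a small 4-vertex DAG
y = np.array([2.0, 0.5, 1.5, 1.0])
w = np.ones(4)
print(isotonic_l2_barrier(edges, y, w))

The sketch makes the bottleneck visible: each iteration is dominated by solving a linear system in the Hessian, and the paper's contribution is performing this step approximately, in near-linear time, while keeping the interior point method convergent.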

The paper also provides preliminary experiments on the proposed algorithm, which are listed in Table 2.

Comments

The theoretical part of this paper is incremental work. The main contributions of this work are the reductions of the problems, and the design and analysis of the critical subroutine \textsc{BlockSolve}, which is used to compute an approximate Hessian inverse efficiently. Most of the mathematical techniques used in the analysis can be found in convex optimization, interior point methods, and the papers referenced in this work.

It is hard to classify this paper as a theoretical work or an experimental work. The experiments shown in Table 2 are preliminary, and there are no comparisons with other state-of-the-art algorithms. On the other hand, the main algorithm and its analysis ideas are not presented clearly in the main body of the paper, although they can be found in the supplementary file. The paper might need restructuring for better presentation.

Typos and undefined notations

- In line 056, it should be $m \geq n-1$ for a connected graph, rather than $n \geq m-1$.
- In line 342, the failure probability should be $n^{-3}$, rather than $n^3$.
- In line 663, $\textsc{Solve}_{H_F}$ is not defined, making the proof of Theorem 2.7 hard to follow.
- In line 716, \textsc{Solve} is not defined, making the proof of Lemma A.5 hard to follow.
- In line 722, there might be an unnecessary z.

Quality

On the theory side, this is an acceptable paper. The experimental results are not good enough for publication. The overall presentation of the paper (considering only the main body, without the supplementary file) falls between a borderline paper and a weak rejection.

Clarity

The algorithm, its critical analysis, and the key analysis ideas are missing from the main body of the paper, making it difficult for readers to gain a good understanding of this work.

Originality and Significance

This is incremental work on Isotonic Regression.
Q2: Please summarize your review in 1-2 sentences
The paper makes incremental improvements on Isotonic Regression in $\ell_p$-norms. The experimental results are preliminary, and the main algorithm, together with its design and analysis ideas, is missing from the main body of the paper (while the algorithm and the critical ideas can be found in the supplementary file).

Author Feedback
Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their detailed and helpful comments. We address the reviewers' concerns below, and will incorporate these points in the final version.

AR9 = Assigned_Reviewer_9

Concern: AR9 Unsure if the results are "of sufficient interest to the broader NIPS community."

In our opinion, Isotonic Regression (IR) is a fundamental nonparametric regression method, and algorithmic advances in IR should be of interest to researchers who study/use such methods, including those in ML. The reviewer is correct that current papers on statistical calibration (refs [0, -1] from the review) and learning SIMs [12, 14] only require linear orders. However, here are concrete ways our results could contribute towards these research directions:

1. The procedure LPAV from [12] learns 1-Lipschitz monotone functions on linear orders in n^2 time. The structure of the associated convex program resembles IR. Applying the IPM results and solvers from our paper, we immediately obtain an n^1.5 time algorithm (up to log factors). A sketch of this type of program is given after this list.

2. IR (or Lipschitz IR as above) on d-dim point sets could be applied towards learning d-dim multi-index models where the link-function is nondecreasing w.r.t. the natural ordering on d-variables, extending [10,12].

3. IR on d-dim point sets could be used to learn CPE models from multiple classifiers, by finding a mapping from multiple classifier scores to a probabilistic estimate, extending [0,-1].
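Regarding point 1, the following is a hedged sketch (our illustrative reading, not code from [12]) of the kind of convex program in question: a 1-Lipschitz, monotone least-squares fit on a linear order, written with CVXPY on synthetic data.

import cvxpy as cp
import numpy as np

t = np.sort(np.random.rand(20))                        # sorted inputs on the line
y = np.clip(t + 0.1 * np.random.randn(20), 0.0, 1.0)   # noisy monotone responses

x = cp.Variable(20)
constraints = []
for i in range(19):
    constraints += [x[i + 1] >= x[i],                     # monotone
                    x[i + 1] - x[i] <= t[i + 1] - t[i]]   # 1-Lipschitz
prob = cp.Problem(cp.Minimize(cp.sum_squares(x - y)), constraints)
prob.solve()
print(x.value)

All constraints are on consecutive differences along the chain, which is the IR-like structure mentioned above.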


Concern: AR7 "no experiments are presented"
"numerical (or computational) issues with the barrier function"
"difficult to see whether it works in practice"

Sec 1.4 contains implementation details and experimental results on graphs with up to 80,000 vertices. The results demonstrate that our algorithms can be implemented to run very well in practice, without numerical issues.


Concern: AR11 "no comparisons with other state-of-the-art algorithms"

1. To the best of our knowledge, there are no publicly available implementations that produce optimal answers on general DAGs.

Implementations based on generalizations of the PAV algorithm, e.g. GPAV, typically produce far-from-optimal answers on general DAGs (though they produce correct answers on tree orders).

2. General-purpose optimization frameworks, e.g. CVX, are not competitive since they cannot exploit the constraint sparsity. CVX takes ~6 times as much time as our implementation on random regular graphs with 10k vertices (and scales much worse).

We will clarify this in the final version, and include a comparison to CVX on several graph classes and sizes.


Concern: Novelty of the techniques used

AR11: "Most of the mathematical techniques used in the analysis can be found in convex optimization, interior point method, and the referenced papers"
AR7: "Standard numerical anal. techniques are used to speed up the Newton step at each iteration"

In our opinion, our techniques are not standard convex optimization techniques. Concretely,

1. The standard analyses of interior point methods (IPMs) do not allow for approximate Newton steps.

In contrast, we show that our IPM can work with very crude Hessian inverse computations (up to constant error).

2. Standard numerical linear algebra techniques are not sufficient for approximating the Hessian inverse in near-linear time.

Techniques for approximate inverse computation, e.g. Conjugate Gradient, have at least a square root dependence on the condition number. In IPMs, the condition number of the Hessian inevitably becomes large (up to a large polynomial). Hence, standard techniques are insufficient for designing fast algorithms.

Recent fast solvers for SDD matrices don't apply to the systems we consider. We build solvers for our systems by extending these solvers.
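For reference, the standard Conjugate Gradient bound behind this square-root dependence is $\|x_k - x^*\|_A \leq 2\big((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\big)^k \|x_0 - x^*\|_A$, so reducing the error by a factor $\epsilon$ takes roughly $\tfrac{1}{2}\sqrt{\kappa}\log(2/\epsilon)$ iterations; once $\kappa$ reaches a large polynomial, as it does for the Hessians arising in IPMs, this is no longer near-linear.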

Our framework and IPM results are very general, and can be applied as-is to other problems.


Concern: AR7 "The final complexity analysis shaves off a logarithmic factor"

Our algorithm achieves the best possible running time, O(m+n), i.e., linear in the graph size. This was not known even for linear or tree orders.

Moreover, the previous algorithms are based on parametric search and are highly impractical. Our algorithm is very simple and practical, requiring only random sampling and a topological sort.


Concern: AR7 "For p=inf ... the analysis given in the paper is quite involved but mostly standard."

Our approach does not resemble existing approaches to Isotonic Regression (IR). Rather, we adapt techniques from a recent work on Lipschitz Learning on graphs to IR, and exploit the DAG structure to give linear-time algorithms.


Concern: AR11 "main algorithm and its analysis ideas are not mentioned clearly in the main body of the paper"

We have attempted to give an outline of the algorithm and its analysis in Sec 2. Given the space constraints and the technicality of the IPM, the details were deferred to the supplementary material. We will restructure the final version to improve accessibility.