| Paper ID | 7901 |
|---|---|
| Title | Learning Positive Functions with Pseudo Mirror Descent |

There are various applications in ML (e.g., intensities in point processes) in which learning a positive function is desired. Link-function and projection-based methods are among the popular approaches to this problem. This paper proposes an efficient nonparametric method, with theoretical guarantees, for learning such functions. How should the proper Bregman divergence be chosen? In Eq. (4), what is the role of the first term, which does not depend on x? The paper is generally well written and the conditions are clearly stated. The impact of the work is justified by the authors. I generally liked the idea and consider it a good step towards constrained optimization. Update: After reading the rebuttal, I would like to keep my score and recommend this paper for acceptance.

Originality: The paper proposes a pseudo mirror descent framework that can be applied to positive-valued function learning problems such as Poisson processes and Hawkes processes. Related RKHS formulations for learning these processes exist in a batch-learning setting. Quality and clarity: The paper provides both theoretical justification and real-world examples to demonstrate the utility of the proposed method. The presentation of the paper is also clear. Significance: In Section 3.1, the intensity function is assumed to be continuous. The reviewer is concerned that this assumption could be restrictive, since in many scenarios modeled by point processes the intensity function can change suddenly and hence is not continuous. Overall, the authors demonstrate a nice way to carry out optimization in Hilbert space and offer solutions to interesting point process problems. The convergence results for the pseudo mirror descent algorithm may also be of interest for non-convex optimization. =========== after feedback ========== The reviewer would like to thank the authors for their explanation. Regarding the concern about continuity, a nice addition would be some synthetic experiments with sudden changes in the intensity function (e.g., how well the method can track a step-function intensity).
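To make the step-function suggestion concrete, the following is a minimal sketch of how such synthetic data could be generated. It is not taken from the paper: the names `step_intensity` and `sample_by_thinning`, the change point at 5.0, the rates 1.0 and 4.0, and the horizon 10.0 are all hypothetical choices for illustration, and the sampler is the standard thinning procedure for inhomogeneous Poisson processes rather than anything the authors describe.

```python
import random


def step_intensity(t, change_point=5.0, low=1.0, high=4.0):
    """Hypothetical piecewise-constant intensity with one jump at `change_point`."""
    return low if t < change_point else high


def sample_by_thinning(intensity, horizon, upper_bound, rng):
    """Sample event times on [0, horizon] from an inhomogeneous Poisson process.

    Candidates are drawn from a homogeneous process with rate `upper_bound`
    (which must dominate `intensity` everywhere) and each candidate at time t
    is kept with probability intensity(t) / upper_bound.
    """
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(upper_bound)  # waiting time to the next candidate
        if t > horizon:
            break
        if rng.random() * upper_bound <= intensity(t):
            events.append(t)
    return events


rng = random.Random(0)  # fixed seed so the sample is reproducible
events = sample_by_thinning(step_intensity, horizon=10.0, upper_bound=4.0, rng=rng)

# Event counts on either side of the (assumed) change point at t = 5.0.
before = sum(1 for t in events if t < 5.0)
after = len(events) - before
print(f"{before} events before the jump, {after} after")
```

An estimator fitted to `events` could then be compared against `step_intensity` near the change point to see how quickly it adapts to the jump.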

Pros:
1) The paper is well written and easy to follow.
2) The theoretical assumptions and definitions used in the analysis are very clear.
3) Concrete examples are given to help readers interpret the definitions.
Cons:
1) The implications of some important theoretical results (e.g., Theorem 3 and Corollary 4) are not well explained.
2) The experimental results are not well analyzed. The proposed algorithm seems to oscillate after the first few iterations, in contrast to the link function approach and projected gradient descent. It would be better if the authors analyzed this in detail to reveal any hidden properties of the new algorithm.
3) The experimental settings are a bit confusing. The reviewer would usually choose the same kernel across different experiments, both for convenience and to demonstrate the robustness of the proposed method. However, the authors used different kernels in the synthetic and real experiments. It would be better to briefly explain the rationale behind this choice.