| Paper ID | 1294 |
|---|---|
| Title | GENO -- GENeric Optimization for Classical Machine Learning |

The paper presents a framework that can be used to optimise general machine learning problems. The framework comprises a modelling language in which to describe the problem, and an automated way of generating a numerical solver for that particular formulation. The work takes advantage of recent advances in automatic differentiation to address a broad range of problems. The paper is well-motivated and contains impressive results. I will address the four criteria separately.
Quality: The framework presented is tested on a number of different problems and against a variety of solvers, which lends substance to the authors' claims about its performance. The work is sound as far as I could follow it (although I am not familiar with many of the methods invoked), and I found particularly elegant the way that different classes of problems are converted to the base case of smooth and unconstrained optimisation.
Clarity: The paper is generally clear, although a bit light on details and too brief in some places (understandable given the length restriction). More specifically, I had trouble following the description of the solution for unconstrained, smooth problems (lines 117-132). I think that some pseudocode or equations would have been more explanatory than the long paragraph. On the other hand, the explanations in Section 2.2 of why particular methods were chosen over others were more enlightening (although, as said above, I am unfamiliar with many of these). It would also have been useful to know a bit more about the workflow of using GENO (what happens to the generated file? how is the code called?) and, similarly, which part of setting up/generating/running the code is timed in the results (for both GENO and the compared solutions). A few more clarity-related comments and suggestions are given at the end of the review.
Significance: Frameworks and implementations are important for giving users access to existing methods, and are often underrated. The approach described here offers both a simple-yet-versatile modelling language and (according to the presented results) an automated way to generate extremely performant optimisers. The combination could prove very powerful in extending the reach of machine learning methods to various domains.
Originality: The authors make clear that the particular algorithms used in this approach are not novel, but that does not detract from the importance of the submission, as its main point is the automated application of these algorithms to treat a particular problem. This is indeed a very important contribution. However, it seems to me that the authors are neglecting a class of related approaches, that of probabilistic programming languages. These are similar to the proposed framework in allowing flexibility in the specification of the problem (more so than GENO?) while automatically generating a solver "under the hood", and are also an active area of research and development. The existence of languages/frameworks like Stan, PyMC3 and Infer.NET (citations below) seems to me to slightly diminish the novelty of the work, although I would be interested in whether/how the authors consider this to relate to GENO.
Other comments:
- l. 37-39: I am not clear on the distinction between "well-engineered" and "state-of-the-art" solvers, particularly as to which category the solvers in the example results fall under. It was also a bit confusing to see three categories of solutions (the two types of solvers, plus the modelling language + solver) and directly afterwards switch to toolbox vs modelling language. I tried for a while to align the two sets of categories, but failed.
- l. 52: "The transformed problems" -> "problem"
- l. 75-76: "GENO does not transform problem instances but whole problem classes": I still find this sentence a bit vague; does it refer to the transformation of e.g. constrained to unconstrained problems? Or to the generation of a solver from the general problem formulation, without needing specific data?
- Table 1: I am not sure what "deployable/stand-alone" means in this context.
- l. 120: A citation or explanation for the projected gradient path approach would be useful.
- l. 136-137: "All non-smooth... f_i(x)": Is this fact a constraint/limitation of the language, or a more general property for a class of problems?
- l. 160-170: It was not clear to me why \rho is multiplied by 2 in this case. Is this an arbitrary choice? (That said, once again I am not an expert!)
- l. 209: Delete "respectively".
- Figure 1: The legends are missing.
- l. 229-230: I would move this introduction to the previous section (probably adapted so that it refers to l-1 or general regularisation).
- l. 241: "GENO converges as rapidly": I would add "almost", as in the top 3 examples it is slightly slower than Liblinear (although still much faster than the other algorithms).
- l. 243/Table 2: Why is this comparison against CVXPY? Is there something particular about it compared to the other solvers?
- l. 271: "We have presented GENO the first...": add a comma after "GENO".
Links (websites also indicate preferred citations):
PyMC3: https://docs.pymc.io/
Stan: https://mc-stan.org/
Infer.NET: https://dotnet.github.io/infer/
UPDATE AFTER REPLY: The authors have acknowledged my main points (relation to probabilistic programming languages, timing details) in their reply and have said that they will address them by providing more information in the next draft. It was also very positive to see that the authors plan to release the code as open-source. Without more details about the content of the changes (e.g. the comparison with PPL, a more detailed explanation of how other solvers scale), I do not feel I can increase my score, but I stand by my initial positive assessment and think the paper is certainly worthy of publication.

The authors present GENO, a generic optimization framework for solving classical machine learning problems that can be expressed as vectorized linear algebra expressions. The generic optimizer is based on quasi-Newton optimization, but it can also solve constrained, non-convex and non-differentiable problems thanks to the use of automatic differentiation and a number of problem transformation methods. The paper is clearly written and easy to follow. Related work has been covered quite thoroughly. The experimental comparison presented in the paper and supplementary materials is quite comprehensive with regard to the number of compared methods, data sets and optimization problems considered. I found it quite surprising that a generic method can perform so well on such a diverse range of tasks, and I believe that the method will be of interest to the NIPS community.
Questions to the authors: Since the main contribution of the paper is the solver software, what is the planned availability of this software? Will it be made freely available under some open source license? Can the solver accommodate special structure in the parameter matrices (e.g. the matrix A is sparse and mostly zeroes, A is a product of two "thin" matrices, or A is a Kronecker product of two matrices)? In many scipy methods, for example, one can define a linear operator that returns a matrix-vector product rather than having to create the matrix explicitly.
Figure 1: The image caption could be a bit more informative. I'm quite familiar with the regularization path algorithm and have seen such plots before, but I still had to check the scikit-learn demo to remind myself what exactly the axes and curves correspond to in the image.
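To make the linear-operator question above concrete, here is a minimal sketch of what such an operator looks like in SciPy. It is not GENO code and says nothing about whether GENO supports this; the helper name `kron_operator` is hypothetical, and the sketch assumes real-valued dense factors A and B. It wraps the Kronecker product of A and B in a `scipy.sparse.linalg.LinearOperator` that computes matrix-vector products without ever forming the full product matrix.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator


def kron_operator(A, B):
    """Implicit operator for np.kron(A, B); hypothetical helper, real-valued A and B assumed."""
    m, n = A.shape
    p, q = B.shape

    def matvec(x):
        # kron(A, B) @ x equals (A @ X @ B.T) flattened row-major,
        # where X is x reshaped row-major to shape (n, q).
        X = np.asarray(x).reshape(n, q)
        return (A @ X @ B.T).ravel()

    def rmatvec(y):
        # Transpose product: kron(A, B).T @ y equals (A.T @ Y @ B) flattened,
        # where Y is y reshaped row-major to shape (m, p).
        Y = np.asarray(y).reshape(m, p)
        return (A.T @ Y @ B).ravel()

    return LinearOperator(
        (m * p, n * q), matvec=matvec, rmatvec=rmatvec,
        dtype=np.result_type(A, B),
    )


# Usage: apply the operator to a vector without building the (m*p) x (n*q) matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((2, 5))
op = kron_operator(A, B)
x = rng.standard_normal(op.shape[1])
y = op.matvec(x)
```

The point of the design is cost: the explicit matrix has m*p*n*q entries, whereas the operator only stores A and B and performs two small matrix products per application.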

Originality: The concrete approach seems to be rather novel; according to the authors, it is based on recent progress in automatic differentiation for matrix calculus [36]. Mathematical modelling languages, of course, have existed for a long time. GENO seems to be more convenient than e.g. CVXPY (problem dimensions need not be specified), and also more widely applicable (the problem need not be convex, in which case a local optimum is simply determined).
Quality: The approach seems technically sound. The manuscript describes some of the theory developed to support the approach. Experiments confirm that the approach works in practice.
Clarity: The manuscript is rather well-written and easy to understand. As an exception, Figure 1, as well as the accompanying description, left me clueless without visiting https://scikit-learn.org/stable/auto_examples/linear_model/plot_logistic_path.html, where the diagram is described in more detail. I also have a concern that is unrelated to the manuscript itself, but still concerns clarity: The authors indicate in the reproducibility response that their submission includes "A link to a downloadable source code, with specification of all dependencies, including external libraries." However, the manuscript only provides a link to a website that allows one to *invoke* GENO, that is, to generate code for a user-specified optimization problem. The source code of GENO itself is not available to reviewers, or at least I could not find it. I would find access to GENO's source code useful in order to assess the quality and significance of the submission.
Significance: I am not fully convinced of the significance of this submission. From my point of view, GENO lacks the following capabilities, which it would need in order to really be chosen over specialized solvers:
1. It would need to properly understand and exploit convex duality, to automatically solve a problem in either primal or dual form, depending on which form is more amenable. This, in turn, depends on the problem dimensions, which are not specified by the user.
2. It cannot currently handle the complex training objective functions arising from structured prediction problems. These often include an "inference" sub-problem that needs to be solved repeatedly as part of the overall optimization; or, in some cases, the sub-problem can be "moved" into the overall objective through appropriate dualization. GENO does not seem to provide any help in this regard.
3. Deep learning problems seem to be considered out of scope.
-- Update after author response --
I'd like to thank the authors for pointing out where I can find the source code. It was my mistake not to spot the link to the GitHub repository at the bottom of the website (http://geno4neurips.pythonanywhere.com/) that generates solver code for a given example. Regarding significance, even after the author response, I still have some doubts. Time will tell if GENO sees wide adoption.