__ Summary and Contributions__: This paper presents a method for learning the parameters
of linear programs from observed optimal decisions, with a strong
application to inverse optimization. The authors provide a PyTorch
implementation of an interior-point method with two differentiation options.
---
Thanks for the rebuttal and clarifications. After the reviewing period I maintain the same view on this paper.

__ Strengths__: Understanding how to recover LP parameters from data via inverse
optimization is a timely research topic with many potential applications
in operations research and controls.
Characterizing the differentiation modes (backprop through the solver,
implicit, and direct) is important. The ablations here are interesting
and could be improved by analyzing other tradeoffs between these methods,
such as compute/memory tradeoffs.
The loss results in Figure 3 and Figure 5 alone make it difficult to
see which differentiation method should be used in practice.
The ability to detect and correct unbounded or infeasible subproblems
is important, and this method demonstrates that it can handle these cases.

__ Weaknesses__: Beyond the random search and COBYLA baselines, I don't have a good
sense of how this method compares to related approaches in the
settings they consider, for example the simpler settings of [Tan 2019].

__ Correctness__: I see no correctness errors in this paper.

__ Clarity__: The paper is clear and well-written.

__ Relation to Prior Work__: The paper adequately describes the related work in this space.

__ Reproducibility__: Yes

__ Additional Feedback__: One interesting use case of the unrolled derivatives is when the solver
does not fully solve the problem. It would be insightful to do an
experimental analysis similar to the min-differentiation ablations in
[Ablin 2020, https://arxiv.org/abs/2002.03722] to see how close the
derivative approximations are as the iterates converge.
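The gap such an ablation would measure can be sketched on a toy one-dimensional problem (a hypothetical illustration, not the paper's solver): for x*(w) = argmin_x 0.5 x^2 - w x, the implicit derivative dx*/dw is exactly 1, while backpropagating through K steps of gradient descent yields 1 - (1 - lr)^K, which only approaches 1 as the iterates converge.

```python
import torch

def unrolled_derivative(K, lr=0.5):
    """Backprop through K gradient-descent steps on f(x) = 0.5*x^2 - w*x."""
    w = torch.tensor(2.0, requires_grad=True)
    x = torch.tensor(0.0)
    for _ in range(K):
        x = x - lr * (x - w)  # df/dx = x - w; each step stays on the autograd graph
    (g,) = torch.autograd.grad(x, w)
    return g.item()

# Implicit function theorem: dx*/dw = -(d2f/dx2)^{-1} (d2f/dx dw) = -(1)^{-1}(-1) = 1,
# independent of how accurately the inner problem was solved.
implicit = 1.0

for K in (1, 5, 20):
    approx = unrolled_derivative(K)
    print(f"K={K:2d}  unrolled={approx:.6f}  error={abs(approx - implicit):.2e}")
```

The unrolled estimate equals 1 - (1 - lr)^K here, so its error decays at the same linear rate as the solver's iterates, which is the kind of convergence plot the suggested ablation would produce.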

__ Summary and Contributions__: The authors consider inverse optimization, a very important and little-studied area, and propose a common-sense approach based on sequential quadratic programming (SQP).

__ Strengths__: Inverse optimization is a very important and little-studied area. It is natural to recast inverse linear programming as a non-linear optimisation problem (NLP), as the authors do, and it is likewise natural to solve the NLP by a sequence of convexifications (SQP).

__ Weaknesses__: Unfortunately, the authors stop at the rather obvious suggestion that one can apply SQP to the NLP in question.
The authors do not even prove that the SQP converges to second-order stationary points:
http://www.optimization-online.org/DB_HTML/2013/10/4102.html
More generally, there is much stronger theory for SQP:
https://link.springer.com/article/10.1007/s10589-014-9696-2
than the authors present for their very special case of NLP in the current paper.
The implementation, while using PyTorch and many rather complicated tools, does not seem to scale beyond very small instances. It also does not acknowledge its heavy reliance on:
https://github.com/cvxgrp/cvxpylayers
https://web.stanford.edu/~boyd/papers/pdf/diff_cvxpy.pdf
which is cited only as a preprint rather than as the actual NeurIPS 2019 paper.
Perhaps most importantly, the implementation is compared to a rather basic algorithm for generic NLP (Powell's COBYLA) rather than to the state of the art in general-purpose DFO or to specialised methods for inverse optimisation. It is hence largely beating a straw man.

__ Correctness__: As far as I can tell.

__ Clarity__: Figures 3 and 5 are rather confusing. What does "training success @ 10^-5" mean? Should the percentage be divided by 10,000?

__ Relation to Prior Work__: No -- see weaknesses.

__ Reproducibility__: Yes

__ Additional Feedback__: Having read the rebuttal and the other reviews, I have amended my review and score.

__ Summary and Contributions__: The authors propose to generalize the inverse optimization problem to a framework called the "inverse linear optimization problem", in which the coefficients of the LPs are parameterized by arbitrary functions. They then propose an implicit-gradient-based method to differentiate through the LP and learn these parameters.

__ Strengths__: The authors discuss the cases in which inverse optimization methods fail and compare their gradient method to traditional inverse optimization methods. They show that the proposed method successfully recovers the parameters of a multi-commodity flow problem. It is good to see comparisons with the inverse optimization literature in ML settings. Also, although not explicitly mentioned in the draft, I think the authors spent considerable effort tuning the LP solver and code to make it work in these settings.

__ Weaknesses__: A fatal weakness of the draft is that the proposed framework is already covered by the OptNet / CVXPY layers, yet there is no comparison to them. Specifically, Theorem 1 is a special case of Appendix B of the CVXPY paper. If the focus of the paper is on the solver, then a comparison to CVXPY layers is necessary; from the current experiments, I cannot see the advantage over simply using the existing CVXPY layer. On the other hand, if the focus of the paper is on comparing against the inverse optimization literature, then comparisons with more recent work in IO would be necessary.

__ Correctness__: I think the overall claims are correct.

__ Clarity__: The paper is clearly written, and the figures are illustrative.

__ Relation to Prior Work__: Prior works are properly discussed but not properly compared in experiments.

__ Reproducibility__: Yes

__ Additional Feedback__: The rebuttal addressed the concerns in my review, so I bumped the score from 5 to 6.

__ Summary and Contributions__: The paper contributes a gradient-based approach to 'inverse optimization' problems. Here, we observe (u, x) pairs, where x is assumed to be the optimum of an optimization problem with parameters given by a function f_w(u). The goal is to learn w. Inverse optimization is very important in a variety of applications. The method contributed here allows arbitrary f_w, such as neural networks.
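As a concrete (hypothetical) instance of this setup, the learning loop can be sketched with a linear f_w and, to keep the inner argmin differentiable in closed form, an equality-constrained quadratic program standing in for the paper's LP; all names and sizes below are illustrative assumptions, not the paper's code:

```python
import torch

torch.manual_seed(0)

# Observe (u, x) pairs where x solves a problem whose cost vector is f_w(u).
# Stand-in inner problem (chosen for its differentiable closed-form argmin):
#   x*(c) = argmin_x 0.5*||x||^2 + c^T x  subject to  A x = b.
n, m, d = 4, 2, 3
A = torch.randn(m, n)
b = torch.randn(m)

def solve_qp(c):
    """Solve the KKT system [[I, A^T], [A, 0]] [x; nu] = [-c; b] for x."""
    K = torch.zeros(n + m, n + m)
    K[:n, :n] = torch.eye(n)
    K[:n, n:] = A.T
    K[n:, :n] = A
    rhs = torch.cat([-c, b])
    return torch.linalg.solve(K, rhs)[:n]

# Generate observed optimal decisions from a ground-truth parameterization.
w_true = torch.randn(n, d)
us = torch.randn(16, d)
xs = torch.stack([solve_qp(w_true @ u) for u in us])

# Learn w by differentiating the decision-reconstruction loss through the
# solver (here via torch.linalg.solve's autograd support).
w = torch.zeros(n, d, requires_grad=True)
opt = torch.optim.Adam([w], lr=0.1)
for step in range(500):
    opt.zero_grad()
    pred = torch.stack([solve_qp(w @ u) for u in us])
    loss = ((pred - xs) ** 2).mean()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.2e}")
```

Note that w is only identifiable up to components that the solution map ignores (here, cost directions in the row space of A), which is exactly the kind of recovery caveat the inverse optimization literature studies.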

__ Strengths__: It is important to build stronger bridges between the OR and ML communities. This paper is approachable for an ML audience (it uses black-box neural networks for certain modules, and it backprops through LP solvers) but also draws on sophisticated non-linear programming techniques that are not familiar to most NeurIPS readers. The paper is well written, technically rigorous, and does a good job of analyzing the behavior of the methods.

__ Weaknesses__: The paper relies on synthetic problems for evaluation. As the authors point out, this is typical of the inverse optimization literature, as it is important to have a controllable distribution over optimization problems. The experiments are technically sound. However, a real-world application, with a case study on how to formulate the various components in the objective, would be helpful.

__ Correctness__: I am not an expert on non-linear programming, so I can't evaluate some of the details of the derivations of the method. I found the empirical methodology satisfactory.

__ Clarity__: Yes, I found it surprisingly approachable.

__ Relation to Prior Work__: Yes. The paper's method is applicable in more settings than prior inverse optimization work. For example, they can also learn the parameterization of the cost matrix (not just the objective weights) of an LP. Their method is more general because they do not assume a given feasible region for each optimization example.

__ Reproducibility__: Yes

__ Additional Feedback__: **Update after the authors' response**
I found that you addressed the other reviewers' concerns adequately.