__ Summary and Contributions__: The paper investigated the constrained learning based on the probably approximately correct (PAC) learning framework. Authors proposed the framework of constrained learning and generalizsed theory and, demonstrated how to approximate such learning in practice.

__ Strengths__: Constraint learning was nicely framed and it was easy to follow the problem and ideas. Based on the empirical Lagrangian, the constrained learning solution was proposed and authors showed that this is compatible.

__ Weaknesses__: The compatibility of the new constraint learning algorithm was shown through simulation studies.
For the first simulation study, the constraint is on the KL for male and female is under some threshold, c. Why would you want to fit a parametric model with small difference (or zero difference) between gender? For data analysis purpose, wouldn’t you want to have a model in which characterize gender difference?
This framework is useful when there are clear constraints due to the nature of problem. When there is no clear constraints, can we use this algorithm to improve a better fit or investigate particular feature of data?

__ Correctness__: Although this is not my area, I didn’t find any major fault.

__ Clarity__: It was easy to follow the paper and the structure was clear.

__ Relation to Prior Work__: Literature review was provided.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The paper elaborates a theory for PAC learning under constraints. Authors show that PAC and PACC learning are equivalently hard, and derive a generalization bound for PACC. They also develop a method for learning and show experiments that support their theory.

__ Strengths__: 1. Claims and experimental set up are technically correct, though the latter is detailed in the appendix.
2. The line of work is novel since there is scarce work in PAC learnability under constraints. The significance of the theory might be somewhat limited as it shares the same limitations of unconstrained PAC, where there is more advances like PAC-Bayes and its many variants.
3. The work is of high relevance to the learning community as it serves as first steps toward formalizing learnability under constraints.

__ Weaknesses__: In my opinion the main weakness/limitation is the dependance of the generalization bound on the optimal lagrange multipliers (dual variables). That is, it unclear to me how could we compute the number of samples without knowing them. The appeal of many generalization bounds is that one could compute the sufficient number of samples to attain certain desired error. I would appreciate if authors can comment on this concern.

__ Correctness__: Yes, all claims and experimental set up are sound.

__ Clarity__: Yes, the paper delivers its message clearly.

__ Relation to Prior Work__: Authors do a good job relating and contrasting to prior work.

__ Reproducibility__: Yes

__ Additional Feedback__: Some minor comments:
* What does the acronym CSL stand for?
* The norms in result of Theorem 2 (equation 6), should both be || ||_1 or does L_1 denotes another norm?
* There is a bracket missing in (3a)
=== UPDATE:
After reading author's response, I am leaving my score unchanged. I consider the work to be a good contribution to the theory of constrained learning.

__ Summary and Contributions__: The paper extends the probably approximately correct (PAC) learning framework to include constraints on the learned hypothesis. This new framework, coined PACC is shown to be as hard as a corresponding PAC problem (without constraints). A practical algorithm for PACC learning is discussed.

__ Strengths__: Putting forward an extension of the PAC set up by taking constraints explicitly into account seems an interesting idea. The paper also illustrates how to use this extended framework for implementing fair and robust ML methods.

__ Weaknesses__: - I suggest adding a detailed outline of the paper in the introduction section
- the contributions relative to existing work should be spelled out more explicitly in Section 2
- the quality of presentation and clarity should be improved
- it is not made clear how the theory developed in this paper allows interpreting
the numerical results,e .g. "Doing so yields nominal accuracies of 87% with adversarial accuracies of 82% at 0.04, 75% at 0.08...."

__ Correctness__: Probably.

__ Clarity__: the wording/clarity might be improved, e.g.,
- "...we can soften the worst-case requirements..." maybe use weakened instead of soften
- i guess in (3a) and (3b) are some typos. e.g., what is the index i and j there ?
- "PACC Learning is as Hard..." would it make sense to use "is not harder than PAC" instead of "..is as hard ..."
- "...that PAC and PACC learning are equivalent problems." in which precise sense are they equivalent?
- "Although we know this dual problem is not related to (PIV)..." how is it related precisely ?
- how is epsilon related to epsilon_0 in Theorem 2?
- what is the input and what is the output of Algorithm 1
- maybe you could state Algorithm 1 earlier in the paper to improve the readability
- "... if perturbations as small as 0.04 are allowed"
- "...the dual variables have a sensitivity interpretation: the larger their value, the harder the constraint is to satisfy [18]" can we make this interpretation precise by some bound ?

__ Relation to Prior Work__: the offerings made by this paper relative to existing work could be made more explicit in Section 2.

__ Reproducibility__: No

__ Additional Feedback__: i could not find the details for how step 3 of Algorithm 1 is implemented in the experiments.