__ Summary and Contributions__: Update: I'm raising my score one point in response to the author rebuttal.
The paper proposes a kernel method to estimate the probability mass function (PMF) of a classifier's hard predictions as a differentiable function of the model parameters.

__ Strengths__: Common group fairness metrics are computed from hard predictions. The key benefit of the proposed method is that (approximations to) fairness metrics relying on hard predictions can be optimized with automatic differentiation software.

__ Weaknesses__: The authors acknowledge that kernel methods may not scale to high-dimensional problems. Furthermore, the method as stated is limited to classification problems with binary targets.
The experimental section is somewhat unsatisfactory (will describe in further detail below).
Also, the benefit of the kernel method relative to alternative approaches is motivated differently from how it is demonstrated in the experiments, which makes the paper feel somewhat disjointed (discussed below).

__ Correctness__: I believe so.

__ Clarity__: There are some typos. Also I think the introduction is quite vague and could be improved by either adding citations to general descriptions of recent trends, or focusing more on this hard/soft prediction issue as a key motivation for the proposed method.

__ Relation to Prior Work__: The discussion of the adversarial-learning literature is somewhat incomplete. In addition to the work by Zhang, the following papers should be discussed:
* https://arxiv.org/abs/1511.05897
* https://arxiv.org/abs/1511.00830 (a VAE counterpart to the GAN approach)
* https://arxiv.org/abs/1802.06309

__ Reproducibility__: Yes

__ Additional Feedback__: I think the overall idea is somewhat intriguing, and I agree that the distinction between hard predictions and model-assigned soft probabilities is typically glossed over in the literature. But sections 3 and 4 seem somewhat disconnected to me, since this differentiability issue does not come across strongly in the experiments (instead, the instability of the baselines due to adversarial training is emphasized). I also think that laying out the extension to multi-class classification would help the paper a lot (to save space, experimental details could be moved to the appendix).
About the experiments, there are several improvements to be made. I have a concern about whether the baselines were substantially tuned, since there is no discussion of how hyperparameters such as learning rates were selected for the baselines. I also didn't understand the choice of a narrow MLP architecture (14 hidden units per layer) for the neural net methods. It seems that the main claim in section 3 is that the kernel method will help by making the objective differentiable, but in section 4 the claim is that the kernel method is more stable than competing methods. Is there an ablation (even in a controlled synthetic setting) that could show that the differentiability of the kernel method makes a meaningful difference in the performance?
I don't think it makes sense to specifically name the synthetic data as coming from "Black"/"white" racial groups since this data is entirely synthetic (use "majority"/"minority" instead).
typos:
S1 - "that serves to estimate probability distribution" -> "that serves to estimate a probability distribution"
S4 - "...a single-layer linear class." -> "...a single-layer linear classifier"
"spreadness" is an awkward word; use "variability"

__ Summary and Contributions__: The paper proposes differentiable fairness regularisers (for demographic parity and equalised odds) for classifiers trained by minimising the cross-entropy loss. The fairness regulariser is computed by estimating the conditional distribution of the classifier's probability scores given the protected attributes (and the outcome, in the case of equalised odds) via kernel density estimation.
They claim that this approach improves on other approaches to fair classification because the regulariser is based on direct estimation of the conditional distributions on which the fairness metrics are based, rather than on optimising a proxy such as the covariance.
They show empirically that their approach achieves a better fairness/accuracy trade-off than some alternative approaches on simulated data, and that it is comparable to Agarwal et al. on real datasets.
They claim their approach has better runtime performance than that of Agarwal et al.

__ Strengths__: The proposed approach is simple, straightforward to implement and the authors provide code. The experiments provided are suggestive that this approach may be competitive with more complex approaches.

__ Weaknesses__: The major weakness of this paper is the lack of theory to back the claim that the approach will outperform existing methods. If it is not possible to prove improved performance due to the non-convexity of the problem, this should be acknowledged and discussed.
To make a claim about improved performance without theoretical results, the empirical experiments would need to be comprehensive and to explore cases where the method might intuitively be expected to fail, as well as settings where we would expect it to succeed. Unfortunately, the empirical experiments fail to separate improvement due to the alternative regulariser from improvement due to the use of a more flexible model. It is hardly surprising that a multi-layer NN outperforms logistic regression on synthetic data that is not linearly separable.
They state that their approach is faster than that of Agarwal et al., but do not attempt any time-complexity analysis, theoretical or empirical (i.e. by plotting time vs. number of data points), that would demonstrate that this improvement is due to the algorithm as opposed to implementation details.

__ Correctness__: The claims of the paper (regarding improved performance) need to be more formally stated in order for their correctness to be properly assessed.
The empirical methodology appears reasonable, however the breadth of experiments run is not sufficient to provide strong evidence for the claims made.

__ Clarity__: Overall the paper is readable, but the problem statement and contribution claims need to be stated more clearly (formally).
The problem statement should clearly state the class of problems you are attempting to solve, e.g. fair classification with respect to DP or EO with discrete (binary?) protected attributes.
The discussion of the regularised classifier (starting line 110) belongs in the proposed approach section.
Statements such as the KDE trick "allows us to faithfully quantify fairness metrics" should be more formally stated or omitted.

__ Relation to Prior Work__: No, the paper focuses its discussion and comparison on only three competing in-processing approaches: Agarwal et al., Zhang et al. and Zafar et al. They fail to discuss other approaches to directly estimating and regularising independence-based measures, in particular approaches based on information theory; e.g. see: Wasserstein Fair Classification (DeepMind), Fairness-Aware Classifier with Prejudice Remover Regularizer, Fairness Measures for Regression via Probabilistic Classification.

__ Reproducibility__: Yes

__ Additional Feedback__: The authors have addressed my points of concern in their response. With the experiments the authors propose to complete, and perhaps some additional experiments testing how performance degrades as the number of protected attributes (and thus dimension of the density estimate) increases, this could be a strong paper. The approach is straightforward, applicable to many model classes and could be a practical way of encoding fairness concerns in real AI systems.
Unfortunately, given the extent of the proposed changes and the fact that the experiments currently in the paper fall short of demonstrating the key claim of improved performance, I don't support accepting this submission this time around.

__ Summary and Contributions__: Update: After reading all reviews and rebuttals, I agree that the experiments in the current version are incomplete. My major concern is that the datasets are too simple. During the rebuttal, the authors said that they conducted experiments on more complex datasets and observed the same trends. I believe the authors' comments. However, at the current stage, multiple experiments are missing. The current submission is not ready. I would lower the score to 5.
This paper learns a fair classifier. Specifically, the input to the classifier is data containing both ordinary and sensitive (e.g. gender, race) information. The model needs to learn a classifier whose predictions do not depend on the sensitive information. The criteria for determining whether the classifier depends on the sensitive information are the Demographic Parity difference (DDP) and the Equalized Odds difference (DEO). Directly optimizing both criteria is not easy, so previous works use proxy regularizers to enforce them. This paper proposes to directly estimate the DDP and DEO using kernel density estimation. The proposed approach achieves good performance when the prediction is binary.
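To make the mechanism in this summary concrete, here is a minimal sketch (my own reconstruction for illustration, not the authors' code; the function names and threshold are hypothetical): the hard indicator 1[score > 0.5] is replaced by a Gaussian-kernel CDF with bandwidth h, so the estimated DDP becomes a smooth function of the scores and hence of the model parameters.

```python
import numpy as np
from math import erf, sqrt

def gaussian_cdf(x):
    # Standard normal CDF, i.e. the integral of the Gaussian kernel.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def smoothed_ddp(scores, groups, threshold=0.5, h=0.1):
    """Differentiable surrogate for the demographic-parity gap.

    Pr(Ytilde = 1 | A = a) is estimated by replacing the hard
    indicator 1[score > threshold] with the smooth kernel CDF
    Phi((score - threshold) / h), which admits gradients with
    respect to the scores (and hence the model parameters).
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    rates = []
    for a in (0, 1):  # binary protected attribute, as in the paper
        s = scores[groups == a]
        rates.append(np.mean([gaussian_cdf((x - threshold) / h) for x in s]))
    return abs(rates[1] - rates[0])
```

As h shrinks, the surrogate approaches the empirical gap between the groups' hard positive rates; larger h trades fidelity for smoother gradients.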

__ Strengths__:
1. The paper is well-written and easy to follow.
2. The contribution is clear: the paper uses KDE to directly estimate the DDP and DEO. The KDE estimate is also differentiable, so gradient descent can be used to train the model.
3. Compared with other approaches, when the decision space is binary, the proposed approach shows stable training and better performance.

__ Weaknesses__:
1. As mentioned in Remark 1, KDE would require more data when the classifier's decision space is beyond binary. However, for other approaches, e.g. Zhang et al. [44], this does not seem to be a problem. I wonder whether it is possible to show a comparison of performance on datasets where the decision space is beyond binary?
2. Is it possible to show the relation between the size of the decision space and the amount of data required for KDE?
3. Experiments:
a. My first concern about the experiment section is whether the datasets are challenging enough. The setting of all real-world datasets seems very simplified: the decision space is binary and only ONE attribute is considered as sensitive information. Therefore, I wonder whether the proposed approach would work in a more complex setting where more attributes are considered sensitive.
b. I have another concern about the potential impact of the proposed approach. Specifically, I wonder whether this approach could be generalized to other tasks. Compared with the experiments in [44], the approach in [44] seems applicable to a broader use case, namely de-biasing word embeddings. However, it seems non-trivial for the proposed approach to generalize beyond the binary decision problem.
c. Compared with [44], apart from stabilized training, what is the advantage of the proposed approach over [44]? Performance-wise, it seems the proposed approach achieves results similar to [44] on the Credit Card and COMPAS datasets. In terms of computation time, the proposed approach is actually slower than [44] on the Law School and Adult census datasets. Application-wise, it seems [44] could be applied to a broader application domain than the proposed approach.

__ Correctness__: The method is sound and clear.

__ Clarity__: The paper is well written. The whole storyline and approach is easy to follow. The task is clearly presented.

__ Relation to Prior Work__: The relation to previous work is clearly discussed.

__ Reproducibility__: Yes

__ Additional Feedback__: Although I am not very familiar with this area, I feel the contribution of the proposed approach is important. Specifically, enabling end-to-end training with DEO and DDP is an important contribution. However, I have concerns about the following two points: 1. It seems the applications to which the proposed approach could be applied are pretty limited (weakness 3.b). 2. Compared with [44], what is the advantage of the proposed approach? (weakness 3.c)

__ Summary and Contributions__: The paper uses kernel density estimation to directly quantify measures of fairness without using a proxy. It achieves good accuracy in measuring the probability distributions used to compute the fairness measure, and the fairness measure can be directly optimized via gradient descent. Empirically, this method achieves good accuracy-fairness tradeoff and improves training stability when compared to prior methods.

__ Strengths__: Based on the comparison to prior work described in the paper, I believe that the contribution is novel, in particular the direct computation of a fairness measure of interest and the ability to directly optimize it, without relying on fairness proxies such as a covariance function.
Empirical evaluations are performed along various axes (e.g. accuracy-fairness tradeoff, training stability, robustness to hyper-parameters) to demonstrate the utility of the proposed approach.
I also believe that this work is highly relevant to the NeurIPS community.

__ Weaknesses__: In high dimensions, the KDE approach requires a number of samples exponential in the dimension to give an accurate distribution estimate. The authors note that this is not the case for a binary classifier, but this means that the KDE approach is limited (with respect to its ability to be used in practice) to the binary classification setting.
An empirical evaluation of the accuracy of the PMF estimate Pr(\tilde{Y} = 1) is not present.
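One way the missing evaluation could look (a sketch with synthetic scores, not the paper's data; the score distribution and bandwidth grid here are my own assumptions): compare the kernel-smoothed estimate of Pr(\tilde{Y} = 1) against the empirical frequency of hard predictions as the bandwidth h shrinks.

```python
import numpy as np
from math import erf, sqrt

def smooth_indicator(x, h):
    # Gaussian-kernel CDF: a smooth stand-in for the indicator 1[x > 0].
    return 0.5 * (1.0 + erf(x / (h * sqrt(2.0))))

rng = np.random.default_rng(0)
scores = rng.beta(2, 5, size=5000)        # synthetic classifier scores in [0, 1]
empirical = float(np.mean(scores > 0.5))  # plug-in estimate of Pr(Ytilde = 1)

# Error of the smoothed PMF estimate for a few bandwidths.
errors = {}
for h in (0.5, 0.1, 0.01):
    est = float(np.mean([smooth_indicator(s - 0.5, h) for s in scores]))
    errors[h] = abs(est - empirical)
```

On such synthetic data the error should shrink with h, which would make the bias/smoothness trade-off behind the method's accuracy claims quantitative.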

__ Correctness__: To the best of my knowledge, the claims and method are correct.
In Figure 1, I am curious about which dataset was used in computing the pdf estimates of \hat{Y} and the pmf estimates of \tilde{Y}. Also, is there a way to quantitatively compare the robustness of the pdf and pmf estimates? Finally, while the right table in Figure 1 demonstrates the robustness of the estimated pmf of \tilde{Y} against various h's, it is not clear to me how this correlates with accuracy.

__ Clarity__: Yes, as someone without a background in fairness, I found the paper both instructive in discussing fairness measures and prior approaches, as well as clear in describing the method and its benefits. The application of a kernel density estimator to construct the loss function to be added as a regularizer was well motivated and explained. Overall, the paper strikes a good balance between conciseness and explanation. The appendix is also well structured and helpful.

__ Relation to Prior Work__: Yes. It is discussed how this work differs from contributions which use a fairness proxy and works which use an adversarial learning framework to ensure that predictions are made independently of sensitive attributes. The key differences from those works are that the fairness measures are directly quantified, so fairness measures are well-respected, and no adversarial optimization is required, therefore improving training stability.

__ Reproducibility__: Yes

__ Additional Feedback__: A note: Fairness in machine learning is not my area of expertise, and therefore, I am not at all familiar with the related work. However, I was able to understand the paper and the approach that was described.
A naive question: If the sensitive attributes are known a priori, why not simply remove them from the dataset rather than enforcing a fairness constraint based on independence of the prediction and the sensitive attribute? (This is briefly mentioned in lines 88-90.) Wouldn't this automatically ensure fairness under both the Demographic Parity and Equalized Odds definitions?
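The usual answer to this question is that correlated proxy features leak the sensitive attribute even when the attribute itself is dropped. A quick synthetic illustration (entirely my own construction, not from the paper):

```python
import numpy as np

# The classifier never sees the sensitive attribute a,
# only a proxy feature x that is correlated with a.
rng = np.random.default_rng(1)
n = 10_000
a = rng.integers(0, 2, size=n)                   # sensitive attribute (dropped)
x = rng.normal(loc=a.astype(float), scale=1.0)   # proxy correlated with a
yhat = (x > 0.5).astype(int)                     # "fairness through unawareness"

# Demographic-parity gap between the two groups.
gap = abs(float(yhat[a == 1].mean()) - float(yhat[a == 0].mean()))
```

Despite removing the sensitive column, the positive-prediction rates of the two groups differ substantially, so unawareness alone ensures neither Demographic Parity nor Equalized Odds.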
Update (post rebuttal):
I am lowering my score to a 5 based on the need for a more comprehensive experimental evaluation, such as having experimental results with multi-class classifiers and results on more complex datasets.