Dear authors,

This paper has been classified as a **CONDITIONAL ACCEPT**. In this meta-review I summarize my overall assessment of the paper, which is based on reading all of the reviews, the author response, the reviewer discussion, and the full main paper. I did not review the supplementary material. I also summarize the assessment of 2 ethical reviews that the program chairs solicited after the work (in particular, the Boston policing data example, hereafter abbreviated BPD) was flagged as ethically concerning.

Overall the reviewers agree that this is a clearly written paper that makes an interesting and important technical contribution to our understanding of learning in the presence of noisy protected groups. While reviewers have flagged that some of the bounds could perhaps be improved through further analysis, this is not viewed as a significant limitation of the paper.

The main issue that reviewers have with the paper, which was heavily discussed following the author response (including by 2 external ethical reviewers, see below), is the BPD example. At this point the amount of text devoted to discussing the example far exceeds the amount of text that the example occupies in the paper. I state this to emphasize the degree of concern expressed by the reviewers. Indeed, one of the reviewers lowered their overall score in view of these concerns.

Since the reviewers generally agree that the paper would be above threshold without the BPD example, I am not recommending rejection. I make the following concrete suggestions instead:

- Remove the BPD example. The specific problem formulation is highly problematic as an illustration of fair ML for all the reasons already flagged in review.
- If replacing the BPD example with the SQF example (which reviewers agree is less objectionable), you should use part of the extra page allotted for camera-ready revisions to clearly articulate the ethical implications of the proposed type of modeling and whether the algorithmic fairness notions you consider are reasonable and reflect societally desirable outcomes. As one of the reviewers notes, examples that are used in a given paper are often re-used for comparison purposes in later work. It is thus important that fair ML researchers proceed with the utmost care when framing new "benchmarks" that will be used in further study.
- As part of the conditional accept process, you will have the opportunity to propose substitute data examples to replace the BPD data, whether these be the SQF example or a different data set that you and those involved in overseeing the process view to be better suited to illustrating the methods of the paper.

**Ethical review summary**

This paper received 2 ethical reviews. The ethical reviewers (ERs) were asked to respond to a series of questions. I provide summaries of the questions and responses below. They were also asked to make an accept/reject recommendation. The ERs were divided. One ER voted ACCEPT. The other voted REJECT.

1. What (if any) revisions are needed to the broader impacts statement?

The broader impacts section should not be a less-technical restatement of the abstract. "The general concern [the authors] should be aware of is that a technical improvement in the intrinsic fairness of an algorithm will lead folks to ignore the importance of the context in which the algorithm is applied," as illustrated by the original BPD example. Instead, the authors should be clear that applying their work in settings where there is, e.g., "significant background injustice" may not address inequities in a meaningful way.
One ER said: "[The authors] could perhaps have a sentence at the beginning like this: 'As machine learning is increasingly employed in high stakes environments, amid considerable background social injustice, any potential application has to be seriously scrutinized to ensure that it will not exacerbate those background conditions. Aiming to make machine learning algorithms themselves intrinsically fairer, more inclusive, more equitable plays an important role in achieving that goal.' and then bridge into what they're trying to do."

2. Would the publication of the research potentially bring with it undue risk of harm?

Quoting an ER directly: "I agree with the authors that there is a need for new approaches to how to work with noisy, unreliable or outdated group information. While this motivation is important, the methods for doing so in this work undermine the claim to be working toward fairness as a goal. Other reviewers have noted the skewed and problematic BPD data and I see the [proposed SQF] experiments as an improvement. But the bigger problem is that the BPD experiments assume that police can 'describe' someone's race by looking at their face. This is a type of racial essentialism that has been profoundly critiqued in both the social and humanistic sciences, and by simply applying the data as a kind of ground truth with no further discussion, it perpetuates the idea that this constitutes an acceptable approach. I agree with other reviewers that this makes it difficult to accept the paper in its current form."

*******************************

Note from Program Chairs: The camera-ready version of this paper has been reviewed with regard to the conditions listed above, and this paper is now fully accepted for publication.