__ Summary and Contributions__: This paper discusses online batch classification problem with individual fairness constraints. It follows a metric free approach where an auditor detects fairness violation. Authors use a Lagrangian formulation to unify the constraint and the classification loss. They reduce the problem to "vanilla" online batch classification problem and proves a sublinear regret bound using CONTEXT-FTPL algorithm. Following this, they generalise the analysis to stochastic setting.

__ Strengths__: 1. The paper solves an interesting problem which can have practical impact.
2. The paper proposes a clean reduction technique to formulate the problem and deploy online algorithms in it.
3. It removes the constraining assumptions in the existing literature such as Mahalanobis metric, while achieving the \sqrt{T} regret in loss.
4. The algorithm also achieves T^{-1/4}-fairness violations with high probability that resonates the PACF uniform convergence sample complexity.

__ Weaknesses__: 1. The bounds have a dependency of log(|H|). Essentially, the size of the hypothesis space can be unbound or quite large. This can be a big issue to use this work in reality.
2. The reduction of the problem is not described enough in the main paper.
3. The assumptions of separators and CONTEXT-FTPL are described as passing references. But the final results depend significantly on them.
4. Being specific to previous results like which theorem or which bound is needed.

__ Correctness__: I have not found out any error in the proves.

__ Clarity__: The paper is clearly written but the description of the reduction from online fair batch classification to online and online batch classification, and reason behind using context FTPL should be elaborated a bit longer in the main paper.

__ Relation to Prior Work__: 1. The paper has healthy amount of references.
2. Being specific to previous results like which theorem or which bound would be really helpful. For example, "qualitatively matches...[23]" (Remark 4.5). Here, it is easy to understand if a specific result is referred.
3. [9] depends on the dimension d whereas the results of this paper depends on log(|H|). Explaining this trade-off at least intuitively would be helpful.

__ Reproducibility__: Yes

__ Additional Feedback__: Except the points on clarity, prior works and weakness, please look into following points:
1. What is epsilon in Theorem 3.8 and Cor 4.4? What are the choices of it? This parameter comes from the Laplace distribution right? This should be explained.
2. An intuition behind practicality of transductive setting and separator set would be good to discuss. It is not okay to assume reader's familiarity with CONTEXT_FPL paper.
3. The generalisation in section 4 is often discussed as a stochastic setting in online learning while the previous sections are on adversarial setting. It would be good to mention that and also why the bounds for this setting improve.

__ Summary and Contributions__: The paper introduces a new online learning model for studying individual fairness. It provides a learning algorithm that achieves sublinear regret both for accuracy and fairness. In the case when examples are sampled from a probability distribution, the paper also gives strong generalization bounds. These results are obtained using significantly weaker assumptions than previous related papers.

__ Strengths__: This paper addresses an important problem, and obtains significantly stronger results than prior papers. This paper's online fair learning model has much weaker assumptions and feedback format than previous similar models. The proof techniques are nontrivial. Overall, this paper provides novel and significant contributions to the field of ML fairness.

__ Weaknesses__: The structure of the paper and the presentation of the results could be significantly improved (see details below).

__ Correctness__: I did not find errors in the paper.

__ Clarity__: The introduction and the problem problem statement are well written. However, the results (sections 3-4) are harder to follow and they could be presented more clearly. As an example, one of the main results of the paper, the algorithm for online fair batch classification, isn't described clearly enough in the main part of the paper; the pseudocode for it is only given in the appendix.

__ Relation to Prior Work__: The authors do clearly discuss the paper's relationship to previous work, but much of this discussion happens in Appendix A. In my opinion, discussion of previous work is a crucial part of a paper; it shouldn't be relegated to the appendix.

__ Reproducibility__: Yes

__ Additional Feedback__: Response to author feedback: I thank the authors for committing to improving the presentation of the paper.

__ Summary and Contributions__: The paper studies online learning with individual fairness as constraints. In particular, the paper assumes the existence of an auditor that detects fairness violations rather than assuming the known similarity metric among individuals. Different from a closely related work [9] "Online learning with unknown fairness metric" by Gillen et al. NeurIPS 2018, in each round the auditor only needs to return one pair of individuals identified as fairness violation. Under this setting, the paper establishes PAC-style fairness and accuracy generalization guarantees. The main contribution is to answer the question raised in [9] and the results show that online learning under an unknown individual fairness constraint is possible even without assuming a parametric of form of the underling similarity measure.

__ Strengths__: The presented general reduction framework, which takes any online learning algorithm as a black-box and obtains a learning algorithm that minimizes the cumulative classification error and the number of fairness violations, is sound. The removal of the previous assumptions, linear rewards and Mahalanobis distance, is also significant.
It is also a nice result to see the use of the Follow-the-Perturbed-Leader approach can achieve sublinear regret with respect to both misclassification and fairness violations in the online fair batch learning setting.

__ Weaknesses__: I do not identify any clear weakness of this work.
Somehow, I would like to see how results would be different when the auditor still returns the set of all pairs of individuals with fairness violations.
Similar to the work [9], the paper focuses on the fairness constraint that binds between individuals at each round. While the enforcement of the fairness constraint across rounds is difficult. But practically users may want to see to what extend the between-round individual fairness can be achieved in the proposed approaches.

__ Correctness__: I do not check the proof details in the supplemental file. But I tend to believe the claims and method are correct.

__ Clarity__: The paper is well written. It contains a lot of theoretical results. But the related work section should be moved (may be shortened) from the supplemental file to the main body.
The rough idea of the proof technique with the use of a composition covering argument could also be discussed in the main body.

__ Relation to Prior Work__: The related work section is included in the supplemental file. Some discussions there, especially comparison with [9] and [23], can be moved into the introduction section of the main body. But overall, relation to prior work is clearly discussed.
Regarding the rebuttal, I appreciate the authors' commitment of improving the paper's presentation and clarifications of my two questions, the case of all pairs of individuals with fairness violations returned, and between-round individual fairness.

__ Reproducibility__: Yes

__ Additional Feedback__: It is good to consider the setting with the adaptive (and weak) fairness feedback. Some discussions, like how the approaches and results would be different under assumption of gradually change, would be helpful. I change my score from 7 to 8.