__ Summary and Contributions__: This paper extends the fairlets machinery to hierarchical clustering under different objectives (revenue, value and cost).Theoretical results have been derived and polynomial time algorithms have been proposed for the objectives of revenue and value. Empirical evaluation has been conducted to verify the effectiveness and efficiency of the proposed approach.

__ Strengths__: Fair clustering is an important topic in the field of machine learning.
The paper has analyzed quality guarantees and proposed polynomial time algorithms for optimizing revenue and value with fairness.

__ Weaknesses__: Some important definitions, such as beta approximation, should have been added in the paper to make it self-contained.
As mentioned by the authors, algorithm proposed by ref [14] is also a fair hierarchical algorithm.Its results may be compared to assess the effectiveness and efficiency of the proposed approach.
------- After Authors feedback-------
I would like to thank authors for addressing the comments. I have read the author response and reviewersâ€™ discussions.The technical novelty of the paper has been made clearer during the discussions between the reviewers. So I increase my score to 7.

__ Correctness__: Correct.

__ Clarity__: Well written.

__ Relation to Prior Work__: Clear.

__ Reproducibility__: Yes

__ Additional Feedback__: Some important definitions, such as beta approximation, should have been included in the paper to make it self-contained.
Algorithm proposed by ref [14] may be compared to assess the effectiveness and efficiency of the proposed approach.

__ Summary and Contributions__: This paper applies established fairness notions such as disparate impact and representational fairness to hierarchical clustering objectives. The authors show how to extend the fairlet technique (essentially finding micro-clusters that (i) are a lower bound on the general cost, (ii) satisfy the desired fairness notion (iii) are easier to compute) to various objectives proposed for graph clustering.
The authors show that this allows them to preserve state of the art approximation guarantees for the various objectives that they consider.

__ Strengths__: Straight off the bat, this is one of the more convincing fairness papers I have seen submitted to this year's NeurIPS. I like that this paper sticks to established notions of fairness and how the authors show how fairlets can be used here. If there exists an algorithmic technique in the field of fair algorithms, then fairlets have a strong claim to the title, as they have received by far the most numerous attention (in my opinion).
The results themselves are solid. I would like to highlight the final result that the authors gave, being the extension of the DasGupta objective. I think this result is highly interesting, and while I understand that the authors buried it somewhat for the sake of space, I think that this is arguably the most intersting contribution.
Technically, this paper is also non-trivial. Indeed, if this paper bridges a gap between people usually not interested in algorithms and the more theoretically minded part of the NeurIPS community, then this is a welcome side effect.

__ Weaknesses__: I wish the experiments had been put into the supplementary material and that the paper had given more room for the actually interesting results.
Then again, I guess this is expected of a NeurIPS paper, so who am I to argue.

__ Correctness__: As far as I can tell, yes. I only thoroughly checked theorem 17, which I believe is the most interesting result. The other results are less surprising, and therefore plausible.

__ Clarity__: Yes.

__ Relation to Prior Work__: Mostly, related work is solid. It should be noted that Kleindessner et al. do not actually give a linear time approximation for the same problem (in fact, they only give a result for what is essentially k-center under a matroid constraint).
The authors also could have cited Schmidt et al. Fair Coresets and Streaming Algorithms for Fair k-means (WAOA 2019).

__ Reproducibility__: Yes

__ Additional Feedback__: I think this is the strongest paper in my pile, and definitely one of the stronger fairness papers. I would like to see it accepted.

__ Summary and Contributions__: The paper studies the hierarchical clustering in which the goal is to recursively partition the input to minimize certain objective functions with group fairness requirement. In group fairness requirement, each cluster has at most alpha fraction of its point from a same group.

__ Strengths__: The paper provides simple algorithms with provable guarantees for fair hierarchical clustering with respect to three standard objective functions studied for this problem. In general, it has a solid contribution and expands the understanding of fair clustering.

__ Weaknesses__: The techniques seem standard (in the area of fair clustering) and the paper has limited novelty in terms of of technical contributions.
While hierarchical clustering is a natural problem, and fair clustering is a well-motivated problem, it will be helpful if authors provide more motivations/justifications why this notion of fairness is useful for hierarchical clustering.

__ Correctness__: The general proof sketch and results seem correct (I have not verified all details in the appendix) and experimental results are also reasonable and helpful.

__ Clarity__: The paper is well written and has a good flow.

__ Relation to Prior Work__: The paper has discussed related works comprehensively and has done a fair comparison. I have some suggestion in the additional feedback part.

__ Reproducibility__: Yes

__ Additional Feedback__: Line 68: Kleindessner et al. designed an algorithm for k-center with different type of fairness requirement. Instead of balancing different colors in each cluster, the goal is to pick centers (proportionally) from different colors. It is basically k-center under partition matroid.
Line 69-70: In a(n almost) concurrent work, the fair correlation was also studied by Ahamdi et al.
Line 131: Bounded representation: with binary colors, it is the same as balance. Bera et al. and Bercea et al. have also considered a generalization of this notion prior to Ahmadian et al. How does bounded representation compare to them?
On the cost of essentially fair clusterings by Bercea et al. (2018)
Fair Algorithms for Clustering by Bera et al. (2019)
Line 154 and Line 188 (Theorem 8): what is an upper bound on the value of m_f in terms of alpha (for general values of alpha)?
Algorithm 1 â€“ step 2: how to find an initial solution?
Line 270: For for
Line 291: obseve -> observe
Line 300: usint -> using
------- After Authors feedback
I decided to increase my score to "accept" but as Reviewer 2 I also suggest authors to provide a better presentation of their theoretical contributions (maybe in the introduction).