__ Summary and Contributions__: The paper introduces a generalized formulation of unbalanced OT distances, called the generalized Kantorovich Rubinstein (GKR) distance. Several previous work become the variants of the GKR distance. The authors first prove the negative result that the KR distance and the partial OT distance can not be solved under O(n^2). Then, the authors propose to use the GKR distance on quad trees that are derived from the Euclidean space to approximate the GKR distance on the original Euclidean space as the ground metric. The approximation is proved to be upper- and lower-bounded. Moreover, the authors propose a fast algorithm to compute GKR, based on accelerated dynamic programming and optimized tree structures. The algorithm is proved to solve for the GKR distance on tree metrics in O(n(logn)^2). Some experiments wrap up the paper testing its complexity, approximation accuracy, and scalability.

__ Strengths__: -- soundness of the claims:
I haven't found any error in methodology and proofs.
-- significance and novelty of the contribution:
The paper introduces a fast OT algorithm that breaks the O(n^2) barrier. This is a significant algorithmic breakthrough.
-- relevance to the NeurIPS community
The paper addresses an important open problem of unbalanced OT that has numerous applications in the machine learning community.

__ Weaknesses__: I included in here all my other comments and questions, some of which are not necessarily weaknesses.
L 131: "the GKR distance does not necessarily transport all the mass but pays penalties for mass creation and destruction" The two hyper-parameters might be troublesome for practical use, especially when the authors do not offer any insights or cross-validation on that
L 135: "... the GKR distance includes many popular variants of the OT distance as special cases" I suggest the authors include the mathematical expressions of the connections between GKR and other existing (unbalanced) OT distances, at least in the supplementary.
-- Page 1, the noise toy experiment
Line 20: The noise example might not best illustrate the advantage of unbalanced OT because most noises can be assumed to have a zero mean (after some preprocessing) and thus may have a very little impact on the normalization. Comparing measures that originally have different totals may better serve the purpose, such as two datasets with different number of Dirac samples or the total measure changes as the data evolve in a dynamic process.
L 585: The baseline is missing an existing unbalanced OT solver.
-- Page 7, Chicago Crime
Specifying the ground metric as R^2 helps readers understand the experiment if they have no background of the dataset.
What is the rationale behind a distance between measures in different days? Intuitively, what are we comparing?
How do we build a tree out of Dirac masses in R^2?
-- Page 8, NY Taxi
Line 304: "The ground space is 2-dimensional space..." Isn't using the Manhattan distance more reasonable?
The baseline is missing an existing unbalanced OT method on the ground metric.
-- Trivial things
Title: "on Tree(+s)" should be better
Line 33: notati(+o)ns
Line 63: "Kant(-o)rovich", since the rest of the paper uses "Kantrovich". It is a rare spelling of Kantorovich though.
Line 270: Figure 3 should include the entire trajectories to show their (non-)linearity in the log scale and their relative positions at a small sample size.
Line 287: "0.70 second(-s)"

__ Correctness__: I haven't found major errors or weaknesses.

__ Clarity__: The paper is generally well written.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__: ------------------------
After rebuttal:
I have checked the rebuttal and other review. I am disappointed by the response to my questions and concerns because it dodges my points but the response to other comments is satisfactory in my opinion. Overall, I think the paper is marginally above the bar after incorporating the rebuttal into the new version. I will keep my original rating.

__ Summary and Contributions__: This paper deals with the unbalanced optimal transport problem. The authors define the gneralized Kantorovich Rubinstein (GKR) distance that allows mass creation and destruction. They propose an algorithm that solves the GKR problem in quasi-linear time on a tree metric. Well detailed numerical experiments are conducted on synthetic and real data.

__ Strengths__: The novelty in this paper comes with a promosing algorithm for solving unbalanced OT problem. There is also a theoretical study of time complexities for the unbalanced OT from an algorithmic perspective. Structuring the unbalanced OT using tree metric is very interesting and seem to work very well in practice.

__ Weaknesses__: Some comments are as follows:
- To construct the cost tree metric, how we choose the weight function?
- The authors proved only the triangle inequality for the GKR distance. I am wondering if GKR(mu, nu) = 0 iff mu = nu is automatically verified?

__ Correctness__: All the experiments are well detailed and explained.

__ Clarity__: Over all the paper is well written.
Minor comments:
- Ligne 145, the k-SAT problem is not defined before Hypothesis 1.
- Ligne 161, by Indyk and Thaper

__ Relation to Prior Work__: It would be interesting to compare the performance of the proposed algorithm to the generalized Sinkhorn algorithm to compute unbalanced OT studied in Chizat et al (2018).

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper proposes the GKR distance, which encompasses many previous unbalanced OT problems. The authors show that important special cases of the GKR distance on Lp metrics cannot be computed in strongly subquadratic time under SETH. They then proposed a quasi-linear time algorithm to compute the GKR distance on tree metrics, which can be applied to large-scale problems.

__ Strengths__: The GRK distance is interesting, and the proposed quasi-linear time algorithm for solving GRK distance is also fast and effective. The theoretical analysis of the proposed method seems to be sufficiently rigorous and provides some insights.

__ Weaknesses__: (1) This paper proposes a general framework for unbalanced OT problems named GRK distance, one problem is how generality is reflected in the equation of the GRK distance.
(2) A question is that if the proposed algorithm for GKR distance is also applicable to optimal partial transport and Kantrovich-Rubinstein distance. If not, what is the main reason why the proposed algorithm cannot be directly applied to; if it can be applied directly, what is the main advantage of GRK compared with optimal partial transport and Kantrovich-Rubinstein distance?
(3) This paper does not explain why GKR distance is robust.
(4) The reported evaluation results may not be sufficient to support the claim on the effectiveness of the GRK distance. It seems that there are no quantitative results in comparison with other distances and methods on some real-world tasks. There are many other datasets available on which the method can be further tested. It is lacking comparisons with some more recent methods during the past 1~2 years.

__ Correctness__: The claims, method, and empirical methodology appear to be correct, but I didnâ€™t check completely.

__ Clarity__: The paper is pretty readable.

__ Relation to Prior Work__: The relation between the GRK distance and some previous contributions is clearly discussed.

__ Reproducibility__: Yes

__ Additional Feedback__: (1) The formulas proj_1 \mu (A) = \mu (A \times X) and proj_2 \mu (B) = \mu (X \times B) on line 106 may or should be proj_1 \pi (A \times X) = \mu (A) and proj_2 \pi (X \times B) = \mu (B).
(2) The formula (y,x) on line 112 may or should be written as (y,x) \in E.
-------------------------------
After rebuttal:
Thanks for the response. The rebuttal addressed part of my concerns, so I keep my rating.

__ Summary and Contributions__: The paper proposes a new formula to formalize the unbalanced OT problem, and many existing formulas for this problem can be treated as special cases of it. Further, the authors propose a tree-based algorithm that solves the unbalanced OT problem in quasi-linear time for large scale dataset.

__ Strengths__: (1). The new formula for the unbalanced OT problem is powerful enough to cover a lot of other representations.
(2). The performance of the algorithm seems to be promising in terms of the computation speed.

__ Weaknesses__: (1). Basically, the paper is hard to follow, especially for the algorithm part (Sec. 5). Maybe the authors can use some figures and graphs to better illustrate their idea. In my opinion, it seems that the algorithm tries to recursively transport mass from the nodes of the source distribution to the nearby nodes in the target distribution. But the writing here makes me hard to grasp the basic ideas.
(2) I am not sure if the algorithm can only solve the 2-dimensional unbalanced OT problem because the use of the quadtree structure. How to build the tree structure on high dimensional space?
(3) If there is no intersection between the supports of the source and the target distributions, is the algorithm still feasible?
(4) About the accuracy of the proposed algorithm: since both \lambda_d and \lambda_c are set to be constant, the GKR distance can be treated as solving a linear programming problem. Maybe the authors can experiment on a small toy dataset, then show the groundtruth GKR distance and the approximated distance by the proposed algorithm.

__ Correctness__: (1) Proof of Thm.2: Between Line 505-506, it seems that there is a hidden assumption that c(x,y) >= 2\lambda. Why should this assumption work?
(2) Is the OT_{tree} defined in Line 522 the same with |v(P)-v(Q)|_1 in [31]?

__ Clarity__: (1) The illustration of the algorithm seems not clearly enough. Maybe the authors can use some figures and graphs to better illustrate their idea.
(2) Some minor questions:
a. What's the meaning of \omega in Line 151?
b. Line 184: is there any reference for the computation complexity of the sinkhorn algorithm?
c. Line 283: Figure 6 seems to be Figure 3?

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: --------------------
After Rebuttal: I decide to change my score from 5 to 6, since the rebuttal solves some of my questions. The new illustration of the algorithms and experiments is convincing. But I am still skeptical about the performance of the algorithm, beyond the experiments, maybe some theoretical bound of the accuracy should also be given.