__ Summary and Contributions__: The authors propose a large margin learning algorithm in hyperbolic space. They start with a perceptron-style algorithm and prove its convergence, and then extend it to a large-margin version using adversarial training. Empirical evaluations show the utility of the algorithm, especially on tree-like data which cannot be faithfully embedded into Euclidean spaces.
The authors have addressed the questions raised by the reviewers sufficiently and I am keeping my score of acceptance.

__ Strengths__: - The proposed algorithm is more principled with convergence proofs compared to previous work like [6]. Both the perceptron case and the large margin case are covered.
- The main ideas are simple and clearly presented, with solid theoretical analysis.
- Hyperbolic space embeddings is a very promising approach for tree-structured or network data, so directly having a learning algorithm that works in this space is an important contribution.

__ Weaknesses__: - The experiments are more like a proof-of-concept for the algorithm. They can be strengthened by testing on more datasets to provide better evaluations on the utility of the algorithm.
- Figure 3 is not completely clear. It looks like only training losses are shown (3a) and not the test loss. This is confusing. Also, why do the \alpha-robust loss stay relatively stable around iteration 200 in 3b while the margin keeps increasing in 3c?
- Given a dataset, it is not clear how to choose the robustness parameter \alpha or how to estimate bounds for it for a search.

__ Correctness__: The claims and empirical methodology are correct to the best of my knowledge.

__ Clarity__: Yes, the paper is on the whole quite clear. It would be nice to have more background discussions on hyperbolic spaces and their use in machine learning, but not crucial due to space limit.

__ Relation to Prior Work__: There are sufficient discussions on related works and how the current work differs.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: Learning a large-margin classifier in hyperbolic rather than Euclidean space was first studied in the ICML 2019 paper "Hyunghoon Cho et al., Large-margin classification in hyperbolic space". Learning representations in hyperbolic spaces is especially beneficial for hierarchical data, requiring significantly fewer dimensions than standard Euclidean spaces.
This paper presents the first theoretical guarantees for this kind of work, studying robust large-margin learning, adversarial learning, and dimension-distortion trade-offs in hyperbolic spaces.
The theoretical derivation is sound, and the results are important. I appreciate this kind of work! But it may be better to provide some guidance on the real application of the theory, e.g., for what kinds of machine learning problem and data sets, the proposed theory has advantages over traditional theories?
For the empirical evaluation, the authors only use one data set, which seems not convincing enough.
I agree to accept the paper, but I will not change my original score. The reason is that I think more experiments are needed to verify the practical usefulness.

__ Strengths__: The theoretical derivation is sound, and the results are important.

__ Weaknesses__: For the empirical evaluation, the authors only use one data set, which seems not convincing enough.
It may be better to provide some guidance on the real application of the theory, e.g., for what kinds of machine learning problem and data sets, the proposed theory has advantages over traditional theories?

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: I agree to accept the paper, but I will not change my original score. The reason is that I think more experiments are needed to verify the practical usefulness.

__ Summary and Contributions__: The paper explores large margin linear classifier in the hyperbolic space from a theoretical point of view. The first contribution is an adaptation of the classical perceptron algorithm in the hyperbolic space quite similar in the definition to [6] but the authors focus on the analysis of the convergence (which is new). This analysis allows to state the difficulty to learn a large margin classifier in hyperbolic space wrt to the exponential number of iteration required.
The second (and main) contribution focuses on this problem, how to learn a large margin classifier. The proposed solution is to consider a loss based on a hyperbolic logistic loss considering adversarial examples. The examples are generated by perturbing available data in order to maximize the logistic loss and such that they are still close to the original data in a distance lower than a given threshold. The proposed approach combining adversial examples and gradient descent allows to reduce to a polynomial time the number of iterations to learn a maximum margin classifier.
Finally the authors propose an analysis of the effect of the distortion of the euclidean and hyperbolic embedding on the margin. This important discussion allows to state the superiority of hyperbolic embedding over euclidean when the data is suitable (hierarchical data).

__ Strengths__: The paper is well organized, easy to read. It shows theoretically the benefits of hyperbolic classifiers for hierarchical data. The theoretical results are important and are a clear improvement over the current SOA in hyperbolic classification.
*********** After feedback : Thanks to the authors for the clarification, the additional page seems a good idea.

__ Weaknesses__: The experimental section analyses only from a theoretical point of view the results. The authors clearly state that their focus "is to understand the benefits of hyperbolic spaces" but it would be interesting to provide not only training loss and margin results but also usual test errors.

__ Correctness__: To the best of my competence, yes (the supplementary material is very long and I did not have the time to review all the provided details).

__ Clarity__: The paper is very clear, the supplementary material is quite complete (too long maybe ?) but helps to have a deeper comprehension.

__ Relation to Prior Work__: The proposed method is a clear improvement of [7], the SOA is clearly stated and discussed. Minor comment : one recent reference is not discuss in the paper, the Hyperbolic Neural Network (Ganea et al.) which also provide in an other context theoretical results on hyperbolic machine learning.

__ Reproducibility__: Yes

__ Additional Feedback__: Minor comment : in Fig 3, concerning alpha-robust loss, the best result is not achieved for alpha=1 but alpha=0.5. Do you have any idea why ?

__ Summary and Contributions__: The paper provides theoretical guarantees for large-margin classifiers in hyperbolic spaces. Firstly, they show that learning a large margin
classifier using a "standard" gradient-based algorithm over a margin loss may require an exponential number of iterations. To cope with this, the authors propose
a method based on adversarial examples. In particular, the proposed approach enriches the training set with new examples that are perturbations of the input.
A (bounded) perturbation x' of example x is chosen (via an optimization) in such a way of leading the classifier h to differently labeled the examples h(x') != h(x). Authors show that such an approach converges to a large margin classifier in polynomial time. Some experimental analyses have been conducted to support theoretical findings.

__ Strengths__: - The paper is well written and clearly states the motivation and the impact of the findings;
- The theoretical analysis seems sound;
- The theoretical foundings are numerous and relevant to the community.

__ Weaknesses__: - Empirical analyses are rather limited, e.g., only a dataset;
- Sometimes the used notation is awkward, e.g., 2nd equation in 4.1;
- Some sketch of the proofs would have been welcomed.

__ Correctness__: The claims and methods seem correct. The empirical analysis is correct but barely sufficient.

__ Clarity__: The paper is well structured and well written. The presentation is cured even though sometimes the notation is a bit awkward.

__ Relation to Prior Work__: The authors clearly state that to the best of their knowledge they are the first to study adversarial learning in hyperbolic spaces.
Authors also underline the differences w.r.t. [6], that is, a closely related work on support vector machine for hyperbolic spaces.

__ Reproducibility__: Yes

__ Additional Feedback__: Comments:
- In the paper, authors often compare the Euclidean margin with the Hyperbolic one. However, it is not clear whether \gamma refers to the very same quantity in
both spaces, or it is an 'overload' of the same notation. For example, this makes it hard to assess the impact of Remark 3.2.
- Equation (4.2) is a bit rushed. The intuition is clear, but it is not clear from the text why optimizing this problem is equivalent to find the correct adversarial example.
- Many of the proofs of the findings (almost all of them) are detailed described in the supplementary material. However, it would be highly appreciated to put in the paper at least a sketchy proof or a couple of lines providing an intuition.
- In Section 5, the authors introduce the concept of embedding of hierarchical structures, referring the reader to other results about the embeddability of trees. However, since it represents one of the motivations for the use of hyperbolic spaces (HS), I would have appreciated some more details about the tree embedding in HS.
- To add value to the paper, I would suggest including more data sets (e.g., the ones used in [6]) to the empirical evaluation.
I have read the author feedback and I am partially satisfied by the response. I think that further experiments would be needed to demonstrate the practical usefulness of the proposed approach.