Summary and Contributions: In this work, the authors propose a new regularizer for tensor factorization-based knowledge graph completion. The new regularizer is motivated by the equivalence (duality) between the new loss function and a Frobenius norm-based distance that needs to be minimized. In this sense, the paper provides very nice motivation and justification for the new method, which outperforms other tensor factorization-based approaches, as the experimental results demonstrate. The authors have answered all my questions and concerns satisfactorily, even providing new comparison results. I thank them for such detailed responses.
Strengths:
- I found the results significant and relevant to the NeurIPS community.
- Very well written. The problem is nicely introduced, and the new method is well motivated through the duality of the loss and a Frobenius norm-based distance.
Weaknesses:
- It is still not clear how far the obtained results are from state-of-the-art distance-based methods: all experiments are performed only with tensor-based methods using different regularizers.
- Some improvements could be incorporated (see detailed comments below).
Correctness: Yes, although it could be improved (see detailed comments below).
Clarity: Yes, the problem is nicely introduced and the new solution is well justified.
Relation to Prior Work: Yes
Additional Feedback:
- Line 70: DB methods are based on the Minkowski distance; however, in this paper the duality is established only for the case of the Frobenius norm, i.e., the Minkowski distance with p=2. It would be nice if the authors provided a deeper explanation of the role of the parameter p in DB methods. What is the optimal value of p in state-of-the-art methods?
- Line 139: I would suggest changing "When we train a distance based model" to "When we train a Euclidean distance based model".
- Line 180: I think there is an error. The sentence "the regularizer 5 and 6" should be changed to "the regularizers 4 and 5" (check the equation numbering).
- Line 180: As the regularizer has several terms, it would be convenient to consider different regularization coefficients as hyperparameters. In fact, in the supplemental material (lines 36-37) the cost function has three hyperparameters: lambda, lambda_1 and lambda_2. Please clarify this in the main document and provide some guidance on how to choose the hyperparameters.
- In Section 4.4, it is demonstrated that when the relation matrices are diagonal, the proposed regularizer is equivalent to the nuclear 2-norm. What is the effect of having non-diagonal relation matrices? Does it still favor minimum nuclear 2-norm solutions? I think it would be useful to analyze this at least numerically.
- In the Experiments, why is there no comparison against a distance-based method? I am wondering whether the methods considered in the experiments can be regarded as state of the art.
- Please provide definitions of the performance measures used (MRR and H@k).
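For concreteness, the two measures the review asks to be defined follow the standard KGC conventions (filtered-ranking details aside). A minimal sketch, assuming each test query yields the 1-based rank of the true entity:

```python
import numpy as np

def mrr_and_hits(ranks, k=10):
    """Mean Reciprocal Rank and Hits@k, given the 1-based rank of the
    true entity for each test query."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))      # average of 1/rank
    hits_k = float(np.mean(ranks <= k))    # fraction ranked in the top k
    return mrr, hits_k

# Hypothetical example: the true entities are ranked 1st, 3rd and 12th.
mrr, h10 = mrr_and_hits([1, 3, 12], k=10)
```

Here `mrr_and_hits` and the example ranks are illustrative, not taken from the paper under review.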
Summary and Contributions: The paper presents a novel regularization approach for tensor factorization-based knowledge graph completion (KGC) models. There are two key contributions presented in this work: 1. The paper establishes a duality relationship between the tensor factorization-based KGC model (primal) and the distance-based KGC model (dual). 2. Based on this duality together with tensor factorization, the paper presents a DUality-induced RegulArizer (DURA), which can be widely applied to tensor factorization-based knowledge graph completion models.
Strengths: 1. Nice problem, and elegant approach. 2. Mostly thorough evaluation. 3. Algorithm seems to perform well.
Weaknesses: 1. Writing was confusing in some places; further explanation was needed. 2. More recent comparison methods for knowledge graph completion could be included.
Correctness: Technically sound and correct.
Clarity: Generally well written with some minor blemishes.
Relation to Prior Work: The previous work is appropriately cited and discussed.
Additional Feedback:
1. The writing was a bit confusing at times. In particular, there should be a bit more explanation of why the number of entities in the WN18RR dataset is smaller than that reported in the original paper; on page 10, line 193, it is unclear what is being said about A(i, j), and A seems never to be used in the paper.
2. There should be more recent KGC algorithms included in the evaluation. The experiments were mostly focused on self-assessment, comparing with or without DURA, or with other simple regularizers. There have been dozens of very good KGC algorithms presented recently. In particular, the proposed method focuses on two-way coupling effects, while the paper cited below directly used a Tucker-based KGC model to explore three-way interactions; it would be better to make a comparison with it.
I. Balazevic, C. Allen, and T. M. Hospedales, "TuckER: Tensor factorization for knowledge graph completion," in EMNLP-IJCNLP, 2019, pp. 5185–5194.
Minor: On page 8, line 180, "the regularizer 5 and 6" should be "the regularizers (4) and (5)".
I think the authors provided an articulate answer to our reviews. The new experimental results are convincing, and the analysis of these results is insightful and helpful for other researchers.
Summary and Contributions: In this paper, the authors propose the duality-induced regulariser (DURA), a way of regularising tensor factorisation-based neural link predictors. The underlying idea is that if multiple entities appear as objects in a (s, p, ?) triple, they should be close in the embedding space; the authors enforce this behaviour by regularising the interactions between the latent factors.
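To make the duality idea the summary describes concrete, here is an illustrative sketch (not the authors' code) for a RESCAL-style bilinear scorer s(h, R, t) = hᵀRt, which is one of the model families the reviews discuss. The identity ‖hᵀR − t‖² = ‖hᵀR‖² + ‖t‖² − 2 hᵀRt links the distance-based (dual) view to the factorization-based (primal) score, motivating a penalty on the first two terms:

```python
import numpy as np

def dura_rescal(h, R, t, lam=0.1):
    """Sketch of a DURA-style penalty for one (h, R, t) triple.
    Expanding the squared distance ||h^T R - t||^2 gives
    ||h^T R||^2 + ||t||^2 - 2 h^T R t; the last term is the bilinear
    score, so penalizing the first two terms pulls entities that share
    (h, R) contexts toward the same region of embedding space."""
    return lam * (np.sum((h @ R) ** 2) + np.sum(t ** 2))
```

The function name, signature, and the choice of RESCAL as the illustrating model are assumptions of this sketch, not details quoted from the paper.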
Strengths: Interesting ideas and clear write-up, although I think it is imprecise in some places - see my comments in the "Correctness" section.
Weaknesses: Results do not seem significantly better than those reported in e.g. https://arxiv.org/abs/1806.07297 (https://github.com/facebookresearch/kbc/)
Correctness: One issue I have with this paper is that it broadly refers to tensor factorisation-based models while it actually talks about ComplEx, e.g. see lines 130-131. The derivation of DURA is not totally clear to me: if you want your model to behave like in the Eq. on line 134, why not just use that scoring function instead? The authors state, on lines 160-161: "Therefore, the performance of TFB models for KBC is usually unsatisfying", but it doesn't look like that to me, since their method is on par with simpler TFB models such as https://arxiv.org/abs/1806.07297 (see the result tables in https://github.com/facebookresearch/kbc/). Lines 254-255: "Note that [..] RESCAL, we only compare DURA to the squared Frobenius norm regularisation as N3 does not apply to it." - why? You can use the N3 norm on RESCAL's relation embedding matrix as well.
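The reviewer's last point is easy to see in code: the N3 penalty of Lacroix et al. (2018) is element-wise, so nothing prevents applying it to a relation embedding matrix rather than a vector. A minimal sketch (illustrative names and values, not from either paper):

```python
import numpy as np

def n3_reg(factors, lam=0.01):
    """Nuclear 3-norm surrogate: sum of cubed absolute values over all
    entries of each factor.  Because the penalty is element-wise, it
    applies equally well to entity vectors and to a RESCAL relation
    matrix, which is the reviewer's objection."""
    return lam * sum(float(np.sum(np.abs(f) ** 3)) for f in factors)

# Works on a vector (entity embedding) and a matrix (relation embedding):
penalty = n3_reg([np.array([1.0, 2.0]), np.ones((2, 2))], lam=1.0)
```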
Clarity: Apart from some imprecisions, the paper is very readable
Relation to Prior Work: There is plenty of work on regularising factorisation models and KG embeddings; this discussion could probably be expanded quite a bit. For instance, in https://arxiv.org/abs/1707.07596 the authors use background knowledge for regularising knowledge graph embedding models.
Summary and Contributions: This paper proposes a regularization scheme for tensor-decomposition-based KG completion methods. The regularization scheme is well motivated, albeit very similar to L2 regularization, and the paper gives good intuitions for why the regularization might be helpful.
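For contrast with the "very similar to L2" remark, here is the plain squared-Frobenius baseline the reviews compare DURA against. It penalizes each embedding independently, with no coupling between the head and the relation (a sketch with illustrative names, not code from the paper):

```python
import numpy as np

def frobenius_reg(h, r, t, lam=0.1):
    """Plain squared-Frobenius (L2) regularization for one triple:
    each embedding is shrunk toward zero on its own, so there is no
    interaction term tying entities that share a (head, relation)
    context together."""
    return lam * (np.sum(h ** 2) + np.sum(r ** 2) + np.sum(t ** 2))
```

The difference from a duality-induced penalty is precisely that term coupling: L2 shrinks norms, while a DURA-style penalty shrinks the transformed head-relation product toward the tail.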
Strengths: Here are some of the things I liked about this paper: 1. I liked Section 4.2, which pictorially shows how DURA can encourage embeddings that occur in similar (subject, relation) contexts to be close together. 2. The analysis showing that DURA-regularized embeddings are more robust to sparsity.
Weaknesses: I think a lot of comparisons and qualitative analysis are missing: 1. I would like to see a t-SNE plot verifying the claim that entities with similar (e1, r) contexts are indeed assigned similar representations. 2. There have been a few new KG completion models in the past few years, such as KBAT and GAATs, that achieve competitive performance on WN18RR/FB15k-237. Would DURA help these methods as well? I think it is okay if the answer is no, but it would make the paper stronger if a discussion of limitations were added.
Relation to Prior Work: Some of the recent work on KG completion has been omitted. Some examples: 1. Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs. Nathani et al. ACL 2019. 2. Quaternion Knowledge Graph Embeddings. Zhang et al. NeurIPS 2019.