Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The work introduces a beautiful formalism and a host of reasoning-related operations. For instance, they introduce the equivalent of basic logical operations (AND, OR, negation) in their formalism. Additionally, they formalize higher-level operations based on these atomic relations. For instance, enforcement of type-hierarchy in the representation or membership queries. My biggest concern is the experimental evaluations. The first experiment (link prediction) is a must here, but certainly not enough. The other experiments on the LUBM dataset, which is claimed to evaluate “reasoning capabilities” is a bit vague. In particular, when you say “we custom designed eight new queries that comprise higher level predicates,” I would like to understand it better how you did this. Interpreting the results in Table 1 is a bit hard: - When looking at HITS@1, E2R is way better than others in FB15K, but not much difference between E2R and CE in WN18 (in fact, CE is better); why? - Why in WN18, the mean rank of E2R is vastly bigger than other systems? Overall, it’s a nicely written work and I enjoyed reading work. I would have expected more careful experimentation on the claim, especially in more natural settings. Writing: - “aims to bridges”: to bridge - Figure 1, the label of the right figure should be “T-Box”? - In Table 1, the middle row you never boldface any of the systems; not sure if it was intentional (also the top-left row.) - “embed a DL based symbolic KB into Hilbert space”: we’re really mapping KB to finite vectors in R^d (which is a subset of Holbert space, but it does not cover the whole Hilbert space. Might be helpful to be clear about this, so that it doesn’t come off as an over-claim. - There are some terms borrowed from abstract algebra literature which could be clarified with an extra bit of explanation/pointers: “field”, C (space of complex numbers?), “isomorphic”. - There are other works that try to induce logical relations between the concepts/relations when embedding; for instance: * Low-Dimensional Embeddings of Logic, 2014 * Injecting Logical Background Knowledge into Embeddings for Relation Extraction, 2015 * Lifted Rule Injection for Relation Embeddings, 2016
The authors introduce a new way of embedding entities and relations into a R^n vector space via quantum logic. Quantum Logic provides the main framework by which they can show that, given an embedding that fits all the requirements, queries become a matter of taking the distance to a plane. Queries here are in the sense of description logic: set membership, containment, etc. The embedding is found through SGD with a loss function that represents all of the constraints the embedding needs to satisfy. The authors did an excellent job of providing the necessary background and walking through the construction and to the experiments. There were a few times I jotted down a question to ask, and then found it answered in the next paragraph. The writing is clear, though if there was space it would be nice to expand a bit to help the reader follow the equations 4--9. One question I didn't follow is how simple the operations in section 3 are. Is it trivial to intersect two subspaces, add two subspaces, and find the orthogonal complement of a subspace? Are the subspaces each represented with just a single 2d-dimensional vector, or is it more than that? Is it a span of a set of vectors, and these operations are trivial operations on those sets? Just looking for more understanding of the cost of these operations and the cost of representation in the space. One question I had is how deep of an ontology of relations can be represented with n-dimensional space. Is it n, or something larger? Can trinary or higher-order predicates be represented trivially by extending the space to R^3n for example, or does the entire technique support only unary and binary predicates? The description of the experiments makes it sound like the baseline system hyperparameters were tuned on the test set. Can you verify if that is true or if it was done more carefully with something like cross-validation or a separate development set? There seems to be a typo in Table 1 where the HITS@1 and HITS@10 results are the same for E2R. I'm hoping they are the results for HITS@1, but more likely are the actual results for HITS@10. Line 289-290 made me remember that there are other reasons for doing an embedding of the KB other than as an approximation to doing logic reasoning. For logic reasoning, using a symbolic reasoner will perform better (as stated). However, there are other benefits of an embedding. For instance, it may be useful for prediction (as you show with the link prediction task?). Just to confirm: A symbolic logic reasoner would perform poorly on the two link prediction tasks, right? Because the link prediction is based on some unknown fuzzy properties of the entities and relations they belong to, not simply a reasoning task, is that correct? Another benefit, (not evaluated here), is that usually one can say something about relation and entity similarity based on closeness of their vectors. A qualitative evaluation of this property may also lend value to the paper. I think a brief discussion of the pros and cons of symbolic reasoning vs. embedded would be interesting, informative, and also strengthen the paper to point out the benefits of what you've done. small typo: line 277 "predication" should be "prediction". ** Update after author response: Thank you for answering my questions in your author response.
The paper presents an interesting idea to learn embeddings of hierarchies and ontologies based on quantum logic. I find the proposed formulation to be novel and a positive contribution. However, I do not think that the experiments fully bear out the utility of the proposed approach. The loss function (10), if solved exactly, would capture the logical structure of hierarchies. However, because it is a non-convex problem, the paper solves it via stochastic gradient descent. Hence, there is no assurance that an optimal solution is always found, and thus the proposed approach suffers from the same problem (perhaps to a lesser degree) as other embedding approaches, i.e., "no guarantee that embeddings maintain the sanctity of the logical structure" (line 35-36). Because of this, I think it behooves the paper to compare itself against other embedding approaches that attempt to incorporate hierarchical structure (in addition to non-hierarchical approaches like TransE and ComplEx that were used as baselines in the paper). The citations on line 326 (i.e., [32, 33, 34, 35, 10]) provide a list of possible hierarchical embeddings approaches that could be compared against. Two additional ones are: (a) Poincaré Embeddings for Learning Hierarchical Representations. Maximilian Nickel, Douwe Kiela. NIPS, 2017. (b) Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. Maximilian Nickel, Douwe Kiela. ICML, 2018. I think it would have been instructive if the paper has compared against these "hierarchy-aware" approaches, because the comparisons would show how much better the paper's *approximate* quantum-logical constraints are versus other ways of embedding hierarchies. I think a running illustrative example of the components of equation 10 on page 4 and 5 would greatly improve the comprehensibility of the paper. Questions: 1. How does the paper's approach compare against (a) and (b) above? 2. Why did the paper not compare against the "hierarchy-aware" approaches mentioned above ([32, 33, 34, 35, 10])? 3. Page 6, line 222-227: By sampling invalid entities, isn't the paper implicitly also making the closed-world assumption, a shortcoming that the paper critiques on line 86-87? 4. Page 6, line 233: How does the paper "[find] all those entities"? By enumeration? 5. Page 6, line 242: How does confidence/probability correspond to projection length? There is no information about this in the paper or the supplementary material. 6. Page 7, Table 1: What exactly constitutes the "reasoning task" on LUBM? What must be inferred? How many steps of reasoning are required? How big is the training and test data? As it stands, this experiment is hard to replicate. 7. Page 7, Table 1: Why is the paper's approach not performing well on WN18? 8. Page 7, line 266, "better convergence": Why would you get better convergence with 3 negative entities per positive entity? What's a negative entity? 9. Page 8, line 291-293, "projecting each entity ... non-fitment score": Could the author elaborate on what this sentence means? 10. Page 9, line 315, "their black box nature": The paper's embedding is also a black-box, no? What interpretability does its embedding offer? Nits: * Page 4 line 134: implies following -> implies the following (same mistake present elsewhere in paper) * Page 5 line 189: sufficiency -> sufficient * Page 8 line 310: bridges -> bridge UPDATE: The authors' feedback addressed my concerns to a large extent (though I still feel more details won't hurt the reproducibility of the LUBM experiments). Hence, I've upgraded my overall score to "6: Marginally above the acceptance threshold".