Reviews: Self-attention with Functional Time Representation Learning

Originality: Applying self-attention to continuous-time event sequences is an interesting approach. The authors clearly identify the shortcoming of self-attention when applied to such problems. They propose translation-invariant time kernel functions justified by classic functional analysis theory and implement four new time embeddings that can be optimized by backpropagation and are compatible with self-attention. I believe the proposed time embeddings are novel and generalizable to other temporal tasks.
Quality: Drawing motivation from classic functional analysis theory [12] and [14] and developing differentiable time embeddings are the key contributions of this paper. The derivation of these embeddings from the original theorems is clear and impressive, and it is the most interesting part of the paper. However, the lack of evidence showing the advantage of these embeddings in the results (as I explain under Significance) lowers the overall quality of the paper.
Clarity: The paper is well written, clearly stating the shortcomings of previous work and how the proposed approach addresses them. There are a few typos that should be corrected, e.g., line 244 ("supplemnetary meterial"). It is not clear what the legend items in Figure 1 represent, which makes the results in Figure 1 hard to interpret.
Significance: Although the proposed approach is novel, the results do not convince me of its significance. For instance, none of the four proposed time embeddings (Table 1) significantly outperforms position encoding across all three datasets. The proposed Bochner Normal and Bochner Inv CDF embeddings perform either similarly to or worse than position encoding. While Mercer achieves the best results on two datasets, it does not come out ahead on the MovieLens dataset. Hence, the advantage over position encoding is not evident from the results.

Originality: Positional/temporal embeddings for self-attention models have been studied in many papers, although not with theory-driven motivations. The paper should distinguish its contributions from this prior work more clearly.
Quality: The theory-driven discussion is of high quality, but the experimental results section needs improvement.
Significance: The paper addresses an important direction: enabling self-attention models for time-series forecasting. In general, the study of temporal embedding selection is impactful, especially given the insights from the two theorems.
Clarity: The paper is mostly well written and the contributions are clearly explained, but I think the writing can still be improved. For example, some theoretical arguments could be moved to the supplementary material to make room in the main body for more interpretation and analysis.

Paper ID:	9358
Title:	Self-attention with Functional Time Representation Learning
