Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Originality: This paper has a number of original ideas. It clearly points out that many recent neural-network forecasting papers train on many series but still forecast each series individually. The novel contribution is to blend these two approaches with an attention mechanism, which determines whether the forecaster should weight the global forecast from temporal matrix factorization or the local forecasts trained on individual series. The whole model is trained jointly, so the local forecasts are trained mostly on the portions where local behavior matters. The authors also introduce a method for handling the scale problem in forecasting by separately forecasting the mean and the residuals. One question I had, which would be nice to see addressed, is why the attention mechanism is based only on the Y series, rather than on both the Y series and the latent factors X and F; I could imagine X and F contain additional information about how good global forecasts would be for a specific series.

Clarity: Good; I found the paper very easy to follow and understand.

Significance: High. This is important: many neural network architectures are built for text/audio/images etc., but these cannot simply be ported over to high-dimensional time series forecasting. The area needs its own original contributions, and this paper provides exactly that.

Quality: High. The experiments are well done, and the methodology is well explained and well motivated.
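The blending step this review describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `attention_net` is a hypothetical callable standing in for the learned attention network, which here looks only at the series' own history (the point the review questions) and returns a scalar weight in [0, 1] for mixing the global forecast from temporal matrix factorization with the local per-series forecast.

```python
import numpy as np

def blended_forecast(y_context, global_pred, local_pred, attention_net):
    """Blend a global and a local one-step forecast with a learned weight.

    `attention_net` maps the recent history of a single series (Y only)
    to a weight w in [0, 1]; the final forecast is a convex combination
    of the global and local predictions. All names are illustrative.
    """
    w = attention_net(y_context)  # scalar weight in [0, 1], from Y history only
    return w * global_pred + (1.0 - w) * local_pred

# Toy usage with a fixed stand-in attention weight of 0.7:
# 0.7 * 2.0 + 0.3 * 4.0 = 2.6
pred = blended_forecast(np.arange(5.0), global_pred=2.0, local_pred=4.0,
                        attention_net=lambda y: 0.7)
```

In the jointly trained setting the review praises, the gradient through `w` is what pushes the local network to specialize on the segments where the global factorization forecast is weak.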
1. The originality of this paper is limited. The two models proposed are the deep level network (DLN) and the hybrid model DeepGLO. The DLN is just a combination of two temporal convolution networks: one models the rolling mean of a time series and the other models the residual part. DeepGLO simply combines a global model and a local model via a linear combination. Neither idea is very original.

2. The authors emphasize the difficulty of normalizing data and claim to be addressing it, but the proposed model does not completely solve the problem. First, the DLN only separates the mean and the residual of a time series, yet the scales of the mean and the residual can still be large, which remains a problem for neural networks; the proposed model therefore does not actually solve the issue. Second, as shown in Table 2, the experimental results on unnormalized data are not significantly better than those on normalized data. The authors are therefore advised to reconsider whether they should claim to have solved the normalization problem.

3. The organization of Sections 4 and 5 is hard to follow, likely because it is written in a bottom-up manner: the paper first introduces the local model, then the global model, and then the hybrid of the two. As a result, some symbols in Algorithm 2 are not yet introduced when readers reach that part, which may confuse them. The authors are advised to move Section 5.2 forward, i.e., introduce the whole model before explaining each part of it. This would make the methodology clearer.

4. The figures, tables, and caption fonts are too small and difficult to read (although this does not affect the current evaluation of the paper). It would be better to enlarge the figures and use the normal font size for captions and tables.
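The decomposition criticized in point 2 can be made concrete with a small sketch. This is an illustration of the general rolling-mean/residual split, not the paper's exact scheme: a trailing window average (hypothetical `window` parameter, front-padded with the first value) gives the "mean" component one network would model, and the residual gives the other. Note that, as the review argues, neither component is guaranteed to be on a small scale.

```python
import numpy as np

def rolling_mean_decompose(y, window):
    """Split a 1-D series into a trailing rolling mean and a residual.

    The series reconstructs exactly as mean + resid. The padding and
    window handling here are simplifying assumptions for illustration.
    """
    kernel = np.ones(window) / window
    # Trailing rolling mean; pad the front by repeating the first value
    # so the output has the same length as the input.
    padded = np.concatenate([np.full(window - 1, y[0]), y])
    mean = np.convolve(padded, kernel, mode="valid")
    resid = y - mean
    return mean, resid

mean, resid = rolling_mean_decompose(np.array([1.0, 2.0, 3.0, 4.0]), window=2)
# mean + resid reconstructs the original series exactly
```

The sketch makes the review's objection visible: if `y` lives on a large scale, `mean` inherits that scale unchanged, so modeling it with a separate network does not by itself normalize anything.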
This paper proposes a hybrid model that tries to tackle the challenges of multi-dimensional time series forecasting (TSF) from two perspectives: (1) grasping the 'global' evolutionary patterns of time series datasets in which the individual series vary widely in scale; and (2) extracting 'local' patterns through a data-dependent attention model. The motivations are clear, and the experimental results on four real-life tasks validate its superiority over competing TSF methods. However, necessary validation and explanation are still lacking, especially regarding the specific roles of the global and local models in TSF. Is it possible to provide more detailed theoretical proofs, or to design a more detailed experiment on a dataset, to demonstrate the power of the global and local models? Otherwise, the contributions are weak and unclear. Besides, it would be better to compare the empirical performance of DeepGLO extensively against other existing hybrid models, such as Yu et al., "Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting," 2018.